This document describes volume populators and data sources in Kubernetes. Familiarity with persistent volumes is suggested.
When you create a PersistentVolumeClaim, the volume that Kubernetes provisions for it normally starts empty. A data source lets you instead request that the new volume be pre-populated with existing data. Volume populators are the controllers that carry out that population, based on the data source that the PersistentVolumeClaim references.
Kubernetes has built-in support for data sources that clone an existing volume or that restore a volume snapshot. Custom volume populators extend this mechanism. The data source is a custom resource, that is, an object whose type is defined by a CustomResourceDefinition. A populator controller watches for PersistentVolumeClaims that reference such a resource and fills the new volume from it.
Kubernetes v1.24 [beta]Kubernetes supports custom volume populators.
To use custom volume populators, you must enable the AnyVolumeDataSource
feature gate for
the kube-apiserver and kube-controller-manager.
Volume populators take advantage of a PVC spec field called dataSourceRef. Unlike the
dataSource field, which can only contain either a reference to another PersistentVolumeClaim
or to a VolumeSnapshot, the dataSourceRef field can contain a reference to any object in the
same namespace, except for core objects other than PVCs. For clusters that have the feature
gate enabled, use of the dataSourceRef is preferred over dataSource.
The dataSourceRef field behaves almost the same as the dataSource field. If one is
specified while the other is not, the API server will give both fields the same value. Neither
field can be changed after creation, and attempting to specify different values for the two
fields will result in a validation error. Therefore the two fields will always have the same
contents.
There are two differences between the dataSourceRef field and the dataSource field that
users should be aware of:
dataSource field ignores invalid values (as if the field was blank) while the
dataSourceRef field never ignores values and will cause an error if an invalid value is
used. Invalid values are any core object (objects with no apiGroup) except for PVCs.dataSourceRef field may contain different types of objects, while the dataSource field
only allows PVCs and VolumeSnapshots.When the CrossNamespaceVolumeDataSource feature is enabled, there are additional differences:
dataSource field only allows local objects, while the dataSourceRef field allows
objects in any namespaces.dataSource and dataSourceRef are not synced.Users should always use dataSourceRef on clusters that have the feature gate enabled, and
fall back to dataSource on clusters that do not. It is not necessary to look at both fields
under any circumstance. The duplicated values with slightly different semantics exist only for
backwards compatibility. In particular, a mixture of older and newer controllers are able to
interoperate because the fields are the same.
Volume populators are controllers that can
create non-empty volumes, where the contents of the volume are determined by a Custom Resource.
Users create a populated volume by referring to a Custom Resource using the dataSourceRef field:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: populated-pvc
spec:
dataSourceRef:
name: example-name
kind: ExampleDataSource
apiGroup: example.storage.k8s.io
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
Because volume populators are external components, attempts to create a PVC that uses one can fail if not all the correct components are installed. External controllers should generate events on the PVC to provide feedback on the status of the creation, including warnings if the PVC cannot be created due to some missing component.
You can install the alpha volume data source validator controller into your cluster. That controller generates warning Events on a PVC in the case that no populator is registered to handle that kind of data source. When a suitable populator is installed for a PVC, it's the responsibility of that populator controller to report Events that relate to volume creation and issues during the process.
Kubernetes v1.26 [alpha]Kubernetes supports cross namespace volume data sources.
To use cross namespace volume data sources, you must enable the AnyVolumeDataSource
and CrossNamespaceVolumeDataSource
feature gates for
the kube-apiserver and kube-controller-manager.
Also, you must enable the CrossNamespaceVolumeDataSource feature gate for the csi-provisioner.
Enabling the CrossNamespaceVolumeDataSource feature gate allows you to specify
a namespace in the dataSourceRef field.
gateway.networking.k8s.io extension APIs.
See ReferenceGrant
in the Gateway API documentation for details.
This means that you must extend your Kubernetes cluster with at least ReferenceGrant from the
Gateway API before you can use this mechanism.Kubernetes v1.26 [alpha]Create a ReferenceGrant to allow the namespace owner to accept the reference.
You define a populated volume by specifying a cross namespace volume data source
using the dataSourceRef field. You must already have a valid ReferenceGrant
in the source namespace:
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
name: allow-ns1-pvc
namespace: default
spec:
from:
- group: ""
kind: PersistentVolumeClaim
namespace: ns1
to:
- group: snapshot.storage.k8s.io
kind: VolumeSnapshot
name: new-snapshot-demo
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: foo-pvc
namespace: ns1
spec:
storageClassName: example
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
dataSourceRef:
apiGroup: snapshot.storage.k8s.io
kind: VolumeSnapshot
name: new-snapshot-demo
namespace: default
volumeMode: Filesystem