Kubernetes v1.26 [stable]
Kubernetes includes stable support for managing AMD and NVIDIA GPUs (graphics processing units) across different nodes in your cluster, using device plugins.
This page describes how users can consume GPUs, and outlines some of the limitations in the implementation.
Using device plugins
Kubernetes implements device plugins to let Pods access specialized hardware features such as GPUs.
As an administrator, you have to install GPU drivers from the corresponding hardware vendor on the nodes and run the corresponding device plugin from the GPU vendor. Refer to your GPU vendor's instructions for both steps.
Once you have installed the plugin, your cluster exposes a custom schedulable resource for the GPUs; the resource name is defined by the vendor's device plugin.
You can consume these GPUs from your containers by requesting the custom GPU resource, the same way you request CPU or memory.
However, there are some limitations in how you specify the resource
requirements for custom devices.
GPUs are only supposed to be specified in the limits section, which means:
- You can specify GPU limits without specifying requests, because Kubernetes will use the limit as the request value by default.
- You can specify GPU in both limits and requests, but these two values must be equal.
- You cannot specify GPU requests without specifying limits.
Here's an example manifest for a Pod that requests a GPU:
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
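The limit rules above also allow you to set requests, as long as the request equals the limit. The following sketch shows the same Pod with both fields set; gpu-vendor.example/example-gpu is still a placeholder, so substitute the resource name that your vendor's device plugin exposes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        # If both are set for a GPU resource, the request must equal the limit.
        requests:
          gpu-vendor.example/example-gpu: 1
        limits:
          gpu-vendor.example/example-gpu: 1
```

If the two values differ (for example, a request of 1 and a limit of 2), the Pod should fail validation rather than be scheduled.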
Clusters containing different types of GPUs
If different nodes in your cluster have different types of GPUs, then you can use Node Labels and Node Selectors to schedule pods to appropriate nodes.
# Label your nodes with the accelerator type they have.
kubectl label nodes node1 accelerator=example-gpu-x100
kubectl label nodes node2 accelerator=other-gpu-k915
That label key
accelerator is just an example; you can use
a different label key if you prefer.
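After you label the nodes, a Pod can target one GPU type through a nodeSelector on that label. This minimal sketch reuses the illustrative label value from the kubectl commands above and the placeholder resource name from the earlier manifest. Neither is a real vendor name.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add
spec:
  restartPolicy: OnFailure
  # Only nodes labelled accelerator=example-gpu-x100 are eligible for this Pod.
  nodeSelector:
    accelerator: example-gpu-x100
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
```

nodeSelector matches a single exact value. To accept more than one GPU type, you can use node affinity with the In operator instead.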
Automatic node labelling
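If you're using AMD GPU devices, you can deploy the Node Labeller, a controller that automatically labels your nodes with GPU device properties. At the moment, that controller can add labels for: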
- Device ID (-device-id)
- VRAM Size (-vram)
- Number of SIMD (-simd-count)
- Number of Compute Units (-cu-count)
- Firmware and Feature Versions (-firmware)
- GPU Family, as a two-letter acronym (-family)
  - SI - Southern Islands
  - CI - Sea Islands
  - KV - Kaveri
  - VI - Volcanic Islands
  - CZ - Carrizo
  - AI - Arctic Islands
  - RV - Raven
You can see the resulting labels with kubectl describe:
kubectl describe node cluster-node-23

Name:               cluster-node-23
Roles:              <none>
Labels:             beta.amd.com/gpu.cu-count.64=1
                    beta.amd.com/gpu.device-id.6860=1
                    beta.amd.com/gpu.family.AI=1
                    beta.amd.com/gpu.simd-count.256=1
                    beta.amd.com/gpu.vram.16G=1
                    kubernetes.io/arch=amd64
                    kubernetes.io/os=linux
                    kubernetes.io/hostname=cluster-node-23
Annotations:        node.alpha.kubernetes.io/ttl: 0
…
With the Node Labeller in use, you can specify the GPU type in the Pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          amd.com/gpu: 1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: beta.amd.com/gpu.family.AI # Arctic Islands GPU family
                operator: Exists
This ensures that the Pod will be scheduled to a node that has the GPU type you specified.
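The Node Labeller sets each label to the value 1, so a plain nodeSelector can express a similar constraint. This sketch assumes you want a node whose GPU reports 16G of VRAM, using the label from the kubectl describe output above. It also assumes the AMD device plugin, which advertises the amd.com/gpu resource. The image is the placeholder used earlier on this page. nodeSelector values are strings, so the 1 is quoted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add
spec:
  restartPolicy: OnFailure
  nodeSelector:
    # Assumes the Node Labeller set this label to the string "1" on matching nodes.
    beta.amd.com/gpu.vram.16G: "1"
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42" # placeholder image
      resources:
        limits:
          amd.com/gpu: 1 # assumes the AMD device plugin's resource name
```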
Items on this page refer to third party products or projects that provide functionality required by Kubernetes. The Kubernetes project authors aren't responsible for those third-party products or projects. See the CNCF website guidelines for more details.