Device Plugins

FEATURE STATE: Kubernetes v1.8 alpha

This feature is currently in an alpha state, meaning:

  • The version names contain alpha (e.g. v1alpha1).
  • Might be buggy. Enabling the feature may expose bugs. Disabled by default.
  • Support for the feature may be dropped at any time without notice.
  • The API may change in incompatible ways in a later software release without notice.
  • Recommended for use only in short-lived testing clusters, due to increased risk of bugs and lack of long-term support.

Starting in version 1.8, Kubernetes provides a device plugin framework for vendors to advertise their resources to the kubelet without changing Kubernetes core code. Instead of writing custom Kubernetes code, vendors can implement a device plugin that can be deployed manually or as a DaemonSet. The targeted devices include GPUs, high-performance NICs, FPGAs, InfiniBand, and other similar computing resources that may require vendor-specific initialization and setup.

Device plugin registration

The device plugins feature is gated by the DevicePlugins feature gate and is disabled by default. When the device plugins feature is enabled, the kubelet exports a Registration gRPC service:

service Registration {
	rpc Register(RegisterRequest) returns (Empty) {}
}

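A device plugin can register itself with the kubelet through this gRPC service. During the registration, the device plugin needs to send:

  • The name of its Unix socket.
  • The API version against which it was built.
  • The ResourceName it wants to advertise, in the form vendor-domain/resource.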

Following a successful registration, the device plugin sends the kubelet the list of devices it manages, and the kubelet is then in charge of advertising those resources to the API server as part of the kubelet node status update. For example, after a device plugin registers vendor-domain/foo with the kubelet and reports two healthy devices on a node, the node status is updated to advertise 2 vendor-domain/foo.
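
The counting in the example above can be sketched in Go. This is only an illustration: the Device type and the advertisedCount function are hypothetical names, not part of the kubelet or the device plugin API, and the sketch assumes that only devices reported as healthy are counted.

```go
package main

import "fmt"

// Device is a hypothetical, simplified view of one device reported by a plugin.
type Device struct {
	ID      string
	Healthy bool
}

// advertisedCount returns the number of devices that would be advertised
// for a resource, assuming only healthy devices are counted.
func advertisedCount(devices []Device) int {
	n := 0
	for _, d := range devices {
		if d.Healthy {
			n++
		}
	}
	return n
}

func main() {
	// Two healthy devices reported for the resource vendor-domain/foo.
	devices := []Device{
		{ID: "foo-0", Healthy: true},
		{ID: "foo-1", Healthy: true},
	}
	fmt.Printf("vendor-domain/foo: %d\n", advertisedCount(devices))
}
```

With the two healthy devices above, the count matches the 2 vendor-domain/foo advertised in the node status.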

Developers can then request devices in a Container specification by using the same process that is used for opaque integer resources. In version 1.8, extended resources are supported only as integer resources and must have a limit equal to the request in the Container specification.
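
The limit-equals-request constraint can be illustrated with a small Go check. This is a sketch, not the validation code that Kubernetes runs: ContainerResources and checkExtendedResource are hypothetical names, quantities are modeled as plain int64 values, and the sketch assumes both the request and the limit are written out explicitly.

```go
package main

import "fmt"

// ContainerResources is a hypothetical stand-in for the requests and limits
// of a Container specification, with integer quantities only.
type ContainerResources struct {
	Requests map[string]int64
	Limits   map[string]int64
}

// checkExtendedResource returns an error if the named resource is used
// and its limit does not equal its request.
func checkExtendedResource(name string, r ContainerResources) error {
	limit, hasLimit := r.Limits[name]
	request, hasRequest := r.Requests[name]
	if !hasLimit && !hasRequest {
		return nil // the container does not use this resource
	}
	if !hasLimit || !hasRequest {
		// Simplification: this sketch requires both values to be set.
		return fmt.Errorf("%s: both request and limit must be set", name)
	}
	if limit != request {
		return fmt.Errorf("%s: limit %d must equal request %d", name, limit, request)
	}
	return nil
}

func main() {
	r := ContainerResources{
		Requests: map[string]int64{"vendor-domain/foo": 2},
		Limits:   map[string]int64{"vendor-domain/foo": 2},
	}
	if err := checkExtendedResource("vendor-domain/foo", r); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("valid request for vendor-domain/foo")
}
```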

Device plugin implementation

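The general workflow of a device plugin includes the following steps:

  • Initialization. During this phase, the device plugin performs vendor-specific initialization and setup to make sure the devices are in a ready state.
  • The plugin starts a gRPC service, with a Unix socket under the host path /var/lib/kubelet/device-plugins/, that implements the DevicePlugin interface (the ListAndWatch and Allocate RPCs).
  • The plugin registers itself with the kubelet through the Unix socket at the host path /var/lib/kubelet/device-plugins/kubelet.sock.
  • After a successful registration, the device plugin runs in serving mode, during which it keeps monitoring device health, reports any device state changes to the kubelet, and serves Allocate gRPC requests.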

A device plugin is expected to detect kubelet restarts and re-register itself with the new kubelet instance. In version 1.8, a new kubelet instance cleans up all the existing Unix sockets under /var/lib/kubelet/device-plugins when it starts. A device plugin can monitor the deletion of its Unix socket and re-register itself upon such an event.
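
One way to monitor the socket is to poll for its existence. The Go sketch below is a minimal illustration under stated assumptions: socketGone and waitForDeletion are hypothetical helper names, the demo uses a temporary file as a stand-in for a plugin socket under /var/lib/kubelet/device-plugins, and the re-registration itself is reduced to a printed message. A real plugin might prefer file system notifications over polling.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// socketGone reports whether no file exists at path.
func socketGone(path string) bool {
	_, err := os.Stat(path)
	return os.IsNotExist(err)
}

// waitForDeletion polls path at the given interval. It returns true once the
// file has disappeared, or false if stop is closed first.
func waitForDeletion(path string, interval time.Duration, stop <-chan struct{}) bool {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if socketGone(path) {
			return true
		}
		select {
		case <-ticker.C:
		case <-stop:
			return false
		}
	}
}

func main() {
	// Stand-in for the plugin's Unix socket (hypothetical file for the demo).
	f, err := os.CreateTemp("", "plugin-*.sock")
	if err != nil {
		panic(err)
	}
	path := f.Name()
	f.Close()

	// Simulate a restarting kubelet cleaning up the socket.
	go func() {
		time.Sleep(50 * time.Millisecond)
		os.Remove(path)
	}()

	if waitForDeletion(path, 10*time.Millisecond, make(chan struct{})) {
		// A real plugin would recreate its socket and call Register again here.
		fmt.Println("socket removed; re-registering with the kubelet")
	}
}
```

Polling keeps the sketch dependency-free; the interval trades detection latency against the cost of repeated stat calls.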

Device plugin deployment

A device plugin can be deployed manually or as a DaemonSet. Being deployed as a DaemonSet has the benefit that Kubernetes can restart the device plugin if it fails. Otherwise, an extra mechanism is needed to recover from device plugin failures. The canonical directory /var/lib/kubelet/device-plugins requires privileged access, so a device plugin must run in a privileged security context. If a device plugin is running as a DaemonSet, /var/lib/kubelet/device-plugins must be mounted as a Volume in the plugin’s PodSpec.

Examples

For an example device plugin implementation, see the NVIDIA GPU device plugin for COS base OS.
