Deep Dive into Kubelet’s DeviceManager Source Code
This article explains how Kubernetes uses the device‑plugin framework to extend resources beyond CPU and memory, details the kubelet registration and allocation workflow, and walks through the relevant source code in pkg/kubelet/cm/devicemanager that builds the OCI spec.
The device‑plugin framework extends Kubernetes to manage hardware beyond CPU and memory, such as NVIDIA GPUs.
Because a device‑plugin does not see Pods or Containers, it cannot perform dynamic allocation based on user‑specified GPU models or UUIDs; the community is addressing these gaps with the Dynamic Resource Allocation (DRA) project.
Kubelet invocation flow
Deploy the device‑plugin to target nodes with a DaemonSet.
Register the plugin with kubelet by connecting to kubelet's Unix socket and sending the plugin's own socket endpoint, the API version, and the resource name.
Kubelet calls the plugin’s ListAndWatch method over gRPC to obtain the current device inventory.
Kubelet updates the node status in the API server to reflect resource changes.
When a Pod is created and scheduled to the node, kubelet calls the plugin’s Allocate method to reserve the requested devices.
The kubelet side of the device‑plugin framework, the device manager, resides in pkg/kubelet/cm/devicemanager. It is part of the ContainerManager and contributes OCI spec content during container creation.
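The flow above can be sketched in a few dozen lines of Go. This is a minimal in-process model, not the kubelet implementation: the real exchange happens over gRPC on Unix sockets, and every type here (fakeKubelet, staticPlugin, the Plugin interface) as well as the VISIBLE_DEVICES variable and the example.com/gpu resource name are hypothetical stand-ins chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Device is a simplified stand-in for the device-plugin API's Device message.
type Device struct {
	ID      string
	Healthy bool
}

// RegisterRequest carries the three values a plugin sends when registering.
type RegisterRequest struct {
	Version      string // device-plugin API version, e.g. "v1beta1"
	Endpoint     string // the plugin's own socket file name
	ResourceName string // extended resource name advertised on the node
}

// Plugin is a hypothetical in-process substitute for the plugin's gRPC service.
type Plugin interface {
	ListAndWatch() <-chan []Device
	Allocate(ids []string) (map[string]string, error)
}

// fakeKubelet holds only the state needed to illustrate the flow.
type fakeKubelet struct {
	plugins  map[string]Plugin
	capacity map[string]int // resource name -> healthy device count
}

func newFakeKubelet() *fakeKubelet {
	return &fakeKubelet{plugins: map[string]Plugin{}, capacity: map[string]int{}}
}

// Register records the plugin, then reads one inventory update from
// ListAndWatch and derives the capacity that would be reported on the node.
func (k *fakeKubelet) Register(req RegisterRequest, p Plugin) error {
	if req.Version != "v1beta1" {
		return fmt.Errorf("unsupported API version %q", req.Version)
	}
	k.plugins[req.ResourceName] = p
	healthy := 0
	for _, d := range <-p.ListAndWatch() {
		if d.Healthy {
			healthy++
		}
	}
	k.capacity[req.ResourceName] = healthy
	return nil
}

// AllocateFor forwards the chosen device IDs to the plugin's Allocate call.
func (k *fakeKubelet) AllocateFor(resource string, ids []string) (map[string]string, error) {
	p, ok := k.plugins[resource]
	if !ok {
		return nil, fmt.Errorf("no plugin registered for %q", resource)
	}
	return p.Allocate(ids)
}

// staticPlugin reports a fixed inventory and returns one environment variable.
type staticPlugin struct{ devices []Device }

func (s staticPlugin) ListAndWatch() <-chan []Device {
	ch := make(chan []Device, 1)
	ch <- s.devices
	close(ch)
	return ch
}

func (s staticPlugin) Allocate(ids []string) (map[string]string, error) {
	// VISIBLE_DEVICES is a made-up variable name for this sketch.
	return map[string]string{"VISIBLE_DEVICES": strings.Join(ids, ",")}, nil
}

func main() {
	k := newFakeKubelet()
	p := staticPlugin{devices: []Device{{ID: "gpu-0", Healthy: true}, {ID: "gpu-1", Healthy: false}}}
	req := RegisterRequest{Version: "v1beta1", Endpoint: "example.sock", ResourceName: "example.com/gpu"}
	if err := k.Register(req, p); err != nil {
		panic(err)
	}
	fmt.Println("allocatable:", k.capacity[req.ResourceName])
	envs, _ := k.AllocateFor(req.ResourceName, []string{"gpu-0"})
	fmt.Println("envs:", envs)
}
```

Only healthy devices count toward capacity in this model, which is why the unhealthy gpu-1 does not show up as allocatable.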
Registration and device‑discovery flow
Resource allocation flow
ContainerManager.GetResources
During ContainerManager.GetResources, the code enters the devicemanager module and returns a DeviceRunContainerOptions object:
type DeviceRunContainerOptions struct {
	Envs        []kubecontainer.EnvVar
	Mounts      []kubecontainer.Mount
	Devices     []kubecontainer.DeviceInfo
	Annotations []kubecontainer.Annotation
	CDIDevices  []kubecontainer.CDIDevice
}
If CDI is enabled, the CDIDevices field is populated and the container runtime merges the CDI file into the OCI spec.
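To make the shape of DeviceRunContainerOptions more concrete, the sketch below folds several per-resource allocation results into one options value. It is a simplified illustration under stated assumptions, not the kubelet code: the allocation and runOptions types and the mergeAllocations function are hypothetical, the kubecontainer types are replaced by local stand-ins, and the "first value wins" rule for duplicate keys is an assumption of this sketch. The CDI names follow the vendor/class=name form used by CDI.

```go
package main

import (
	"fmt"
	"sort"
)

// Local stand-ins for the kubecontainer types used by DeviceRunContainerOptions.
type EnvVar struct{ Name, Value string }
type Mount struct{ HostPath, ContainerPath string }
type CDIDevice struct{ Name string }

// runOptions is a trimmed-down version of the options struct.
type runOptions struct {
	Envs       []EnvVar
	Mounts     []Mount
	CDIDevices []CDIDevice
}

// allocation is a hypothetical per-resource result returned by one plugin.
type allocation struct {
	Envs     map[string]string
	Mounts   []Mount
	CDINames []string
}

// mergeAllocations combines the results for all resources of one container.
// Duplicates are dropped; for a repeated key the first value seen is kept.
func mergeAllocations(allocs []allocation) runOptions {
	var opts runOptions
	seenEnv := map[string]bool{}
	seenMount := map[string]bool{}
	seenCDI := map[string]bool{}
	for _, a := range allocs {
		// Sort the keys so the output order does not depend on map iteration.
		keys := make([]string, 0, len(a.Envs))
		for k := range a.Envs {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			if !seenEnv[k] {
				seenEnv[k] = true
				opts.Envs = append(opts.Envs, EnvVar{Name: k, Value: a.Envs[k]})
			}
		}
		for _, m := range a.Mounts {
			if !seenMount[m.ContainerPath] {
				seenMount[m.ContainerPath] = true
				opts.Mounts = append(opts.Mounts, m)
			}
		}
		for _, n := range a.CDINames {
			if !seenCDI[n] {
				seenCDI[n] = true
				opts.CDIDevices = append(opts.CDIDevices, CDIDevice{Name: n})
			}
		}
	}
	return opts
}

func main() {
	opts := mergeAllocations([]allocation{
		{Envs: map[string]string{"GPU_MODE": "shared"}, CDINames: []string{"example.com/gpu=gpu0"}},
		{Mounts: []Mount{{HostPath: "/opt/lib", ContainerPath: "/usr/lib/vendor"}}, CDINames: []string{"example.com/gpu=gpu0"}},
	})
	fmt.Printf("%+v\n", opts)
}
```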
Key plugin functions
ListAndWatch : Sends the initial device list once, then watches local devices; whenever a device's health changes or a device disappears, it sends the updated device list so kubelet can adjust the node's allocatable resources.
Allocate : Invoked by kubelet after a Pod is scheduled; the plugin returns the allocation result for the container.
GetDevicePluginOptions : Returns a DevicePluginOptions struct describing optional parameters for the Device Manager.
type DevicePluginOptions struct {
	PreStartRequired                bool // whether PreStartContainer must be called before each container starts
	GetPreferredAllocationAvailable bool // whether the plugin provides a preferred allocation method
}
PreStartContainer
If the plugin sets PreStartRequired=true during registration, kubelet calls PreStartContainer before each container starts, allowing the plugin to perform device‑specific preparation such as resetting the device.
GetPreferredAllocation
When GetPreferredAllocationAvailable=true, kubelet asks the plugin for a preferred set of devices before calling Allocate, which gives the plugin a chance to perform a second‑stage optimal allocation. The official nvidia-device-plugin implements this to choose the best GPU topology for multi‑GPU workloads by examining node‑level topology information.
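As a rough illustration of what a topology-aware preference could look like, the sketch below picks devices that share a NUMA node when one node can satisfy the whole request, and otherwise falls back to the lowest device IDs. This is not the nvidia-device-plugin algorithm; the gpu type, the preferred function, and the NUMA-only heuristic are assumptions made for this example, and the sketch ignores any devices that must be included in the result.

```go
package main

import (
	"fmt"
	"sort"
)

// gpu is a hypothetical device record with a single topology hint.
type gpu struct {
	ID       string
	NUMANode int
}

// preferred returns `size` device IDs, favouring a single NUMA node.
// It returns nil when the request cannot be satisfied at all.
func preferred(available []gpu, size int) []string {
	if size <= 0 || size > len(available) {
		return nil
	}
	byNode := map[int][]string{}
	for _, g := range available {
		byNode[g.NUMANode] = append(byNode[g.NUMANode], g.ID)
	}
	// Visit nodes in ascending order so the choice is deterministic.
	nodes := make([]int, 0, len(byNode))
	for n := range byNode {
		nodes = append(nodes, n)
	}
	sort.Ints(nodes)
	for _, n := range nodes {
		if ids := byNode[n]; len(ids) >= size {
			sort.Strings(ids)
			return ids[:size]
		}
	}
	// No single node fits the request: fall back to the lowest IDs overall.
	all := make([]string, 0, len(available))
	for _, g := range available {
		all = append(all, g.ID)
	}
	sort.Strings(all)
	return all[:size]
}

func main() {
	available := []gpu{
		{ID: "gpu-0", NUMANode: 0},
		{ID: "gpu-1", NUMANode: 1},
		{ID: "gpu-2", NUMANode: 1},
	}
	fmt.Println(preferred(available, 2))
	fmt.Println(preferred(available, 3))
}
```

The preference is only a hint in this model: the caller remains free to allocate a different set of devices.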
Source: https://lengrongfu.github.io/blog/kubelet-devicemanage
Infra Learning Club
Infra Learning Club shares study notes, cutting-edge technology, and career discussions.