How GPU Devices Are Dynamically Mounted to Kubernetes Pods
This article dissects the GPUMounter project's implementation of dynamic GPU device mounting to a pod, detailing the roles of cgroups (v1 and v2) and Linux namespaces, and provides step‑by‑step command‑line examples and a CLI tool for practical use.
Cgroup version detection
# v1
mount | grep cgroup
# v2
mount | grep cgroup2

Cgroups v1 – devices subsystem
The devices subsystem provides three control files:
devices.allow – list of permitted devices.
devices.deny – list of denied devices.
devices.list – report of current device access.
Each entry in devices.allow has the form type major:minor access:
type: a (all), b (block), or c (character).
major and minor: the device numbers, e.g. 195 and 0 for /dev/nvidia0.
access: any combination of r (read), w (write), and m (mknod).
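These fields need not be read off by hand: the major and minor numbers can be pulled from a device node with stat. A minimal sketch (the helper name device_entry is ours, not part of GPUMounter; it assumes a character device and rw access):

```shell
# Compose a devices.allow entry ("type major:minor access") for a
# character device node. stat reports major/minor in hexadecimal,
# so printf converts them back to decimal.
device_entry() {
    major=$(stat -c '%t' "$1")
    minor=$(stat -c '%T' "$1")
    printf 'c %d:%d rw\n' "0x${major}" "0x${minor}"
}

device_entry /dev/null   # → c 1:3 rw (on Linux, /dev/null is char 1:3)
```

On a GPU node, `device_entry /dev/nvidia0` would produce the `c 195:0 rw` entry used throughout this article.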
Obtain the numbers for /dev/nvidia0:
ls -l /dev/nvidia0
crw-rw-rw- 1 root root 195, 0 Dec 24 14:32 /dev/nvidia0

Grant a pod access to the GPU by writing an entry to the pod’s cgroup devices.allow file:
# Path format (replace placeholders with actual values)
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod{UID}.slice/cri-containerd-{ContainerID}.scope/devices.allow
echo 'c 195:0 rw' > /sys/fs/cgroup/.../devices.allow

Cgroups v2 – eBPF device controller
In cgroup v2 the device controller is implemented with eBPF filters. Install bpftool to inspect and load these filters:
apt update && apt install bpftool -y
Check whether a device filter is attached to the pod’s cgroup:
bpftool cgroup tree /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod{UID}.slice/cri-containerd-{ContainerID}.scope
Example output shows an entry with AttachType "device":
14962 device multi cgroup_device_prog
Show details of the loaded BPF program:
bpftool prog show id 14962
14962: cgroup_device tag c394c0e22708d632 loaded_at 2025-01-04T11:29:58+0800 uid 0 xlated 3840B jited 2172B memlock 4096B

Namespace handling
After the cgroup grants GPU access, the device must be made visible inside the pod’s mount namespace.
Obtain the pod UID, container ID and PID:
# Get pod UID
kubectl get pod gpu-pod -o jsonpath='{.metadata.uid}'
# Get container ID
kubectl get pod gpu-pod -o jsonpath='{.status.containerStatuses[0].containerID}'
# Get PID via containerd
ctr -n k8s.io task ls | grep {ContainerID}
Enter the pod’s mount namespace:
nsenter --target 1145716 --mount sh
Create the character device node for the GPU:
mknod /dev/nvidia0 c 195 0
Adjust permissions so the container can use the device:
chmod 666 /dev/nvidia0

Demo – Cgroups v1
Pod manifest used for the demonstration (requires a functional Kubernetes cluster with NVIDIA components installed):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeSelector:
    gpu-mounter-enable: enable
  containers:
  - name: cuda-container
    image: docker.samzong.me/chrstnhntschl/gpu_burn
    resources:
      limits:
        nvidia.com/gpu: '1'

Steps:
Create a pod that does not request a GPU (set NVIDIA_VISIBLE_DEVICES: "none") to verify the environment.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeSelector:
    gpu-mounter-enable: enable
  containers:
  - name: cuda-container
    image: docker.samzong.me/chrstnhntschl/gpu_burn
    command: ["sleep"]
    args: ["100000000"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "none"

After the pod starts, retrieve the pod UID, container ID, and PID:
$ kubectl get pod gpu-pod -o jsonpath='{.metadata.uid}'
ace81d74-99ca-4b34-b60e-a60ec1442875
$ kubectl get pod gpu-pod -o jsonpath='{.status.containerStatuses[0].containerID}'
containerd://4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04
$ ctr -n k8s.io task ls | grep 4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04
4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04 1145716 RUNNING

Write the device allowance to the cgroup path (the path is composed of the pod UID and container ID):
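Assembling this path by hand is error-prone, so it helps to derive it in one step. A small sketch (the helper name pod_cgroup_path is ours; it assumes the systemd cgroup driver and a BestEffort-QoS pod as in this demo — other QoS classes use a different slice). Note the two transformations: dashes in the pod UID become underscores, and the containerd:// prefix is stripped from the container ID:

```shell
# Derive the container's cgroupfs directory from the pod UID and the
# container ID as reported by kubectl.
pod_cgroup_path() {
    uid=$(printf '%s' "$1" | tr '-' '_')     # dashes -> underscores
    cid=${2#containerd://}                   # strip runtime prefix
    printf '/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod%s.slice/cri-containerd-%s.scope\n' \
        "$uid" "$cid"
}

pod_cgroup_path ace81d74-99ca-4b34-b60e-a60ec1442875 \
    containerd://4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04
```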
# Example cgroup path
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podace81d74_99ca_4b34_b60e_a60ec1442875.slice/cri-containerd-4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04.scope
# Grant access to /dev/nvidia0
echo 'c 195:0 rw' > /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podace81d74_99ca_4b34_b60e_a60ec1442875.slice/cri-containerd-4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04.scope/devices.allow

Enter the mount namespace of the container (PID obtained earlier) and create the device node:
$ nsenter --target 1145716 --mount sh
# Inside the namespace
mknod /dev/nvidia0 c 195 0
chmod 666 /dev/nvidia0

CLI tool (nvcli)
The nvcli binary automates the above steps. It provides two sub‑commands:
mount – dynamically mount one or more GPU indices to a pod.
unmount – dynamically unmount GPU indices from a pod.
Common flags:
--kubeconfig (default /root/.kube/config) – path to the kubeconfig file.
--name – target pod name.
--namespace (default default) – pod namespace.
--v – klog log level (0–10, default 2).
Mount example (mount GPU index 0 to gpu-pod):
./nvcli mount --name=gpu-pod --mount=0
Unmount example (remove GPU index 0 from gpu-pod):
./nvcli unmount --name=gpu-pod --unmount=0
Both commands accept a comma-separated list of indices for batch operations.
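For batch operations the index list can be generated rather than typed out. A small sketch (gpu_list is our helper, not part of nvcli):

```shell
# Emit a comma-separated list of the first N GPU indices,
# suitable for nvcli's --mount / --unmount flags.
gpu_list() {
    seq -s, 0 "$(( $1 - 1 ))"
}

gpu_list 4   # → 0,1,2,3
# ./nvcli mount --name=gpu-pod --mount="$(gpu_list 4)"
```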
Infra Learning Club
Infra Learning Club shares study notes, cutting-edge technology, and career discussions.