How GPU Devices Are Dynamically Mounted to Kubernetes Pods
This article dissects the GPUMounter project's implementation of dynamic GPU device mounting to a pod, detailing the roles of cgroups (v1 and v2) and Linux namespaces, and provides step‑by‑step command‑line examples and a CLI tool for practical use.
Cgroup version detection
# v1
mount | grep cgroup
# v2
mount | grep cgroup2

Cgroups v1 – devices subsystem
The devices subsystem provides three control files:
devices.allow – list of permitted devices.
devices.deny – list of denied devices.
devices.list – report of current device access.
Each entry in devices.allow has the form type major:minor access:
type: a (all), b (block), or c (character).
major and minor: the device numbers, e.g. 195 and 0 for /dev/nvidia0.
access: any combination of r (read), w (write), and m (mknod).
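These fields need not be read off by hand: the major and minor numbers can be pulled from a device node with stat. A minimal sketch (the helper name device_entry is ours, not part of GPUMounter; it assumes a character device and rw access):

```shell
# Compose a devices.allow entry ("type major:minor access") for a
# character device node. stat reports major/minor in hexadecimal,
# so printf converts them back to decimal.
device_entry() {
    major=$(stat -c '%t' "$1")
    minor=$(stat -c '%T' "$1")
    printf 'c %d:%d rw\n' "0x${major}" "0x${minor}"
}

device_entry /dev/null   # → c 1:3 rw (on Linux, /dev/null is char 1:3)
```

On a GPU node, `device_entry /dev/nvidia0` would produce the `c 195:0 rw` entry used throughout this article.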
Obtain the numbers for /dev/nvidia0:
ls -l /dev/nvidia0
crw-rw-rw- 1 root root 195, 0 Dec 24 14:32 /dev/nvidia0

Grant a pod access to the GPU by writing an entry to the pod’s cgroup devices.allow file:
# Path format (replace placeholders with actual values)
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod{UID}.slice/cri-containerd-{ContainerID}.scope/devices.allow
echo 'c 195:0 rw' > /sys/fs/cgroup/.../devices.allow

Cgroups v2 – eBPF device controller
In cgroup v2 the device controller is implemented with eBPF filters. Install bpftool to inspect and load these filters:
apt update && apt install bpftool -y
Check whether a device filter is attached to the pod’s cgroup:
bpftool cgroup tree /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod{UID}.slice/cri-containerd-{ContainerID}.scope
Example output shows an entry with AttachType "device":
14962 device multi cgroup_device_prog
Show details of the loaded BPF program:
bpftool prog show id 14962
14962: cgroup_device tag c394c0e22708d632 loaded_at 2025-01-04T11:29:58+0800 uid 0 xlated 3840B jited 2172B memlock 4096B

Namespace handling
After the cgroup grants GPU access, the device must be made visible inside the pod’s mount namespace.
Obtain the pod UID, container ID and PID:
# Get pod UID
kubectl get pod gpu-pod -o jsonpath='{.metadata.uid}'
# Get container ID
kubectl get pod gpu-pod -o jsonpath='{.status.containerStatuses[0].containerID}'
# Get PID via containerd
ctr -n k8s.io task ls | grep {ContainerID}
Enter the pod’s mount namespace:
nsenter --target 1145716 --mount sh
Create the character device node for the GPU:
mknod /dev/nvidia0 c 195 0
Adjust permissions so the container can use the device:
chmod 666 /dev/nvidia0

Demo – Cgroups v1
Pod manifest used for the demonstration (requires a functional Kubernetes cluster with NVIDIA components installed):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeSelector:
    gpu-mounter-enable: enable
  containers:
  - name: cuda-container
    image: docker.samzong.me/chrstnhntschl/gpu_burn
    resources:
      limits:
        nvidia.com/gpu: '1'

Steps:
Create a pod that does not request a GPU (set NVIDIA_VISIBLE_DEVICES: "none") to verify the environment.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeSelector:
    gpu-mounter-enable: enable
  containers:
  - name: cuda-container
    image: docker.samzong.me/chrstnhntschl/gpu_burn
    command: ["sleep"]
    args: ["100000000"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "none"

After the pod starts, retrieve the pod UID, container ID, and PID:
$ kubectl get pod gpu-pod -o jsonpath='{.metadata.uid}'
ace81d74-99ca-4b34-b60e-a60ec1442875
$ kubectl get pod gpu-pod -o jsonpath='{.status.containerStatuses[0].containerID}'
containerd://4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04
$ ctr -n k8s.io task ls | grep 4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04
4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04 1145716 RUNNING

Write the device allowance to the cgroup path (the path is composed of the pod UID and container ID):
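Assembling this path by hand is error-prone, so it helps to derive it in one step. A small sketch (the helper name pod_cgroup_path is ours; it assumes the systemd cgroup driver and a BestEffort-QoS pod as in this demo — other QoS classes use a different slice). Note the two transformations: dashes in the pod UID become underscores, and the containerd:// prefix is stripped from the container ID:

```shell
# Derive the container's cgroupfs directory from the pod UID and the
# container ID as reported by kubectl.
pod_cgroup_path() {
    uid=$(printf '%s' "$1" | tr '-' '_')     # dashes -> underscores
    cid=${2#containerd://}                   # strip runtime prefix
    printf '/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod%s.slice/cri-containerd-%s.scope\n' \
        "$uid" "$cid"
}

pod_cgroup_path ace81d74-99ca-4b34-b60e-a60ec1442875 \
    containerd://4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04
```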
# Example cgroup path
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podace81d74_99ca_4b34_b60e_a60ec1442875.slice/cri-containerd-4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04.scope
# Grant access to /dev/nvidia0
echo 'c 195:0 rw' > /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podace81d74_99ca_4b34_b60e_a60ec1442875.slice/cri-containerd-4b5ef4c625208800283bb58e9da3ccb69ce92bd840f3cf885931c513a901ab04.scope/devices.allow

Enter the mount namespace of the container (PID obtained earlier) and create the device node:
$ nsenter --target 1145716 --mount sh
# Inside the namespace
mknod /dev/nvidia0 c 195 0
chmod 666 /dev/nvidia0

CLI tool (nvcli)
The nvcli binary automates the above steps. It provides two sub‑commands:
mount – dynamically mount one or more GPU indices to a pod.
unmount – dynamically unmount GPU indices from a pod.
Common flags:
--kubeconfig (default /root/.kube/config) – path to the kubeconfig file.
--name – target pod name.
--namespace (default default) – pod namespace.
--v – klog log level (0–10, default 2).
Mount example (mount GPU index 0 to gpu-pod):
./nvcli mount --name=gpu-pod --mount=0
Unmount example (remove GPU index 0 from gpu-pod):
./nvcli unmount --name=gpu-pod --unmount=0
Both commands accept a comma-separated list of indices for batch operations.
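For batch operations the index list can be generated rather than typed out. A small sketch (gpu_list is our helper, not part of nvcli):

```shell
# Emit a comma-separated list of the first N GPU indices,
# suitable for nvcli's --mount / --unmount flags.
gpu_list() {
    seq -s, 0 "$(( $1 - 1 ))"
}

gpu_list 4   # → 0,1,2,3
# ./nvcli mount --name=gpu-pod --mount="$(gpu_list 4)"
```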
Infra Learning Club
Infra Learning Club shares study notes, cutting-edge technology, and career discussions.