Enable NVIDIA GPU Access in Docker and Kubernetes with the NVIDIA Container Toolkit
This guide walks through checking the system and software environment, installing the NVIDIA Container Toolkit and configuring Docker to use it, verifying GPU access in Docker containers, deploying the NVIDIA device plugin on a Kubernetes cluster, creating GPU‑enabled pods, and troubleshooting common issues. Each step comes with concrete commands and configuration examples.
First, verify the host OS and Kubernetes versions. The first command shows the Ubuntu release. The second shows the kubectl client and API server versions and prints a warning if they skew beyond the supported range; note any such warning:
# lsb_release -a
# kubectl version
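The skew check can also be scripted. This is a minimal bash sketch, assuming versions in the vX.Y.Z form that kubectl prints. The function names are hypothetical helpers, not part of kubectl. The script applies the Kubernetes policy that kubectl is supported within one minor version, older or newer, of the API server:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare kubectl and API server minor versions.
# Kubernetes supports kubectl within one minor version (older or newer)
# of kube-apiserver.

# minor_of v1.28.2 -> 28
minor_of() {
  local v="${1#v}"   # strip leading "v"      -> 1.28.2
  v="${v#*.}"        # strip major component  -> 28.2
  echo "${v%%.*}"    # keep minor component   -> 28
}

# skew_ok CLIENT SERVER: succeeds when the minor versions differ by at most 1.
skew_ok() {
  local c s d
  c=$(minor_of "$1")
  s=$(minor_of "$2")
  d=$(( c - s ))
  if (( d < 0 )); then d=$(( -d )); fi
  (( d <= 1 ))
}
```

Usage, with the two versions read from the kubectl version output: # skew_ok v1.28.2 v1.26.5 || echo "kubectl/server skew exceeds one minor version"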
Install the NVIDIA Container Toolkit
On a node with GPU resources, add the NVIDIA repository and import its GPG key:
# curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Optionally, enable experimental packages by uncommenting the experimental repository line:
# sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
Update the package index and install the toolkit:
# sudo apt-get update
# sudo apt-get install -y nvidia-container-toolkit
Configure Docker to use the NVIDIA runtime
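Before running the configuration command, you can confirm that the package put the toolkit's command-line tools on PATH. This is a minimal bash sketch. check_toolkit is a hypothetical helper, and it assumes the nvidia-ctk and nvidia-container-runtime binaries that the nvidia-container-toolkit package installs:

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm the NVIDIA Container Toolkit CLIs are on PATH.
# Prints one line per binary and returns non-zero if any is missing.
check_toolkit() {
  local bin missing=0
  for bin in nvidia-ctk nvidia-container-runtime; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "found: $bin"
    else
      echo "missing: $bin" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: # check_toolkit && nvidia-ctk --version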
Run the configuration command, which updates /etc/docker/daemon.json to add an nvidia runtime entry:
# sudo nvidia-ctk runtime configure --runtime=docker
The resulting daemon.json contains a section such as "runtimes": {"nvidia": {"path": "nvidia-container-runtime", "args": []}}. Restart Docker to apply the changes:
# sudo systemctl daemon-reload
# sudo systemctl restart docker
Validate Docker GPU access
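Before starting a test container, you can check the configuration file itself. This is a minimal bash sketch using grep and sed. It assumes daemon.json is formatted the way nvidia-ctk writes it, with each "key": value pair on one line. A JSON-aware tool such as jq is more robust if it is installed. The function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for inspecting /etc/docker/daemon.json with grep/sed.
# They assume each "key": value pair sits on one line, as nvidia-ctk writes them.

# has_nvidia_runtime FILE: succeeds if a "nvidia" runtime entry is present.
has_nvidia_runtime() {
  grep -q '"nvidia"[[:space:]]*:' "$1"
}

# default_runtime FILE: prints the configured default-runtime, or nothing.
default_runtime() {
  sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}
```

Usage: # has_nvidia_runtime /etc/docker/daemon.json && default_runtime /etc/docker/daemon.json
An empty default_runtime result means no default runtime is set, which matters for Kubernetes workloads (see Common pitfalls).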
Run a test container that executes nvidia-smi:
# docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
The output lists the GPU model, driver version, and memory usage, confirming that the container can see the NVIDIA hardware.
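For a scripted check instead of reading the table by eye, nvidia-smi can emit CSV through its --query-gpu and --format=csv,noheader options. This is a minimal bash sketch. summarize_gpus is a hypothetical helper that reads that CSV on stdin, prints one line per GPU, and fails if no GPU is listed:

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize the output of
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
# read from stdin. Returns non-zero if no GPU lines are present.
summarize_gpus() {
  local name driver mem count=0
  while IFS=',' read -r name driver mem || [[ -n "$name" ]]; do
    if [[ -z "$name" ]]; then continue; fi   # skip blank lines
    count=$(( count + 1 ))
    # Strip the single space nvidia-smi puts after each comma.
    printf 'GPU %d: %s (driver %s, %s)\n' "$count" "$name" "${driver# }" "${mem# }"
  done
  (( count > 0 ))
}
```

Usage: # docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader | summarize_gpus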
Deploy the NVIDIA device plugin in Kubernetes
Create the plugin DaemonSet using the official manifest:
# kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.16.1/deployments/static/nvidia-device-plugin.yml
The YAML defines a DaemonSet in the kube-system namespace with appropriate tolerations and a system-node-critical priority class. After deployment, check the pod logs to ensure the plugin started without errors. If the node lacks GPU resources or the Docker runtime is mis‑configured, the logs will contain messages such as “Incompatible strategy detected auto” and hints to verify the NVIDIA Container Toolkit installation.
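The log check can be scripted too. This is a minimal bash sketch, assuming the plugin pods carry the name=nvidia-device-plugin-ds label used in the static manifest. classify_plugin_log is a hypothetical helper that looks only for the two messages quoted in this guide. Any other log is reported as "no-known-error", which is not proof of success:

```bash
#!/usr/bin/env bash
# Hypothetical helper: scan device-plugin log text (stdin) for the failure
# messages this guide mentions. It recognizes only those strings.
classify_plugin_log() {
  local log
  log=$(cat)
  if grep -q 'Incompatible strategy detected' <<<"$log"; then
    echo "toolkit-misconfigured"   # runtime/toolkit not set up on this node
  elif grep -q 'No devices found' <<<"$log"; then
    echo "no-gpu"                  # node has no visible GPU
  else
    echo "no-known-error"
  fi
}
```

Usage: # kubectl -n kube-system logs -l name=nvidia-device-plugin-ds --tail=50 | classify_plugin_log
To confirm that nodes actually advertise GPUs to the scheduler, list the allocatable resource: # kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'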
Create a GPU‑enabled pod
Define a pod manifest that requests one GPU:
apiVersion: v1
kind: Pod
metadata:
  name: ffmpeg-pod
spec:
  nodeName: aiserver003087  # optional: pin the pod to a specific GPU node
  containers:
  - name: ffmpeg-container
    image: nightseas/ffmpeg:latest
    command: ["/bin/bash", "-c", "tail -f /dev/null"]
    resources:
      limits:
        nvidia.com/gpu: 1
Save the manifest as gpu_test.yaml and apply it with # kubectl apply -f gpu_test.yaml. Then copy a test video into the pod (for example with kubectl cp). Inside the pod, run the following ffmpeg command, which decodes, scales, and encodes on the GPU using CUDA acceleration:
# ffmpeg -hwaccel cuvid -c:v h264_cuvid -i test.mp4 -vf scale_npp=1280:720 -vcodec h264_nvenc out.mp4
Successful conversion and the presence of out.mp4 confirm that the pod can use the GPU.
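The copy-and-transcode step can be wrapped in a small script. This is a minimal bash sketch. run_gpu_transcode and the /tmp paths inside the pod are hypothetical choices. kubectl cp needs tar in the container image. Setting DRY_RUN=1 prints the commands instead of running them. The ffmpeg arguments are the ones from the command above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a local video into the pod, transcode it on the GPU
# with the ffmpeg command from this guide, then check that out.mp4 is non-empty.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_gpu_transcode() {
  local pod="$1" input="$2"
  # Each step runs only if the previous one succeeded.
  run kubectl cp "$input" "$pod:/tmp/test.mp4" &&
    run kubectl exec "$pod" -- ffmpeg -y -hwaccel cuvid -c:v h264_cuvid -i /tmp/test.mp4 \
      -vf scale_npp=1280:720 -vcodec h264_nvenc /tmp/out.mp4 &&
    run kubectl exec "$pod" -- test -s /tmp/out.mp4   # fails if output is missing/empty
}
```

Usage: # run_gpu_transcode ffmpeg-pod test.mp4 && echo "GPU transcode OK"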
Label nodes and adjust DaemonSet for selective deployment
Label the GPU nodes so that the DaemonSet runs only on them:
# kubectl label nodes aiserver003087 gpu=true
Then add a nodeSelector matching gpu: "true" to the DaemonSet (or pod) manifest. The value must be quoted. nodeSelector values are strings, and YAML parses an unquoted true as a boolean, so kubectl apply rejects the manifest.
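For a DaemonSet that is already deployed, you can add the selector with a JSON merge patch instead of editing the manifest. This is a minimal bash sketch. nodeselector_patch is a hypothetical helper, and nvidia-device-plugin-daemonset is the DaemonSet name assumed from the static manifest. Because the value is written as a JSON string, the YAML quoting issue does not arise:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a JSON merge patch that adds nodeSelector KEY=VALUE
# to a workload's pod template. VALUE is always emitted as a JSON string, because
# nodeSelector values must be strings ("true", not true). Label keys and values
# are limited to alphanumerics, '-', '_', '.' (and '/' in keys), so no escaping is done.
nodeselector_patch() {
  local key="$1" value="$2"
  printf '{"spec":{"template":{"spec":{"nodeSelector":{"%s":"%s"}}}}}\n' "$key" "$value"
}
```

Usage: # kubectl -n kube-system patch daemonset nvidia-device-plugin-daemonset --type merge -p "$(nodeselector_patch gpu true)"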
Common pitfalls
If a node has no GPU, the plugin will report “No devices found”.
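If nvidia is not Docker's default runtime, containers started by Kubernetes cannot access the GPU, and the plugin logs suggest checking the NVIDIA Container Toolkit configuration.
Ensure daemon.json contains both "default-runtime": "nvidia" and the "runtimes" section. The configure command used earlier adds only the runtimes entry. Set the default by editing the file or by rerunning the command with --set-as-default, then restart Docker.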
Following these steps provides a reproducible workflow for enabling GPU acceleration in Docker containers and Kubernetes workloads.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, and more.
