How to Manage GPU Resources in Kubernetes: From Containers to Device Plugins

This article explains why managing GPUs with Kubernetes improves cost efficiency and deployment speed. It covers how to containerize GPU workloads, build suitable images, and configure NVIDIA drivers, then shows how Kubernetes Extended Resources and Device Plugins schedule and monitor GPU resources. It closes with current limitations and community solutions.

Alibaba Cloud Native

Jan 13, 2020

GPU Containerization

To run a GPU workload in a container you need to:

Build a container image that contains the required CUDA libraries and the machine‑learning framework (e.g., TensorFlow, PyTorch). Use an official NVIDIA CUDA base image and add only the additional packages you need.

Run the image with Docker (or NVIDIA‑Docker) so that the host’s GPU device files under /dev and the NVIDIA driver libraries are bind‑mounted into the container.

The host must have the NVIDIA driver installed. The driver stays on the host, while the CUDA toolkit and application binaries are packaged inside the container. At runtime the driver’s shared libraries are bind‑mounted into the container, so images built against different CUDA versions can coexist on the same node, provided the host driver is recent enough for each of them.

Running a GPU Container with Docker

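# --gpus all injects the GPU device nodes and the host driver libraries,
# so no manual -v mounts are needed. Mounting the host library directory
# over the image's own would hide the libraries packaged in the image.
docker run --gpus all my-gpu-image:latest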

The --gpus all flag (available since Docker 19.03 when the NVIDIA container toolkit is installed) or the older NVIDIA‑Docker runtime makes the GPU devices and driver files visible inside the container.
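
To check from inside the container that the devices and driver files were actually injected, a small script can look for the per-GPU device nodes and for nvidia-smi. This is only a sketch: the function names are made up for illustration, and it assumes the usual Linux naming of GPU device nodes (/dev/nvidia0, /dev/nvidia1, ...).

```python
import glob
import os
import shutil


def visible_gpu_devices(dev_dir="/dev"):
    """Return the per-GPU NVIDIA device nodes found under dev_dir, sorted."""
    # Per-GPU nodes are named nvidia0, nvidia1, ...; control nodes such as
    # nvidiactl and nvidia-uvm are excluded by requiring a digit after "nvidia".
    return sorted(glob.glob(os.path.join(dev_dir, "nvidia[0-9]*")))


def driver_tools_present():
    """True if nvidia-smi is on PATH, a hint that driver files were injected."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print("GPU device nodes:", visible_gpu_devices())
    print("nvidia-smi found:", driver_tools_present())
```

An empty device list inside a container that was started with GPU access usually points to a missing runtime or toolkit on the host rather than to a problem in the image.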

Kubernetes GPU Management

Kubernetes schedules GPUs through two complementary mechanisms:

Extended Resources: Users can define a custom integer‑valued resource such as nvidia.com/gpu. The scheduler treats the resource as a count and can allocate it to pods.

Device Plugin Framework: A third‑party plugin runs on each node, reports the health and quantity of GPUs to the kubelet, and handles allocation requests.

Reporting Extended Resources Manually

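If no device plugin is used, the resource can be advertised by patching the node’s status subresource directly. The request must be authenticated against the API server; KUBE_APISERVER and NODE_NAME are placeholders: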

curl -X PATCH \
  -H "Content-Type: application/strategic-merge-patch+json" \
  --data '{"status":{"capacity":{"example.com/gpu":"1"}}}' \
  https://KUBE_APISERVER/api/v1/nodes/NODE_NAME/status

When a Device Plugin is installed, this step is performed automatically: the plugin reports its devices to the kubelet, and the kubelet updates the node status.
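
The Kubernetes documentation's own example for advertising an extended resource sends a JSON Patch (Content-Type: application/json-patch+json) rather than a merge-style body. The sketch below builds such a patch body; the helper names are hypothetical. The one subtle point is that the slash in the resource name has to be escaped as ~1 inside the JSON Pointer path.

```python
import json


def escape_json_pointer(token):
    """Escape one JSON Pointer path segment (RFC 6901): ~ first, then /."""
    return token.replace("~", "~0").replace("/", "~1")


def capacity_patch(resource_name, count):
    """Build a JSON Patch body that adds resource_name to status.capacity."""
    path = "/status/capacity/" + escape_json_pointer(resource_name)
    # Resource quantities are sent as strings, as in the curl example above.
    return json.dumps([{"op": "add", "path": path, "value": str(count)}])


print(capacity_patch("example.com/gpu", 1))
```

The printed string can be passed to curl as the --data argument, with the Content-Type header changed to application/json-patch+json.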

Device Plugin Lifecycle

Registration: The plugin registers its name, socket path, and API version with the kubelet.

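Service Start: It starts a gRPC server on its Unix socket to serve kubelet requests. In practice the server has to be listening by the time the kubelet connects back after registration.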

ListAndWatch: The kubelet opens a long‑running stream to receive device IDs and health status.

Allocate: When a pod requests a GPU, the kubelet calls Allocate; the plugin returns the device paths, driver directories, and any required environment variables.
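
The lifecycle can be pictured with a toy, in-memory model. This is not the real Device Plugin API, which is defined in gRPC and usually implemented in Go: the class and method names, the socket name, and the driver directory below are all assumptions made for illustration. Only the resource name, the v1beta1 API version, and the NVIDIA_VISIBLE_DEVICES variable correspond to real identifiers. The gRPC server start is omitted because nothing here goes over the network.

```python
from dataclasses import dataclass


@dataclass
class Device:
    id: str
    healthy: bool = True


@dataclass
class AllocateResponse:
    device_paths: list
    mounts: list
    envs: dict


class ToyGpuPlugin:
    """In-memory stand-in for a device plugin: no gRPC, no real devices."""

    resource_name = "nvidia.com/gpu"

    def __init__(self, device_ids):
        self.devices = {d: Device(d) for d in device_ids}

    def registration(self):
        # Registration: what the plugin tells the kubelet about itself.
        return {
            "resource_name": self.resource_name,
            "endpoint": "toy-gpu.sock",  # hypothetical socket name
            "version": "v1beta1",
        }

    def list_and_watch(self):
        # ListAndWatch: a real plugin streams updates; this returns one snapshot.
        return list(self.devices.values())

    def mark_unhealthy(self, device_id):
        self.devices[device_id].healthy = False

    def allocate(self, device_ids):
        # Allocate: translate device IDs into what the container needs.
        for d in device_ids:
            if not self.devices[d].healthy:
                raise ValueError(f"device {d} is unhealthy")
        return AllocateResponse(
            device_paths=[f"/dev/nvidia{d}" for d in device_ids],
            mounts=["/usr/local/nvidia"],  # hypothetical driver directory
            envs={"NVIDIA_VISIBLE_DEVICES": ",".join(device_ids)},
        )
```

For example, ToyGpuPlugin(["0", "1"]).allocate(["1"]) returns a response whose device_paths is ["/dev/nvidia1"], which is the kind of information the kubelet passes on to the container runtime.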

Pod Scheduling with GPUs

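A pod requests a GPU by setting a limit for the resource in its container spec. Extended resources cannot be overcommitted, so if a request is also given it must equal the limit: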

resources:
  limits:
    nvidia.com/gpu: 1

The scheduler selects a node whose allocatable GPU count, minus the GPUs already requested by pods bound to that node, covers the request, and binds the pod there. The node’s reported capacity itself does not change. During container creation the kubelet picks the device IDs to assign, calls Allocate on the matching Device Plugin, and mounts the device files and driver directories that the plugin returns into the container.
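
A minimal sketch of the count-based fit check, assuming the scheduler compares each node's allocatable GPU count with the sum of the limits of pods already bound to it. The function names and the first-fit-by-name selection are simplifications for illustration; the real scheduler also filters and scores nodes on many other criteria.

```python
def free_gpus(allocatable, bound_limits):
    """GPUs still unclaimed on a node: allocatable minus what bound pods hold."""
    return allocatable - sum(bound_limits)


def pick_node(nodes, request):
    """Return the first node (by name) that can fit `request` GPUs, else None.

    nodes maps node name -> (allocatable GPUs, [GPU limits of bound pods]).
    """
    for name in sorted(nodes):
        allocatable, bound_limits = nodes[name]
        if free_gpus(allocatable, bound_limits) >= request:
            return name
    return None


# Hypothetical cluster: node-a is full, node-b has 3 of 4 GPUs free.
cluster = {"node-a": (2, [1, 1]), "node-b": (4, [1])}
print(pick_node(cluster, 1))
print(pick_node(cluster, 4))
```

The second call finds no node, even though the cluster has three free GPUs in total, because a pod's GPUs must all come from one node.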

Deploying GPU Support on a Kubernetes Node (CentOS example)

Install the NVIDIA driver (requires gcc and kernel headers).

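Install the NVIDIA Docker runtime (package nvidia-docker2), set nvidia as the default runtime in /etc/docker/daemon.json so that containers started by the kubelet use it, and restart Docker. Verify with docker info (look for nvidia under Runtimes and as the Default Runtime).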

Deploy the NVIDIA Device Plugin as a DaemonSet:

git clone https://github.com/NVIDIA/k8s-device-plugin.git
kubectl apply -f k8s-device-plugin/nvidia-device-plugin.yml

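The DaemonSet runs the plugin on every GPU node; each instance registers the resource nvidia.com/gpu with its kubelet and starts its gRPC server. The manifest's location inside the repository differs between releases, so adjust the kubectl apply path to match the version you cloned.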

Verification

After the DaemonSet is ready, inspect the node:

kubectl get node NODE_NAME -o jsonpath='{.status.capacity.nvidia\.com/gpu}'

The output should be the number of GPUs (e.g., 2).
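
The same check can be scripted against the full node object. The sketch below assumes the JSON comes from kubectl get node NODE_NAME -o json and reads both capacity and allocatable; the function name is hypothetical, and the sample document is a trimmed, made-up node object.

```python
import json


def gpu_counts(node_json, resource="nvidia.com/gpu"):
    """Return (capacity, allocatable) for `resource` from a node's JSON."""
    status = json.loads(node_json).get("status", {})
    # A node without the device plugin simply lacks the key; treat that as 0.
    capacity = int(status.get("capacity", {}).get(resource, "0"))
    allocatable = int(status.get("allocatable", {}).get(resource, "0"))
    return capacity, allocatable


# Made-up, trimmed node object for illustration.
sample = json.dumps({
    "status": {
        "capacity": {"cpu": "8", "nvidia.com/gpu": "2"},
        "allocatable": {"cpu": "8", "nvidia.com/gpu": "2"},
    }
})
print(gpu_counts(sample))
```

A result of (0, 0) on a node that physically has GPUs suggests the device plugin pod is not running there or failed to register.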

Sample Pod Manifest

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never  # one-shot check; the default (Always) would rerun the command in a loop
  containers:
  - name: tf
    image: nvcr.io/nvidia/tensorflow:22.09-tf2-py3
    resources:
      limits:
        nvidia.com/gpu: 1
    command: ["/bin/bash", "-c", "nvidia-smi && python -c 'import tensorflow as tf; print(tf.__version__)'"]

Deploy with kubectl apply -f gpu-pod.yaml and read the output with kubectl logs gpu-pod. nvidia-smi should list only the allocated GPU (e.g., a T4), confirming that the device is visible inside the container and isolated from the node’s other GPUs.
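
To turn the isolation check into something scriptable, the one-line-per-GPU listing printed by nvidia-smi -L can be parsed and compared with the requested limit. This sketch assumes the usual "GPU N: NAME (UUID: ...)" line format; the function name is hypothetical and the UUID in the sample is invented.

```python
import re

# Matches lines such as: GPU 0: Tesla T4 (UUID: GPU-...)
GPU_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: (\S+)\)$", re.MULTILINE)


def parse_gpu_listing(text):
    """Parse `nvidia-smi -L` style output into a list of GPU descriptions."""
    return [
        {"index": int(index), "name": name, "uuid": uuid}
        for index, name, uuid in GPU_LINE.findall(text)
    ]


# Invented sample output for a pod that was allocated a single GPU.
sample_listing = "GPU 0: Tesla T4 (UUID: GPU-11111111-2222-3333-4444-555555555555)\n"
gpus = parse_gpu_listing(sample_listing)
print(len(gpus), gpus[0]["name"])
```

If the number of parsed entries is larger than the nvidia.com/gpu limit in the pod spec, the container can see GPUs it was not allocated.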

Limitations of the Built‑in Device Plugin Model

The scheduler only tracks the number of GPUs, not their specific capabilities (e.g., NVLink connectivity, memory size). Complex placement requirements such as “two GPUs linked by NVLink” cannot be expressed. The Device Plugin API also lacks extensibility for custom parameters in Allocate or ListAndWatch, making heterogeneous or affinity‑aware scheduling difficult.
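
The gap can be illustrated with a small sketch. A count-only view says a node with two free GPUs can satisfy a two-GPU request, but whether those two GPUs share an NVLink connection depends on topology that the scheduler never sees. The function name and the link layout below are hypothetical.

```python
from itertools import combinations


def pick_linked_pair(free_gpu_ids, nvlink_pairs):
    """Return a pair of free GPUs joined by NVLink, or None if there is none."""
    links = {frozenset(pair) for pair in nvlink_pairs}
    for a, b in combinations(sorted(free_gpu_ids), 2):
        if frozenset((a, b)) in links:
            return (a, b)
    return None


# Hypothetical 4-GPU node where GPUs 0-1 and 2-3 are linked.
links = [(0, 1), (2, 3)]
print(pick_linked_pair([0, 1, 2, 3], links))
print(pick_linked_pair([1, 2], links))
```

In the second call two GPUs are free, so the count-based check passes, yet no linked pair exists. This is the kind of requirement the built-in model cannot express.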

Community Extensions for Heterogeneous Scheduling

NVIDIA’s custom GPU‑aware scheduler (a fork of the upstream scheduler).

Alibaba Cloud’s GPU‑sharing scheduler for multi‑tenant environments.

Vendor‑specific plugins for RDMA, FPGA, and AMD GPUs.
