
Understanding Nvidia MIG: Concepts, Configuration, and Kubernetes Deployment

This article explains Nvidia's Multi‑Instance GPU (MIG) technology, compares it with vGPU, walks through enabling and partitioning MIG on A100 cards using nvidia‑smi commands, and shows how to expose MIG resources in Kubernetes with single and mixed strategies.


Concept of MIG

MIG (Multi‑Instance GPU) partitions a physical GPU into up to seven isolated instances. Each instance has dedicated compute engines, memory, L2 cache and other resources, providing stronger isolation than software virtualization and enabling guaranteed QoS.

On an NVIDIA A100 40 GB card the administrator can create any supported combination of instance profiles, for example two 20 GB instances (3g.20gb), three 10 GB instances (2g.10gb), seven 5 GB instances (1g.5gb), or a mixed set.
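For reference, the standard GI profiles defined for the A100 40 GB are:

1g.5gb  – 1 compute slice, 5 GB memory  – up to 7 per GPU
2g.10gb – 2 compute slices, 10 GB memory – up to 3 per GPU
3g.20gb – 3 compute slices, 20 GB memory – up to 2 per GPU
4g.20gb – 4 compute slices, 20 GB memory – 1 per GPU
7g.40gb – 7 compute slices, 40 GB memory – 1 per GPU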

MIG vs vGPU

vGPU

vGPU is Nvidia's software virtualization solution: it creates multiple small virtual GPUs on a single physical GPU and time-shares the compute engines among users in virtual machines.

MIG

MIG works by physically slicing resources (system lanes, control bus, TPC, global memory, L2 cache, data bus, etc.) and recombining them into independent sub‑GPUs (GPU Instances, GI). The process consists of two steps:

Partition (slice): Divide compute engines and memory into uniform blocks; e.g., an A100 40 GB GPU can be split into 7 compute slices and 8 memory slices.

Combine: Pair compute slices with memory slices to form a GI such as 1g.5gb (1 compute slice + 1 memory slice, i.e. one eighth of 40 GB = 5 GB). Different profiles define the allowed combinations; 3g.20gb, for example, pairs 3 compute slices with 4 memory slices (20 GB).

The resulting GIs are independent of one another and support most A100 features.

[Figure: MIG partition diagram]

MIG on A100 – Testing and Commands

By default MIG is disabled. Enable it on GPU 0 with:

# nvidia-smi -i 0 -mig 1

Enabling MIG requires that no processes are using the GPU, and on some systems the change only takes effect after a GPU reset or reboot.

List the available GI profiles:

# nvidia-smi mig -i 0 -lgip

Create a 1g.5gb GI (profile ID 19):

# nvidia-smi mig -i 0 -cgi 19
Successfully created GPU instance ID 13 on GPU 0 using profile MIG 1g.5gb (ID 19)
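The new GI can be confirmed with:

# nvidia-smi mig -i 0 -lgi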

After creating a GI the remaining free slots change (e.g., the 7g.40gb profile drops to 0 free, since it would require the whole GPU). A GI can be partitioned further by creating Compute Instances (CIs) inside it:

# nvidia-smi mig -i 0 -lcip -gi 1      # list the CI profiles available inside GI 1
# nvidia-smi mig -i 0 -cci 2,2 -gi 1   # create two 2c CIs in GI 1
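The created CIs can then be listed with:

# nvidia-smi mig -i 0 -lci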

Verify created devices:

# nvidia-smi -L
[Figure: nvidia-smi -L output listing the MIG devices]

Using MIG in Kubernetes

Nvidia's Kubernetes device plugin (k8s-device-plugin) supports two MIG exposure strategies:

single: All GPUs on the node expose the same MIG profile and are advertised under the generic nvidia.com/gpu resource name; the node must have identical GPU models with MIG enabled.

mixed: GPUs on the node may carry different MIG configurations; each MIG type is advertised as its own resource, e.g. nvidia.com/mig-4g.20gb.

Single strategy installation:

# helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
# helm repo update
# helm install --generate-name --set migStrategy=single --set allowDefaultNamespace=true nvdp/nvidia-device-plugin

After installation the node advertises six identical MIG devices under the generic resource name:

Capacity:
  nvidia.com/gpu:          6
Allocatable:
  nvidia.com/gpu:          6

Example Deployment whose pod requests two MIG devices:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-deploy
  labels:
    app: gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu
  template:
    metadata:
      labels:
        app: gpu
    spec:
      containers:
      - name: gpu
        image: chrstnhntschl/gpu_burn
        args:
        - "6000"
        resources:
          limits:
            nvidia.com/gpu: 2
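A minimal way to exercise the example (assuming the manifest above is saved as gpu-deploy.yaml; the pod name will differ on your cluster):

# kubectl apply -f gpu-deploy.yaml
# kubectl get pods -l app=gpu
# kubectl exec <pod-name> -- nvidia-smi -L   # should list two MIG devices, if the image ships nvidia-smi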

Mixed strategy installation (pods then request specific MIG types, such as nvidia.com/mig-4g.20gb, in their resource limits):

# helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
# helm repo update
# helm install --generate-name --set migStrategy=mixed --set allowDefaultNamespace=true nvdp/nvidia-device-plugin
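Once the plugin is running, the per-profile resources should show up on the node (the node name here is a placeholder):

# kubectl describe node <node-name> | grep nvidia.com/mig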

Node capacity after mixed installation (example):

Capacity:
  nvidia.com/mig-3g.20gb: 1
  nvidia.com/mig-4g.20gb: 2
Allocatable:
  nvidia.com/mig-3g.20gb: 1
  nvidia.com/mig-4g.20gb: 2

Example Deployment whose pod consumes a 4g.20gb MIG instance:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-deploy
  labels:
    app: gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu
  template:
    metadata:
      labels:
        app: gpu
    spec:
      containers:
      - name: gpu
        image: chrstnhntschl/gpu_burn
        args:
        - "6000"
        resources:
          limits:
            nvidia.com/mig-4g.20gb: 1

MIG Command Reference

GI – List instances: nvidia-smi mig -lgi
GI – Delete instance: nvidia-smi mig -dgi -gi <instance ID>
GI – Show profiles: nvidia-smi mig -lgip
GI – Create instance: nvidia-smi mig -cgi <profile ID>
CI – List profiles (optionally within a GI): nvidia-smi mig -lcip [-gi <GI ID>]
CI – List created instances: nvidia-smi mig -lci
CI – Create instance: nvidia-smi mig -cci <profile ID> -gi <GI ID>
CI – Delete instance: nvidia-smi mig -dci -ci <CI ID> -gi <GI ID>
GI+CI – Create both together:

nvidia-smi mig -i 0 -cgi <GI profile ID> -C   # -C creates the default CI inside each new GI
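To tear a configuration down, destroy all CIs, then all GIs, then disable MIG mode (run on an idle GPU):

# nvidia-smi mig -dci          # destroy all compute instances
# nvidia-smi mig -dgi          # destroy all GPU instances
# nvidia-smi -i 0 -mig 0       # disable MIG mode on GPU 0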