Understanding Nvidia MIG: Concepts, Configuration, and Kubernetes Deployment
This article explains Nvidia's Multi‑Instance GPU (MIG) technology, compares it with vGPU, walks through enabling and partitioning MIG on A100 cards using nvidia‑smi commands, and shows how to expose MIG resources in Kubernetes with single and mixed strategies.
Concept of MIG
MIG (Multi‑Instance GPU) partitions a physical GPU into up to seven isolated instances. Each instance has dedicated compute engines, memory, L2 cache and other resources, providing stronger isolation than software virtualization and enabling guaranteed QoS.
On an NVIDIA A100 40 GB card the administrator can create any supported combination of instances, for example two 20 GB instances (3g.20gb), three 10 GB instances (2g.10gb), seven 5 GB instances (1g.5gb), or a mixed set; only the profile combinations defined for the hardware are allowed.
MIG vs vGPU
vGPU
vGPU is Nvidia's software solution that creates many small virtual GPUs on a single physical GPU and shares them among users in virtual machines.
MIG
MIG works by physically slicing resources (system lanes, control bus, TPC, global memory, L2 cache, data bus, etc.) and recombining them into independent sub‑GPUs (GPU Instances, GI). The process consists of two steps:
Partition (slice): divide compute engines and memory into uniform blocks; e.g., an A100 40 GB GPU is split into 7 compute slices and 8 memory slices.
Combine: pair compute slices with memory slices to form a GI such as 1g.5gb. Predefined profiles determine the allowed combinations.
The resulting GI instances are independent and support most A100 features.
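To make the slice-and-combine arithmetic concrete, here is a small shell sketch; the 7-compute/8-memory slice counts come from the A100 40 GB description above, and the script itself is purely illustrative:

```shell
# Slice arithmetic for an A100 40 GB: 7 compute slices, 8 memory slices.
# Profile names encode "<compute slices>g.<memory>gb".
TOTAL_MEM_GB=40
MEM_SLICES=8
MEM_PER_SLICE=$((TOTAL_MEM_GB / MEM_SLICES))   # 5 GB per memory slice

# The smallest GI pairs one compute slice with one memory slice:
echo "1g.${MEM_PER_SLICE}gb"

# Seven such GIs exhaust the compute slices while using 7 of 8 memory slices:
echo "$((7 * MEM_PER_SLICE)) GB allocated across 7 instances"
```

This is why the smallest profile is named 1g.5gb: one compute slice paired with one 5 GB memory slice.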
MIG on A100 – Testing and Commands
By default MIG is disabled. Enable it with:
# nvidia-smi -i 0 -mig 1
List available GI profiles:
# nvidia-smi mig -i 0 -lgip
Create a 1g.5gb GI instance (profile ID 19):
# nvidia-smi mig -i 0 -cgi 19
Successfully created GPU instance ID 13 on GPU 0 using profile MIG 1g.5gb (ID 19)
After creating a GI, the remaining free slots change (e.g., the 7g.40gb profile drops to 0 free). Further partitioning can be done by creating Compute Instances (CI) inside a GI:
# nvidia-smi mig -i 0 -lcip -gi 1
# nvidia-smi mig -i 0 -cci 2,2 -gi 1   # creates two CIs of 2c each
Verify the created devices:
# nvidia-smi -L
Using MIG in Kubernetes
Nvidia's device plugin supports two exposure strategies:
single: every GPU on the node exposes the same MIG type; this requires identical GPU models with identical MIG configuration across the node.
mixed: a node can expose a mixture of MIG types; each GPU may carry a different MIG configuration.
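The chosen strategy determines the extended-resource name a pod must request. A small sketch of the mapping, using 4g.20gb as an example profile:

```shell
# Map the device-plugin MIG strategy to the resource name pods request:
#   single -> nvidia.com/gpu
#   mixed  -> nvidia.com/mig-<profile>
strategy=mixed
profile=4g.20gb

if [ "$strategy" = "single" ]; then
  resource="nvidia.com/gpu"
else
  resource="nvidia.com/mig-${profile}"
fi

echo "$resource"
```

Under the single strategy pods keep using the familiar nvidia.com/gpu name; under mixed they must name the exact profile, as the pod examples below show.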
Single strategy installation:
# helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
# helm repo update
# helm install --generate-name --set migStrategy=single --set allowDefaultNamespace=true nvdp/nvidia-device-plugin
After installation the node reports six identical MIG devices:
Capacity:
nvidia.com/gpu: 6
Allocatable:
nvidia.com/gpu: 6
Example pod requesting two MIG GPUs:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-deploy
  labels:
    app: gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu
  template:
    metadata:
      labels:
        app: gpu
    spec:
      containers:
      - name: gpu
        image: chrstnhntschl/gpu_burn
        args:
        - "6000"
        resources:
          limits:
            nvidia.com/gpu: 2
Mixed strategy installation (request specific MIG types such as nvidia.com/mig-4g.20gb in the pod spec):
# helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
# helm repo update
# helm install --generate-name --set migStrategy=mixed --set allowDefaultNamespace=true nvdp/nvidia-device-plugin
Node capacity after the mixed installation (example):
Capacity:
nvidia.com/mig-3g.20gb: 1
nvidia.com/mig-4g.20gb: 2
Allocatable:
nvidia.com/mig-3g.20gb: 1
nvidia.com/mig-4g.20gb: 2
Example pod consuming a 4g.20gb MIG instance:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-deploy
  labels:
    app: gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu
  template:
    metadata:
      labels:
        app: gpu
    spec:
      containers:
      - name: gpu
        image: chrstnhntschl/gpu_burn
        args:
        - "6000"
        resources:
          limits:
            nvidia.com/mig-4g.20gb: 1
MIG Command Reference
GI – List instances: nvidia-smi mig -lgi
GI – Delete instance: nvidia-smi mig -dgi -gi <instance ID>
GI – Show profiles: nvidia-smi mig -lgip
GI – Create instance: nvidia-smi mig -cgi <profile ID>
CI – List profiles (optionally per GI): nvidia-smi mig -lcip [-gi <GI ID>]
CI – List created instances: nvidia-smi mig -lci
CI – Create instance: nvidia-smi mig -cci <profile ID> -gi <GI ID>
CI – Delete instance: nvidia-smi mig -dci -ci <CI ID>
GI+CI – Create both together: nvidia-smi mig -i 0 -cgi <gi profile ID> -C <ci profile ID>
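When reconfiguring a card, instances must be torn down in reverse order of creation: CIs first, then GIs, then MIG mode itself. A dry-run sketch of the sequence (the echo wrapper only prints each command so it can be shown without a GPU; drop it on a real node):

```shell
# Teardown order: compute instances -> GPU instances -> MIG mode.
run() { echo "+ $*"; }            # dry-run wrapper; replace body with "$@" to execute

run nvidia-smi mig -dci           # 1. delete all compute instances
run nvidia-smi mig -dgi           # 2. delete all GPU instances
run nvidia-smi -i 0 -mig 0        # 3. disable MIG mode on GPU 0
```

Without -ci/-gi arguments, the delete commands above apply to all instances on the GPU.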
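Once instances exist, an individual MIG device can be targeted by the UUID shown in `nvidia-smi -L`. A sketch using a hypothetical device listing (the UUIDs are made up, and the exact UUID format varies by driver version):

```shell
# Hypothetical `nvidia-smi -L` output on a MIG-enabled node (sample values only):
sample='GPU 0: A100-PCIE-40GB (UUID: GPU-5c89852c)
  MIG 1g.5gb Device 0: (UUID: MIG-GPU-5c89852c/13/0)'

# Extract the MIG device UUID from the listing:
mig_uuid=$(echo "$sample" | grep -o 'MIG-GPU-[^)]*')
echo "$mig_uuid"

# On a real node, pin a process to that instance:
# CUDA_VISIBLE_DEVICES="$mig_uuid" python train.py
```

This is how a single GI/CI is addressed outside Kubernetes; inside a cluster the device plugin performs the equivalent assignment automatically.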
Infra Learning Club
Infra Learning Club shares study notes, cutting-edge technology, and career discussions.
