NVIDIA vGPU vs AMD MxGPU: Architecture, Scheduling, and Virtualization Trade‑offs
This article explains GPU virtualization, comparing NVIDIA's software‑based vGPU and AMD's hardware‑based MxGPU, detailing their architecture, required hardware, licensing, performance indicators, resource scheduling strategies, slicing limits, and the advantages and drawbacks of each approach for virtualized workloads.
GPU Virtualization Concepts
GPU virtualization enables a single physical GPU to be divided into multiple virtual GPUs (vGPUs) that can be assigned to separate virtual machines (VMs). This allows shared graphics and compute resources while preserving isolation.
NVIDIA vGPU – Software Partitioning
Solution Components
Hardware : Requires a GPU model that supports NVIDIA’s virtualization extensions.
Software : A virtualization layer runs on the hypervisor host; each VM installs a special vGPU driver that communicates with the host layer to obtain a virtual slice of the GPU.
Licensing : vGPU usage is governed by license keys. Licenses can restrict the number of vGPUs, the amount of VRAM, or enable full‑feature profiles.
Performance Indicators
GPU Architecture : Newer architectures (e.g., Ampere, Hopper) provide higher transistor counts, larger caches, and newer process nodes, which improve overall throughput.
CUDA Cores : The total number of CUDA cores determines raw compute capacity; within the same architecture, more cores generally yield higher performance.
VRAM : Each vGPU is allocated a fixed portion of the physical GPU memory, which is exclusive to the VM.
vGPU Resource Scheduling
Key Resources : CUDA cores, VRAM, Base Address Registers (BAR), and memory channels.
Exclusive VM Resources : VRAM, virtual BAR, and channel assignments are dedicated per VM.
Shared Resources : Compute cores are time‑shared; the hypervisor schedules execution slices across VMs.
Scheduling Policies :
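Best‑effort (preemptive) – GPU time is handed out on demand, so VMs with heavier workloads can consume cycles that idle VMs leave unused.
Equal‑share – Every powered‑on VM receives an equal share of GPU time regardless of load; the share changes as VMs are powered on or off.
Fixed‑share – A fixed share of GPU time is reserved for each VM irrespective of activity, so it does not change when other VMs start or stop.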
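The three policies can be compared with a small model. This is a simplified sketch, not NVIDIA's scheduler: the function name `gpu_shares`, the policy strings, and the assumption that the fixed share is derived from the maximum number of vGPUs the profile allows per card are all illustrative.

```python
from typing import Dict


def gpu_shares(policy: str, active: Dict[str, bool], max_vgpus: int) -> Dict[str, float]:
    """Return the fraction of GPU time each powered-on VM may use.

    active    maps VM name -> True if the VM currently has GPU work queued.
    max_vgpus is the number of vGPUs the profile allows on one card
              (an assumption used only by the fixed-share model).
    """
    if policy == "fixed":
        # Reserved per VM from the profile's maximum, whether busy or idle.
        return {vm: 1.0 / max_vgpus for vm in active}
    if policy == "equal":
        # Split evenly across powered-on VMs, whether busy or idle.
        return {vm: 1.0 / len(active) for vm in active}
    if policy == "best_effort":
        # Idle VMs give up their time; busy VMs split the whole GPU.
        busy = [vm for vm, is_busy in active.items() if is_busy]
        return {vm: (1.0 / len(busy) if vm in busy else 0.0) for vm in active}
    raise ValueError(f"unknown policy: {policy}")


vms = {"vm-a": True, "vm-b": False}  # vm-b is powered on but idle
for name in ("best_effort", "equal", "fixed"):
    print(name, gpu_shares(name, vms, max_vgpus=4))
```

With two powered-on VMs on a card that allows four vGPUs, the model shows the trade-off: best-effort lets the busy VM use the whole GPU, equal-share caps it at half, and fixed-share caps it at a quarter in exchange for predictable performance.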
vGPU Slicing Details
Each physical GPU runs a single profile that defines the VRAM size of every vGPU and the licensing mode. Profiles cannot be mixed on the same card (e.g., a GPU with 8 GB of VRAM can be split as 2‑2‑2‑2 or 4‑4, but not 2‑2‑4).
Live migration is only possible between hosts that use the same GPU model and the same profile; many GPUs do not support migration at all.
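The slicing and migration rules above can be written as simple checks. The sketch below is illustrative only: `Placement`, `homogeneous_splits`, `is_valid_layout`, and `can_migrate` are hypothetical names, and the slice "units" stand for whatever the profile divides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Placement:
    gpu_model: str      # hypothetical label for the physical GPU model
    profile_units: int  # size of the profile the vGPU uses


def homogeneous_splits(total_units: int, profile_sizes: List[int]) -> List[List[int]]:
    """List every way to fill one card using a single profile size."""
    return [[size] * (total_units // size)
            for size in profile_sizes
            if 0 < size <= total_units]


def is_valid_layout(total_units: int, layout: List[int]) -> bool:
    """Valid only if all slices have the same size and fit on the card."""
    return bool(layout) and len(set(layout)) == 1 and sum(layout) <= total_units


def can_migrate(src: Placement, dst: Placement) -> bool:
    """Migration needs the same GPU model and the same profile on both sides."""
    return src == dst


print(homogeneous_splits(8, [2, 4]))
print(is_valid_layout(8, [2, 2, 4]))
```

Here `is_valid_layout(8, [2, 2, 4])` is rejected because the slices differ in size, which matches the 2‑2‑4 case in the text.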
GPU Passthrough
Some GPUs support direct passthrough (PCIe device assignment). In this mode the VM gains exclusive control of the physical GPU, bypassing the virtualization layer and achieving near‑native graphics and compute performance. Passthrough is typically used for workloads that require maximum GPU throughput, such as CAD, gaming, or deep‑learning training.
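On a Linux host, a device can usually be assigned to a VM only together with the other devices in its IOMMU group, so a common first step before passthrough is to see what shares a group with the GPU. The sketch below assumes a Linux host that exposes groups under `/sys/kernel/iommu_groups`; the function names and the PCI address in the usage note are illustrative, and other hypervisors expose this information differently.

```python
import os
from typing import Dict, List


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> Dict[str, List[str]]:
    """Map each IOMMU group number to the PCI addresses it contains."""
    groups: Dict[str, List[str]] = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled or not a Linux host
    for group in sorted(os.listdir(root)):
        dev_dir = os.path.join(root, group, "devices")
        if os.path.isdir(dev_dir):
            groups[group] = sorted(os.listdir(dev_dir))
    return groups


def passthrough_peers(pci_addr: str, groups: Dict[str, List[str]]) -> List[str]:
    """Return the devices that share an IOMMU group with pci_addr."""
    for devices in groups.values():
        if pci_addr in devices:
            return [d for d in devices if d != pci_addr]
    raise KeyError(pci_addr)
```

A call such as `passthrough_peers("0000:01:00.0", iommu_groups())` would list the companion devices (often the GPU's audio function) that need to be assigned along with the GPU.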
AMD MxGPU – Hardware Partitioning (SR‑IOV)
Overview
AMD MxGPU implements virtualization at the hardware level using Single Root I/O Virtualization (SR‑IOV). The GPU fabric creates multiple virtual functions (VFs) that appear as independent PCIe devices, each with its own dedicated VRAM and compute units.
Key Features
Hardware Resource Slicing : Each VF (vGPU) owns a fixed amount of VRAM and a subset of compute units, providing strong performance isolation.
vGPU Profiles : Different profiles map to distinct performance levels (e.g., low‑end, mid‑range, high‑end). Administrators select a profile per VM to match workload requirements.
Dynamic Allocation : Profiles can be re‑configured at runtime, allowing administrators to adjust GPU resources without rebooting the host.
GPU Sharing : Unused slices can be opportunistically shared among VMs, improving overall utilization.
Solution Composition
Hardware : A GPU that implements AMD’s SR‑IOV extensions.
Software : Minimal driver stack; the hardware performs the slicing, so no additional virtualization software is required.
Licensing : No separate software licenses are needed because the partitioning is performed by the hardware.
Resource Scheduling Principles
Each physical function (PF) can expose multiple virtual functions (VFs). Every VF is presented to the host as an independent PCIe device.
Exclusive Resources : Each VF has its own dedicated VRAM and its own PCI configuration space.
Shared Resources : Compute pipelines may be time‑shared across VFs, though the exact sharing model depends on the GPU generation and workload.
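The split between exclusive and shared resources can be modelled in a few lines. This is a conceptual sketch rather than AMD's implementation: the even VRAM split, the round-robin order, and the names `VirtualFunction`, `partition_pf`, and `round_robin` are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import cycle, islice
from typing import List


@dataclass(frozen=True)
class VirtualFunction:
    index: int
    vram_mb: int  # dedicated to this VF, never shared


def partition_pf(total_vram_mb: int, num_vfs: int) -> List[VirtualFunction]:
    """Split the physical function's VRAM evenly into exclusive VF slices."""
    if num_vfs < 1:
        raise ValueError("num_vfs must be at least 1")
    per_vf = total_vram_mb // num_vfs
    return [VirtualFunction(i, per_vf) for i in range(num_vfs)]


def round_robin(vfs: List[VirtualFunction], ticks: int) -> List[int]:
    """Return which VF owns the shared compute pipeline on each tick."""
    return [vf.index for vf in islice(cycle(vfs), ticks)]


vfs = partition_pf(total_vram_mb=16384, num_vfs=4)
print([vf.vram_mb for vf in vfs])
print(round_robin(vfs, ticks=6))
```

VRAM is carved up once and stays fixed per VF, while compute time rotates between VFs; as the text notes, the real sharing model varies by GPU generation and workload.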
Benefits of Hardware Virtualization
Lower CPU overhead – the hypervisor does not need to translate GPU commands.
More consistent and stable performance across VMs because the hardware enforces isolation.
Improved security via IOMMU isolation; each VF can only access memory assigned to its VM.
Limitations
Each GPU can be split only into an even number of VFs (e.g., a single card may expose two VFs). Only one profile can be configured per card, limiting flexibility.
All GPUs in a server share a single global configuration; mixed‑profile deployments across multiple cards are not supported.
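These constraints lend themselves to a pre-deployment check. The sketch below encodes only the two rules stated above (an even VF count per card and one configuration shared by every card in the server); `validate_server` and the card names are hypothetical.

```python
from typing import Dict, List


def validate_server(cards: Dict[str, int]) -> List[str]:
    """Check a planned layout; cards maps card name -> number of VFs."""
    problems: List[str] = []
    for name, vfs in cards.items():
        if vfs < 1 or vfs % 2 != 0:
            problems.append(f"{name}: VF count {vfs} is not a positive even number")
    if len(set(cards.values())) > 1:
        # All cards in one server must share a single configuration.
        problems.append("cards use different VF configurations")
    return problems


print(validate_server({"gpu0": 2, "gpu1": 2}))
print(validate_server({"gpu0": 2, "gpu1": 4}))
```

An empty list means the plan satisfies both rules; otherwise each entry names the rule that was broken.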
Hardware vs. Software Virtualization Comparison
Hardware (SR‑IOV) : Lower cost, minimal CPU overhead, stable performance, strong security, but limited post‑release iteration and stricter slicing constraints.
Software (NVIDIA vGPU) : Upgradable via driver updates, broader ecosystem (CUDA), more flexible profile selection, but incurs higher CPU overhead and can be less stable under heavy load.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
