How GPU Virtualization Works: Layers, Techniques, and Real-World Use Cases
This article explains the fundamentals of GPU architecture and why GPU virtualization is needed, then walks through user‑level, kernel‑level, and hardware‑level virtualization techniques as well as full GPU pass‑through, highlighting the deployment scenarios each one suits.
1. GPU and Software Architecture
GPUs are used for graphics rendering, high‑performance computing (GPGPU), and video codec acceleration. In a typical software stack, the GPU subsystem can be abstracted into two main domains: general‑purpose compute and graphics rendering.
2. GPU Virtualization
Virtualization creates an abstraction layer on top of hardware, allowing a single physical machine to host multiple virtual machines (VMs). GPU virtualization extends this concept to GPU resources, so that several VMs or containers can share one physical GPU while remaining isolated from one another.
3. GPU Virtualization Requirements
Two primary requirements drive GPU virtualization:
Resource sharing – multiple tenants (containers or VMs) need concurrent access to powerful GPUs (e.g., multi‑screen automotive systems, remote desktops, cloud GPU services).
Resource isolation – tenants must not interfere with each other, requiring memory, compute, and fault isolation.
4. GPU Virtualization Techniques
Virtualization can be implemented at three layers, each targeting different scenarios:
User‑level: API interception and forwarding.
Kernel‑level: GPU driver interception or paravirtualization.
Hardware‑level: Full hardware virtualization (SR‑IOV, NVIDIA MIG).
5. User‑Level Virtualization
1) Local API interception and forwarding
Implement a user‑space library (e.g., libwrapper) that mirrors all GPU driver APIs.
Applications link against libwrapper, which uses dlopen to load the real driver library. libwrapper intercepts calls, processes arguments, forwards them to the real driver, and returns results to the application.
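The interception flow described above can be sketched as a small simulation. This is only an illustration under stated assumptions: RealDriver is a hypothetical stand‑in for the vendor library that a real libwrapper would load with dlopen, and the names mem_alloc, mem_free, device_name, and the memory quota are invented for the example rather than taken from any real GPU API.

```python
class RealDriver:
    """Hypothetical stand-in for the real driver library (not a real GPU API)."""

    def __init__(self):
        self._next_handle = 1
        self.allocations = {}  # handle -> size in bytes

    def mem_alloc(self, size):
        handle = self._next_handle
        self._next_handle += 1
        self.allocations[handle] = size
        return handle

    def mem_free(self, handle):
        # Returns the number of bytes released.
        return self.allocations.pop(handle)

    def device_name(self):
        return "fake-gpu"


class QuotaExceeded(Exception):
    """Raised by the wrapper when a tenant asks for more memory than allowed."""


class LibWrapper:
    """Mirrors the driver API, inspects arguments, then forwards each call."""

    def __init__(self, real, mem_quota):
        self._real = real
        self._quota = mem_quota
        self._used = 0
        self.call_log = []

    def mem_alloc(self, size):
        self.call_log.append(("mem_alloc", size))
        # Argument processing: enforce a per-application quota (an assumption).
        if self._used + size > self._quota:
            raise QuotaExceeded(f"{self._used + size} > {self._quota} bytes")
        handle = self._real.mem_alloc(size)  # forward to the real driver
        self._used += size
        return handle

    def mem_free(self, handle):
        self.call_log.append(("mem_free", handle))
        freed = self._real.mem_free(handle)
        self._used -= freed
        return freed

    def __getattr__(self, name):
        # Any API without special handling is forwarded unchanged.
        return getattr(self._real, name)
```

The application only ever talks to LibWrapper, so policies such as quotas or logging can be added without changing the application or the driver.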
2) Remote API forwarding
libwrapper is split into a client and a server; the client forwards API calls over the network to a server hosting the real driver.
This enables GPU pooling, allowing machines without a physical GPU to execute GPU workloads remotely.
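A minimal sketch of the client/server split, assuming a newline‑delimited JSON message format and a hypothetical vector_add API (neither comes from the original article). To keep it self‑contained, the "network" here is a local socket pair and the server runs in a thread; a real deployment would use a TCP or RDMA connection to the GPU host.

```python
import json
import socket
import threading


class RealDriver:
    """Hypothetical stand-in for the driver on the GPU host."""

    def vector_add(self, a, b):
        return [x + y for x, y in zip(a, b)]


def serve(conn, driver):
    """Server side: decode each request, call the real driver, send the reply."""
    with conn, conn.makefile("r", encoding="utf-8") as rx, \
            conn.makefile("w", encoding="utf-8") as tx:
        for line in rx:
            request = json.loads(line)
            try:
                api = request["api"]
                if api.startswith("_"):
                    raise AttributeError(f"API {api!r} is not exposed")
                result = getattr(driver, api)(*request["args"])
                reply = {"ok": True, "result": result}
            except Exception as exc:
                reply = {"ok": False, "error": str(exc)}
            tx.write(json.dumps(reply) + "\n")
            tx.flush()


class RemoteClient:
    """Client side: serialises each API call and waits for the server's reply."""

    def __init__(self, conn):
        self._conn = conn
        self._rx = conn.makefile("r", encoding="utf-8")
        self._tx = conn.makefile("w", encoding="utf-8")

    def call(self, api, *args):
        self._tx.write(json.dumps({"api": api, "args": args}) + "\n")
        self._tx.flush()
        reply = json.loads(self._rx.readline())
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]

    def close(self):
        self._tx.close()
        self._rx.close()
        self._conn.close()


def connect_in_process(driver):
    """Wire a client to a server thread over a local socket pair (demo only)."""
    client_sock, server_sock = socket.socketpair()
    threading.Thread(target=serve, args=(server_sock, driver), daemon=True).start()
    return RemoteClient(client_sock)
```

A caller would write something like client = connect_in_process(RealDriver()) followed by client.call("vector_add", [1, 2], [3, 4]). Every call pays a serialisation and round‑trip cost, which is why remote forwarding tends to suit coarse‑grained workloads.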
3) Paravirtualized API forwarding
Both the application and libwrapper run inside a VM.
Communication uses a virtio front‑end in the VM and a virtio back‑end in the hypervisor to invoke the host’s GPU driver.
Shared memory can accelerate data transfer between guest and host.
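The front‑end/back‑end handshake can be pictured with the toy model below. It is not the virtio wire format: two in‑process queues stand in for the available and used rings, a bytearray stands in for memory shared between guest and host, and HostDriver.invert is a hypothetical operation chosen only so the effect is visible.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Points at a request payload inside the shared region."""
    offset: int
    length: int


class HostDriver:
    """Hypothetical host driver; 'invert' flips every byte in place."""

    def invert(self, view):
        for i in range(len(view)):
            view[i] ^= 0xFF


class Backend:
    """Hypervisor side: drains the available queue and calls the host driver."""

    def __init__(self, shared, avail, used, driver):
        self.shared, self.avail, self.used = shared, avail, used
        self.driver = driver

    def process(self):
        while self.avail:
            desc = self.avail.popleft()
            end = desc.offset + desc.length
            # Work directly on shared memory: the payload is not copied here.
            with memoryview(self.shared)[desc.offset:end] as view:
                self.driver.invert(view)
            self.used.append(desc)


class Frontend:
    """Guest side: places data in shared memory and posts a descriptor."""

    def __init__(self, shared, avail, used, kick):
        self.shared, self.avail, self.used = shared, avail, used
        self.kick = kick

    def submit(self, payload, offset=0):
        end = offset + len(payload)
        if offset < 0 or end > len(self.shared):
            raise ValueError("payload does not fit in the shared region")
        self.shared[offset:end] = payload
        self.avail.append(Descriptor(offset, len(payload)))
        self.kick()  # notify the back end that work is available
        done = self.used.popleft()
        return bytes(self.shared[done.offset:done.offset + done.length])


def make_device(size=64):
    """Build a connected front end / back end pair over one shared buffer."""
    shared = bytearray(size)
    avail, used = deque(), deque()
    backend = Backend(shared, avail, used, HostDriver())
    return Frontend(shared, avail, used, kick=backend.process)
```

Only the small descriptor crosses the queue; the payload stays in the shared buffer, which is the reason shared memory speeds up guest‑to‑host transfers.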
6. Kernel‑Level Virtualization
1) Device file interception
The real GPU driver is accessed via a device file, e.g., /dev/realgpu.
A kernel module creates a mock device file, which is bind‑mounted into containers (or otherwise exposed to VMs) under the same name as the real device.
Inside the tenant, every access to /dev/realgpu therefore reaches the module first; the module forwards the request to the real driver and returns the result to user space.
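The sketch below models that interception in user‑space Python, purely to show the control flow; a real implementation would be a kernel module handling ioctl calls on the device file. The command numbers ALLOC, FREE, and QUERY_FREE and the per‑container memory limit are assumptions made for the example.

```python
# Hypothetical ioctl command numbers (not taken from any real driver).
ALLOC, FREE, QUERY_FREE = 1, 2, 3


class RealGpuDevice:
    """Stands in for the real driver behind /dev/realgpu."""

    def __init__(self, total_mem):
        self.free_mem = total_mem

    def ioctl(self, cmd, arg=0):
        if cmd == ALLOC:
            if arg > self.free_mem:
                return -1
            self.free_mem -= arg
            return 0
        if cmd == FREE:
            self.free_mem += arg
            return 0
        if cmd == QUERY_FREE:
            return self.free_mem
        return -1  # unknown command


class MockGpuDevice:
    """What one container sees at the same path as the real device."""

    def __init__(self, real, mem_limit):
        self.real = real
        self.limit = mem_limit
        self.used = 0

    def ioctl(self, cmd, arg=0):
        if cmd == QUERY_FREE:
            # Report only this container's share, hiding the rest of the GPU.
            return self.limit - self.used
        if cmd == ALLOC:
            if self.used + arg > self.limit:
                return -1  # reject before the real driver is involved
            rc = self.real.ioctl(ALLOC, arg)
            if rc == 0:
                self.used += arg
            return rc
        if cmd == FREE:
            if arg > self.used:
                return -1
            rc = self.real.ioctl(FREE, arg)
            if rc == 0:
                self.used -= arg
            return rc
        # Everything else is forwarded unchanged.
        return self.real.ioctl(cmd, arg)
```

Because the application keeps using the unmodified driver library on top of the mock device, this approach needs no changes in user space, at the cost of depending on the driver's ioctl interface.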
2) Driver paravirtualization
In this model, the guest OS runs a virtual GPU driver that issues hypercalls to the hypervisor, which then proxies the request to the host’s real GPU driver.
7. Hardware‑Level Virtualization
Hardware support is required for full isolation and performance:
CPU and memory hardware virtualization.
IOMMU support for DMA and interrupt remapping.
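Features such as SR‑IOV (Single Root I/O Virtualization), which lets one PCIe device present multiple virtual functions, and NVIDIA MIG (Multi‑Instance GPU), which partitions a GPU into isolated instances, enable multiple virtual GPUs to share a single physical device.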
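On a Linux host, one way to see whether a device advertises SR‑IOV is to read the sriov_totalvfs and sriov_numvfs attributes that the kernel exposes under /sys/bus/pci/devices. The helper below is a small sketch of that check; the function name and the sysfs_root parameter (handy for testing against a fake tree) are choices made for this example.

```python
from pathlib import Path


def sriov_devices(sysfs_root="/sys"):
    """Return {pci_address: (enabled_vfs, max_vfs)} for SR-IOV capable devices."""
    result = {}
    pci_dir = Path(sysfs_root) / "bus" / "pci" / "devices"
    if not pci_dir.is_dir():
        return result  # not Linux, or sysfs is not mounted
    for dev in sorted(pci_dir.iterdir()):
        total = dev / "sriov_totalvfs"
        num = dev / "sriov_numvfs"
        # Both files exist only on physical functions that support SR-IOV.
        if total.is_file() and num.is_file():
            result[dev.name] = (int(num.read_text()), int(total.read_text()))
    return result


if __name__ == "__main__":
    for addr, (enabled, maximum) in sriov_devices().items():
        print(f"{addr}: {enabled}/{maximum} virtual functions enabled")
```

An empty result only means that no PCI device on the host reports SR‑IOV capability; it says nothing about MIG, which is managed through NVIDIA's own tooling.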
8. Full GPU Pass‑Through (Full Virtualization)
Pass‑through assigns the entire physical GPU to a VM without any driver modifications. While this offers near‑native performance, it does not allow resource sharing and therefore is not considered true GPU virtualization.
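On Linux/KVM hosts, pass‑through is commonly set up through VFIO, which is an assumption beyond what the article states. Under that assumption, the sketch below reads sysfs to report which IOMMU group a GPU belongs to, which other devices share that group, and whether the GPU is currently bound to the vfio-pci driver. The function name and the report layout are invented for the example.

```python
import os
from pathlib import Path


def _link_target_name(link):
    """Return the last path component a sysfs symlink points to, or None."""
    return os.path.basename(os.path.realpath(link)) if link.exists() else None


def passthrough_report(pci_addr, sysfs_root="/sys"):
    """Summarise IOMMU group membership and driver binding for one PCI device."""
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr
    group = _link_target_name(dev / "iommu_group")
    driver = _link_target_name(dev / "driver")
    peers = []
    if group is not None:
        # Other devices in the same IOMMU group cannot be isolated from this one.
        members = dev / "iommu_group" / "devices"
        peers = sorted(p.name for p in members.iterdir() if p.name != pci_addr)
    return {
        "iommu_group": group,
        "driver": driver,
        "group_peers": peers,
        "bound_to_vfio": driver == "vfio-pci",
    }
```

A missing IOMMU group usually means the IOMMU is disabled in firmware or on the kernel command line, and devices listed as peers generally have to be handed to the VM together with the GPU.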
Architects' Tech Alliance
