Understanding NVIDIA GRID vGPU Virtualization and Its Allocation Modes
This article explains NVIDIA GRID vGPU virtualization, detailing how GPUs are partitioned by memory size, the supported hypervisors, the operation of virtual GPU resources, differences between full‑allocation vGPU and GPU pass‑through, licensing requirements, and performance considerations for cloud and data‑center environments.
A GPU (Graphics Processing Unit) excels at floating-point and highly parallel computation; for such workloads its throughput can be orders of magnitude higher than a CPU's. With GPU virtualization, multiple virtual machines in a data center can share one or more physical GPUs. Major virtualization vendors such as VMware and Microsoft have made progress in this area, and NVIDIA offers the GRID solution for GPU virtualization.
What is NVIDIA GRID? GRID is a software suite that enables GPU resource virtualization on server virtualization platforms. It slices a physical GPU into multiple virtual GPUs (vGPUs), each of which can be assigned to a virtual machine, granting the VM GPU capabilities for graphics or compute workloads.
After the NVIDIA GRID vGPU Manager is installed on the host, it works with the virtualization platform to expose virtual GPU resources. Each virtual machine then installs the matching GRID vGPU guest driver (the standard NVIDIA graphics driver or the professional NVIDIA Quadro driver) to access its vGPU.
As of GRID version 4.3, three mainstream hypervisors are supported: VMware vSphere, Citrix XenServer, and Huawei UVP. Platforms based on KVM or Microsoft Hyper-V are not supported.
How does GRID slice GPUs? Slicing follows two principles: (1) each vGPU is allocated a fixed amount of video memory (framebuffer), and (2) a single GPU chip can be sliced with only one memory-size profile at a time. For example, a K2 board carries two GPU chips with 4 GB of memory each; each chip can be divided into four 1 GB K240Q vGPUs (eight per board). Once a chip is sliced with a given profile, its remaining memory cannot be used for a different vGPU size.
The same rules apply to the M-series GPUs (M6, M10, M60), which support slicing ratios up to 1:16 (the K series tops out at 1:8). An M-series GPU chip can therefore be divided into as many as sixteen 512 MB vGPUs.
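The two slicing principles above can be sketched in a few lines. This is an illustrative model, not an NVIDIA tool: the function name and memory table are hypothetical, while the chip sizes, profile sizes, and per-series ratio caps follow the K2 and M-series examples in the text.

```python
# Framebuffer per physical GPU chip (a K2 board has two 4 GB chips;
# an M60 board has two 8 GB chips). Values follow the article's examples.
CHIP_MEMORY_MB = {"K2": 4096, "M60": 8192}

def vgpus_per_chip(chip: str, profile_mb: int, max_ratio: int) -> int:
    """How many vGPUs of one fixed-memory profile fit on a single chip.

    A chip can carry only one profile at a time, so the count is simply
    chip memory divided by the profile's framebuffer, capped by the
    series' maximum slicing ratio (1:8 for K series, 1:16 for M series).
    """
    by_memory = CHIP_MEMORY_MB[chip] // profile_mb
    return min(by_memory, max_ratio)

# K2 chip (4 GB) with the 1 GB K240Q profile, K-series cap 1:8 -> 4 per chip
assert vgpus_per_chip("K2", 1024, 8) == 4
# M60 chip (8 GB) with a 512 MB profile, M-series cap 1:16 -> 16 per chip
assert vgpus_per_chip("M60", 512, 16) == 16
```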
Each vGPU contains resources analogous to a physical GPU's: compute cores (for graphics and compute), video memory (framebuffer), video encode/decode engines, and a copy engine. The framebuffer is dedicated to its vGPU, while the compute cores are time-shared among vGPUs: when a vGPU is scheduled, it can use all of the cores for that time slice, which helps preserve user experience.
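That resource model — dedicated framebuffer, time-shared cores — can be made concrete with a toy sketch. The class names and the round-robin scheduler here are illustrative assumptions (NVIDIA's actual scheduler is internal to the vGPU Manager); the point is only that memory cannot be oversubscribed while compute is multiplexed in time.

```python
from dataclasses import dataclass, field
from itertools import cycle, islice

@dataclass
class VGPU:
    name: str
    framebuffer_mb: int  # dedicated: no other vGPU may use this memory

@dataclass
class PhysicalGPU:
    total_fb_mb: int
    vgpus: list = field(default_factory=list)

    def attach(self, vgpu: VGPU) -> None:
        # Framebuffer is partitioned, never shared, so attaching fails
        # once the chip's memory is fully allocated.
        used = sum(v.framebuffer_mb for v in self.vgpus)
        if used + vgpu.framebuffer_mb > self.total_fb_mb:
            raise ValueError("framebuffer is dedicated and cannot be oversubscribed")
        self.vgpus.append(vgpu)

    def schedule(self, time_slices: int) -> list:
        # Compute cores are time-shared: in each slice, one vGPU
        # owns ALL of the cores (toy round-robin for illustration).
        return list(islice(cycle(v.name for v in self.vgpus), time_slices))

gpu = PhysicalGPU(total_fb_mb=4096)
gpu.attach(VGPU("vm-a", 2048))
gpu.attach(VGPU("vm-b", 2048))
print(gpu.schedule(4))  # ['vm-a', 'vm-b', 'vm-a', 'vm-b']
```

A third 1 GB vGPU would be rejected by `attach`, mirroring the rule that sliced memory is exhausted rather than shared.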
Compared with a traditional workstation where CPU, storage, network, GPU, memory, and video memory are all exclusively owned, the virtualized model only isolates memory and video memory; other resources are shared, increasing utilization and reducing hardware costs.
vGPU Full-Allocation Mode vs. GPU Pass-Through. In full-allocation mode, a physical GPU is divided into multiple vGPUs (e.g., 1:4 or 1:8). The extreme 1:1 mode allocates an entire GPU to a single VM, resembling pass-through but retaining the vGPU management layer. GPU pass-through instead maps the physical PCI device directly into a VM using Intel VT-d, and has been supported by most hypervisors for years.
The vGPU solution adds the NVIDIA vGPU Manager component between the physical GPU and the guest driver, enabling multi-tenant sharing and advanced features such as live migration of VMs with an attached vGPU. Pass-through, by contrast, binds the GPU tightly to a single VM, which rules out most higher-level virtualization features.
Licensing: Using GRID drivers (whether for M6, M10, or M60) requires a GRID license, even when the GPU is passed through, because the driver itself is part of the GRID stack.
Officially, full-allocation vGPU and pass-through deliver a similar user experience in most scenarios. In performance tests, however, pass-through can achieve higher frame rates because it bypasses the vGPU Manager. The vGPU solution employs a Frame Rate Limiter (FRL) that caps each user at 60 fps to ensure fair sharing; the limit can be disabled for testing, but doing so is not recommended in production.
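The FRL's effect on benchmark numbers can be summarized with a one-line model. This is a simplification for illustration only (the function name and the notion of a single "rendered fps" figure are assumptions); the 60 fps default cap comes from the text.

```python
def delivered_fps(rendered_fps: float, frl_enabled: bool, cap: float = 60.0) -> float:
    """Frame rate a user actually sees under the Frame Rate Limiter.

    With FRL on, each vGPU is held to the cap (60 fps by default) so
    that GPU time is shared fairly; with FRL off (or in pass-through),
    the VM renders as fast as the GPU allows.
    """
    return min(rendered_fps, cap) if frl_enabled else rendered_fps

# A pass-through-style run with no FRL keeps the full frame rate...
assert delivered_fps(95.0, frl_enabled=False) == 95.0
# ...while a vGPU with FRL enabled is held to the 60 fps cap.
assert delivered_fps(95.0, frl_enabled=True) == 60.0
# Below the cap, FRL has no effect on the result.
assert delivered_fps(42.0, frl_enabled=True) == 42.0
```

This is why a pass-through benchmark can post higher peak frame rates than the same workload on a vGPU, even when perceived responsiveness is comparable.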
Because vGPU abstracts the hardware behind a virtualization layer, it enables advanced virtualization features that are difficult to achieve with pure pass-through, such as live migration and other capabilities planned for future releases.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.