Why GPUs Matter: From Basics to Virtualization in Modern Computing
This article explains what a GPU (Graphics Processing Unit) is, how it differs from a CPU, how it is programmed through graphics and compute APIs, and how GPUs are virtualized using virtual graphics cards, GPU passthrough, and vGPU architectures.
GPU Overview
GPU stands for Graphics Processing Unit, a term NVIDIA popularized in 1999 when it marketed the GeForce 256 as the first GPU. Whereas a CPU is a general-purpose processor, a GPU is designed specifically for graphics workloads and acts as the "heart" of a graphics card. Its key contribution was moving 3D rendering work, such as transform and lighting, out of software running on the CPU and into dedicated hardware. Major manufacturers include NVIDIA and ATI (acquired by AMD in 2006).
Why a Dedicated GPU Is Needed
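GPUs are built around a parallel programming model that differs fundamentally from the CPU's. A CPU devotes its resources to a few powerful cores that execute largely sequential code with low latency; a GPU resembles a shared-memory multiprocessor with many simpler execution units and very high memory bandwidth. This makes GPUs far more efficient for workloads with massive data parallelism, where the same operation is applied independently to many data elements and memory is accessed heavily, such as per-pixel shading or large vector and matrix computations.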
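To make "massive data parallelism" concrete, here is a minimal Python sketch (plain Python standing in for GPU code) of a SAXPY-style operation, out[i] = a * x[i] + y[i]. Each element is computed independently, so the per-element invocations can run in any order, or all at once on a GPU, without changing the result. The helper names `saxpy_element` and `run_data_parallel` are hypothetical, and the shuffled loop only simulates the absence of ordering guarantees; nothing actually runs in parallel.

```python
import random
from typing import Callable, List, Sequence


def saxpy_element(i: int, a: float, x: Sequence[float], y: Sequence[float]) -> float:
    # One independent work item: reads only element i and produces only element i.
    return a * x[i] + y[i]


def run_data_parallel(n: int, kernel: Callable[..., float], *args, seed: int = 0) -> List[float]:
    """Invoke `kernel(i, *args)` once per index, in a shuffled order.

    The shuffle only simulates "no ordering guarantee"; nothing runs in parallel here.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)
    out: List[float] = [0.0] * n
    for i in order:
        out[i] = kernel(i, *args)
    return out


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
result = run_data_parallel(len(x), saxpy_element, 2.0, x, y)
print(result)  # [12.0, 24.0, 36.0, 48.0]
```

Because no element reads another element's output, the same function could be mapped onto thousands of GPU threads; a loop with a carried dependency, such as a running sum, could not be split up this directly.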
How to Use a GPU
There are two primary ways to leverage a GPU:
Through graphics APIs such as OpenGL or DirectX, where developers write shaders (for example, vertex and fragment shaders) to control rendering.
Via GPU compute APIs such as CUDA (NVIDIA-specific) or OpenCL (an open, cross-vendor standard maintained by the Khronos Group), which allow general-purpose computation without going through a graphics API.
OpenGL, originally created by SGI, provides a cross-platform interface for 2D and 3D graphics. DirectX, developed by Microsoft, fills a similar role on Windows. Both expose shader languages (GLSL for OpenGL, HLSL for Direct3D) whose programs run on the GPU.
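A fragment shader is a small program the GPU runs once per pixel (fragment), with each invocation seeing only its own inputs. The Python sketch below emulates that model on the CPU; it is not GLSL or HLSL, and `gradient_shader` and `run_fragment_stage` are hypothetical names. Passing coordinates normalized to pixel centers is an assumption made for the example.

```python
from typing import Callable, List, Tuple

Color = Tuple[float, float, float]  # (red, green, blue), each in [0, 1]


def gradient_shader(u: float, v: float) -> Color:
    # Like a fragment shader: sees only its own pixel's coordinates and returns
    # that pixel's color (red ramps left to right, green ramps top to bottom).
    return (u, v, 0.0)


def run_fragment_stage(width: int, height: int,
                       shader: Callable[[float, float], Color]) -> List[List[Color]]:
    """Invoke `shader` once per pixel with normalized pixel-center coordinates."""
    image: List[List[Color]] = []
    for row in range(height):
        v = (row + 0.5) / height
        image.append([shader((col + 0.5) / width, v) for col in range(width)])
    return image


img = run_fragment_stage(2, 2, gradient_shader)
print(img[0][0], img[1][1])  # (0.25, 0.25, 0.0) (0.75, 0.75, 0.0)
```

On real hardware the per-pixel calls run concurrently, which is only safe because no invocation depends on another pixel's result.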
GPU Programming Interfaces
NVIDIA released CUDA (Compute Unified Device Architecture) in 2007. It extends C with constructs for writing parallel functions, called kernels, that run on the GPU. A CUDA program is split between a host CPU (the "host") and one or more GPUs (the "devices"): host code manages memory and launches kernels, and the devices execute them. Typical use cases include oil and gas exploration, fluid dynamics, molecular dynamics, computational biology, video encoding and decoding, and astronomical simulations. AMD previously offered CTM (Close to Metal) and later the ATI Stream SDK, but has since adopted the open OpenCL standard.
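The core of a CUDA kernel launch is that host code picks a grid of thread blocks and every thread computes its own global index from `blockIdx.x`, `blockDim.x`, and `threadIdx.x`. The sketch below emulates that indexing sequentially in Python so it runs without a GPU. `launch` and `vector_add_kernel` are hypothetical stand-ins for a `kernel<<<grid, block>>>(...)` launch and a `__global__` function; real CUDA code would also move data between host and device memory (for example with `cudaMalloc` and `cudaMemcpy`), which this sketch omits.

```python
from typing import Callable, List


def vector_add_kernel(block_idx: int, block_dim: int, thread_idx: int,
                      a: List[int], b: List[int], c: List[int], n: int) -> None:
    # Same index arithmetic a CUDA kernel would use:
    # i = blockIdx.x * blockDim.x + threadIdx.x
    i = block_idx * block_dim + thread_idx
    if i < n:  # guard: the last block may contain more threads than elements
        c[i] = a[i] + b[i]


def launch(kernel: Callable[..., None], grid_dim: int, block_dim: int, *args) -> None:
    """Sequentially emulate kernel<<<grid_dim, block_dim>>>(*args) on the host."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 10
a = list(range(n))
b = [100] * n
c = [0] * n
threads_per_block = 4
blocks = (n + threads_per_block - 1) // threads_per_block  # ceil(10 / 4) = 3 blocks
launch(vector_add_kernel, blocks, threads_per_block, a, b, c, n)
print(c)  # [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
```

The rounded-up block count launches 12 threads for 10 elements, which is why the bounds check inside the kernel matters.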
GPU Virtualization
In virtualized environments, three main approaches exist for providing graphics acceleration:
Virtual graphics cards (e.g., VNC, Xen framebuffer, VMware virtual GPU, VMGL).
GPU passthrough (direct assignment of a physical GPU to a single VM).
GPU virtualization (splitting a physical GPU into multiple virtual GPUs, or vGPU).
Virtual Graphics Cards
Virtual graphics solutions such as VNC send the guest's screen contents (framebuffer updates) over the network to a remote viewer, while Xen's virtual framebuffer provides a virtual display device backed by a VNC server. VMGL implements front-end virtualization: it replaces the guest's OpenGL library with a stand-in library that forwards OpenGL calls to a remote server hosting the real GPU and drivers.
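The "stand-in library" technique can be illustrated with a small interposer: the application calls functions by their usual names, but each call is serialized and shipped to a server that owns the real implementation. This Python sketch is only an illustration under assumptions, not VMGL's code or wire protocol: `FakeLibrary`, `RemoteServer`, and `RealGraphicsLib` are hypothetical, JSON stands in for the real encoding, and a direct function call replaces the network link.

```python
import json
from typing import Any, Callable


class RealGraphicsLib:
    """Hypothetical library on the GPU host; stands in for the real driver stack."""

    def get_renderer(self) -> str:
        return "host GPU"

    def draw_triangles(self, count: int) -> int:
        return count  # pretend every requested triangle was drawn


class RemoteServer:
    """Receives serialized calls and executes them on the real library."""

    def __init__(self, real_lib: Any) -> None:
        self.real_lib = real_lib

    def handle(self, message: bytes) -> bytes:
        call = json.loads(message)
        result = getattr(self.real_lib, call["fn"])(*call["args"])
        return json.dumps({"result": result}).encode()


class FakeLibrary:
    """Exposes the same function names, but forwards every call over `transport`."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport  # would be a network connection in practice

    def __getattr__(self, name: str) -> Callable[..., Any]:
        # Invoked for any attribute not found normally, i.e. every library function.
        def stub(*args: Any) -> Any:
            request = json.dumps({"fn": name, "args": list(args)}).encode()
            return json.loads(self._transport(request))["result"]
        return stub


server = RemoteServer(RealGraphicsLib())
lib = FakeLibrary(transport=server.handle)
print(lib.get_renderer())     # host GPU
print(lib.draw_triangles(3))  # 3
```

The application code (`lib.get_renderer()`) is unchanged whether `lib` is the real library or the forwarding stub, which is what lets this kind of front-end virtualization work without modifying applications.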
GPU Passthrough
Passthrough assigns a physical GPU exclusively to a single VM, giving near-native performance. Implementations include Xen's VGA passthrough (which relies on Intel VT-d for DMA remapping) and VMware's VMDirectPath I/O. Because the guest OS drives the device directly and the hypervisor cannot easily capture or restore the GPU's state, features such as live migration and snapshots are generally lost, and the GPU cannot be shared with other VMs.
GPU Virtualization (vGPU)
GPU virtualization lets multiple VMs share one physical GPU concurrently by time-slicing it. The approach described here relies on API remoting: CUDA (or OpenGL) calls are intercepted in the guest, forwarded to a service on the host that owns the physical GPU, executed there, and the results are returned to the guest.
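The article does not say how time slices are allocated, so the following Python sketch assumes a simple round-robin policy with a fixed quantum: each VM runs at most `quantum` units of work on the GPU before the next VM gets its turn. `time_slice` and the abstract "work units" are hypothetical; production vGPU schedulers are considerably more involved.

```python
from collections import deque
from typing import Dict, List, Tuple


def time_slice(vm_work: Dict[str, int], quantum: int) -> List[Tuple[str, int]]:
    """Round-robin one GPU among VMs.

    vm_work: VM name -> units of GPU work remaining.
    quantum: maximum units a VM may run before the GPU moves to the next VM.
    Returns the execution timeline as (vm, units_run) tuples.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    queue = deque(vm_work.items())
    timeline: List[Tuple[str, int]] = []
    while queue:
        vm, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((vm, run))
        if remaining > run:  # unfinished VMs go to the back of the queue
            queue.append((vm, remaining - run))
    return timeline


print(time_slice({"vm1": 5, "vm2": 2, "vm3": 4}, quantum=2))
# [('vm1', 2), ('vm2', 2), ('vm3', 2), ('vm1', 2), ('vm3', 2), ('vm1', 1)]
```

A smaller quantum makes sharing fairer and more responsive at the cost of more context switches on the GPU.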
vCUDA, a research system for Xen-based virtual machines, is a typical example. Its architecture consists of three components:
Client driver: installed inside the guest VM, it intercepts CUDA API calls, packages them, requests GPU resources from the manager, and maintains the state of a virtual GPU (vGPU).
Server daemon: runs in a privileged VM, receives the packaged calls, validates them, executes them on the physical GPU, encodes the results, and updates the vGPU state.
Manager: also in the privileged domain, it schedules GPU resources and handles load balancing, dynamic allocation, and fault recovery across multiple VMs.
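The division of labor between the client driver and the server daemon can be sketched in Python as follows. This is a toy model under stated assumptions, not vCUDA's actual interface: `PhysicalGPU` fakes device memory with Python lists, the call names (`malloc`, `write`, `add_one`, `read`, `free`) are hypothetical, JSON replaces vCUDA's real encoding, and requesting resources from the manager is left out. It shows the client packaging each intercepted call, the server validating it against an allow-list before executing it, and the client mirroring the result into its vGPU state.

```python
import json
from typing import Callable, Dict, List

# Hypothetical set of calls the server daemon is willing to execute.
ALLOWED_CALLS = {"malloc", "write", "add_one", "read", "free"}


class PhysicalGPU:
    """Toy device whose memory is a dict of handle -> list of ints."""

    def __init__(self) -> None:
        self.mem: Dict[int, List[int]] = {}
        self.next_handle = 1

    def malloc(self, n: int) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.mem[handle] = [0] * n
        return handle

    def write(self, handle: int, data: List[int]) -> None:
        self.mem[handle][: len(data)] = data

    def add_one(self, handle: int) -> None:
        # Stand-in for a kernel launch on the physical GPU.
        self.mem[handle] = [v + 1 for v in self.mem[handle]]

    def read(self, handle: int) -> List[int]:
        return list(self.mem[handle])

    def free(self, handle: int) -> None:
        del self.mem[handle]


class ServerDaemon:
    """Privileged side: validate, execute on the physical GPU, encode the result."""

    def __init__(self, gpu: PhysicalGPU) -> None:
        self.gpu = gpu

    def handle(self, packet: bytes) -> bytes:
        call = json.loads(packet)
        if call["fn"] not in ALLOWED_CALLS:  # reject anything not on the allow-list
            return json.dumps({"seq": call["seq"], "error": "rejected"}).encode()
        result = getattr(self.gpu, call["fn"])(*call["args"])
        return json.dumps({"seq": call["seq"], "result": result}).encode()


class ClientDriver:
    """Guest side: package intercepted calls and track the vGPU state."""

    def __init__(self, send: Callable[[bytes], bytes]) -> None:
        self.send = send
        self.seq = 0
        self.vgpu_state = {"allocations": set(), "calls": 0}

    def call(self, fn: str, *args):
        self.seq += 1
        packet = json.dumps({"seq": self.seq, "fn": fn, "args": list(args)}).encode()
        reply = json.loads(self.send(packet))
        if "error" in reply:
            raise RuntimeError(f"{fn} rejected by server")
        self.vgpu_state["calls"] += 1
        if fn == "malloc":
            self.vgpu_state["allocations"].add(reply["result"])
        elif fn == "free":
            self.vgpu_state["allocations"].discard(args[0])
        return reply["result"]


server = ServerDaemon(PhysicalGPU())
driver = ClientDriver(server.handle)
h = driver.call("malloc", 3)
driver.call("write", h, [1, 2, 3])
driver.call("add_one", h)
print(driver.call("read", h))  # [2, 3, 4]
driver.call("free", h)
```

Tracking allocations on the guest side, as `vgpu_state` does here, gives the client a record of what it holds on the device; this sketch does not model how vCUDA itself uses such state.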
The manager follows principles such as allocating idle GPU resources to waiting VMs, balancing load when pressure is high, and migrating tasks to healthy GPUs in case of failures.
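These principles can be expressed as a small placement policy. In this Python sketch, which assumes a toy capacity measured in tasks per GPU, idle capacity goes to waiting VMs, the least-loaded healthy GPU is preferred (a simple form of load balancing), and when a GPU fails its tasks are re-placed on healthy GPUs. `GPU`, `Manager`, and the capacity model are hypothetical and not taken from vCUDA.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GPU:
    name: str
    capacity: int  # toy measure: how many VM tasks this GPU can host
    healthy: bool = True
    tasks: List[str] = field(default_factory=list)

    @property
    def load(self) -> float:
        return len(self.tasks) / self.capacity


class Manager:
    def __init__(self, gpus: List[GPU]) -> None:
        self.gpus = gpus

    def allocate(self, vm: str) -> Optional[str]:
        # Give idle capacity to the waiting VM, preferring the least-loaded healthy GPU.
        candidates = [g for g in self.gpus if g.healthy and len(g.tasks) < g.capacity]
        if not candidates:
            return None  # no capacity: the VM keeps waiting
        gpu = min(candidates, key=lambda g: g.load)
        gpu.tasks.append(vm)
        return gpu.name

    def fail(self, gpu_name: str) -> Dict[str, Optional[str]]:
        # Fault recovery: mark the GPU unhealthy and re-place its tasks elsewhere.
        failed = next(g for g in self.gpus if g.name == gpu_name)
        failed.healthy = False
        orphans, failed.tasks = failed.tasks, []
        return {vm: self.allocate(vm) for vm in orphans}


mgr = Manager([GPU("gpu0", capacity=2), GPU("gpu1", capacity=2)])
placements = [mgr.allocate(vm) for vm in ["vm1", "vm2", "vm3"]]
print(placements)  # ['gpu0', 'gpu1', 'gpu0']
recovered = mgr.fail("gpu0")
print(recovered)   # {'vm1': 'gpu1', 'vm3': None}
```

A VM that cannot be placed gets `None` and would keep waiting until capacity frees up.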
Overall, understanding GPU fundamentals, programming models, and virtualization techniques is essential for building high‑performance graphics and compute workloads in both bare‑metal and virtualized cloud environments.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact the publisher and we will review it promptly.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
