
Unlocking GPU Computing: PCIe, NVLink, NVSwitch, and HBM Explained

This article breaks down the core components of high‑performance GPU servers: PCIe switch chips, the evolution of NVLink from version 1.0 to 4.0, the NVSwitch architecture, HBM generations, and the units used to quote bandwidth. Together these give a technical foundation for understanding the hardware behind large‑scale model training.

Architects' Tech Alliance

Dec 11, 2024


GPU Server Topology

In large‑scale model training, the basic building block is typically a single chassis housing eight GPUs of one model, such as the A100, A800, H100, or H800, with L40S‑based servers appearing more recently. Inside the chassis, the GPUs are interconnected so that every GPU can reach every other GPU, forming a full‑mesh network.

PCIe Switch Chip

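PCIe (Peripheral Component Interconnect Express) is the primary bus linking CPUs to NVMe storage, GPUs, and network adapters, either directly or through dedicated PCIe switch chips. Memory modules attach to the CPU over its own memory channels rather than over PCIe. Gen5, the generation found in current servers, doubles the per‑lane transfer rate of Gen4 (32 GT/s versus 16 GT/s), making PCIe a pivotal component in modern high‑performance computing clusters.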
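As a rough illustration of where PCIe throughput figures come from, the sketch below derives per-direction and bidirectional bandwidth for a x16 link from the per-lane signalling rate. The rates (8, 16, and 32 GT/s for Gen3 to Gen5) and the 128b/130b line encoding follow the PCIe specifications; the function name and the default x16 width are choices made for this example, and packet-level protocol overhead is ignored.

```python
# Per-lane signalling rates in gigatransfers per second (GT/s).
PCIE_GT_PER_S = {"Gen3": 8.0, "Gen4": 16.0, "Gen5": 32.0}

# Gen3 through Gen5 use 128b/130b line encoding: 128 payload bits per 130 bits sent.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gb_s(gen: str, lanes: int = 16) -> tuple[float, float]:
    """Return (per-direction, bidirectional) bandwidth in GB/s for a PCIe link.

    Each transfer carries one bit per lane, so GT/s maps to Gbit/s per lane.
    Packet and protocol overhead are not modelled (hypothetical helper).
    """
    gbit_per_lane = PCIE_GT_PER_S[gen] * ENCODING_EFFICIENCY
    per_direction = gbit_per_lane * lanes / 8  # bits -> bytes
    return per_direction, 2 * per_direction


for gen in PCIE_GT_PER_S:
    one_way, both_ways = pcie_bandwidth_gb_s(gen)
    print(f"{gen} x16: {one_way:.1f} GB/s per direction, {both_ways:.1f} GB/s bidirectional")
```

Under these assumptions a Gen4 x16 link comes out near 31.5 GB/s per direction, or about 63 GB/s for both directions combined, which is usually rounded to 64 GB/s. Gen5 doubles both numbers.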

NVLink Overview

Definition

NVLink is NVIDIA’s proprietary high‑speed interconnect and communication protocol, first announced in March 2014. It uses point‑to‑point serial links that can connect a CPU to a GPU or connect multiple GPUs directly. Each device exposes several links, so devices form a mesh‑style network rather than hanging off a central hub.

Evolution (NVLink 1.0 – 4.0)

NVLink 1.0: 4 links (channels) per GPU, up to 160 GB/s total bidirectional bandwidth.

NVLink 2.0: 6 links per GPU, up to 300 GB/s total bidirectional bandwidth.

NVLink 3.0: 12 links per GPU, up to 600 GB/s total bidirectional bandwidth.

NVLink 4.0: 18 links per GPU, up to 900 GB/s total bidirectional bandwidth.
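The per-link rate implied by each generation can be worked out from the figures listed above. The following sketch simply divides each total by the number of links (called channels in some write-ups); the dictionary layout and function name are illustrative and not part of any NVIDIA API.

```python
# (links per GPU, total bidirectional bandwidth in GB/s), as quoted in this article.
NVLINK_GENERATIONS = {
    "NVLink 1.0": (4, 160),
    "NVLink 2.0": (6, 300),
    "NVLink 3.0": (12, 600),
    "NVLink 4.0": (18, 900),
}


def per_link_bandwidth(links: int, total_bidir_gb_s: float) -> tuple[float, float]:
    """Return (bidirectional, per-direction) GB/s for a single link."""
    bidir = total_bidir_gb_s / links
    return bidir, bidir / 2


for name, (links, total) in NVLINK_GENERATIONS.items():
    bidir, one_way = per_link_bandwidth(links, total)
    print(f"{name}: {links} links x {bidir:.0f} GB/s = {total} GB/s "
          f"({one_way:.0f} GB/s per direction per link)")
```

The result is 40 GB/s per link for NVLink 1.0 and 50 GB/s per link from 2.0 onward, so the growth in total bandwidth after 2.0 comes from adding links per GPU rather than from a higher per-link rate.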

NVSwitch

NVSwitch is NVIDIA’s dedicated switch chip for intra‑node communication among multiple GPUs. In an 8‑GPU A100 configuration the NVSwitch sits beneath the large heat sinks, providing low‑latency, high‑throughput full‑mesh connectivity.
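To see why a switch helps, it can be useful to compare it with wiring the GPUs directly to one another. The sketch below is a back-of-the-envelope model, not a description of any specific board. It assumes 8 GPUs with 12 links of 50 GB/s each (the NVLink 3.0 figures given earlier), spreads the links evenly across peers in the direct case even though real links cannot be split fractionally, and treats the switch fabric as non-blocking.

```python
def direct_mesh_pair_bandwidth(gpus: int, links_per_gpu: int, gb_s_per_link: float) -> float:
    """Bandwidth between one GPU pair when links are split evenly across all peers."""
    peers = gpus - 1
    return links_per_gpu * gb_s_per_link / peers


def switched_pair_bandwidth(links_per_gpu: int, gb_s_per_link: float) -> float:
    """Bandwidth between one GPU pair when every link goes to a non-blocking switch
    fabric, so a single peer can be reached over all of the GPU's links at once."""
    return links_per_gpu * gb_s_per_link


GPUS, LINKS, PER_LINK = 8, 12, 50.0  # assumed A100-style values, bidirectional GB/s

pairs = GPUS * (GPUS - 1) // 2  # point-to-point connections in a direct full mesh
direct = direct_mesh_pair_bandwidth(GPUS, LINKS, PER_LINK)
switched = switched_pair_bandwidth(LINKS, PER_LINK)

print(f"direct full mesh: {pairs} GPU pairs, about {direct:.0f} GB/s per pair")
print(f"through switches: up to {switched:.0f} GB/s between any single pair")
```

In this simplified model a pair of GPUs gets roughly 86 GB/s over a direct mesh but can use the full 600 GB/s through the switch fabric, which is the practical benefit of routing every link through NVSwitch.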

NVLink Switch

NVSwitch and NVLink Switch are easy to confuse. NVSwitch is the switch chip mounted on the GPU module inside a single host. In 2022 NVIDIA released a standalone switch product built on the same technology, called the NVLink Switch, to enable high‑performance GPU communication across separate hosts.

HBM (High‑Bandwidth Memory)

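When a GPU has to reach memory across PCIe, such as DDR attached to the host, bandwidth is capped by the PCIe link: roughly 64 GB/s bidirectional for a Gen4 x16 link or 128 GB/s for Gen5. HBM avoids this bottleneck by stacking multiple DRAM dies and placing the stacks inside the GPU package, right next to the GPU die, over a very wide interface. The resulting bandwidth is one to two orders of magnitude higher than PCIe, reaching terabytes per second on GPUs such as NVIDIA’s H100.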
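HBM gets its bandwidth from a very wide interface rather than a very fast one. As a hedged illustration, the sketch below computes peak per-stack bandwidth as interface width times per-pin data rate. The 1024-bit stack interface and the per-pin rates are commonly published figures for each generation and are used here as assumptions; actual parts vary by vendor and speed bin, so check a datasheet before relying on them.

```python
STACK_INTERFACE_BITS = 1024  # assumed interface width of one HBM stack

# Assumed per-pin data rates in Gbit/s; real parts ship in several speed bins.
PIN_RATE_GBIT_S = {"HBM2": 2.0, "HBM2E": 3.2, "HBM3": 6.4}


def stack_bandwidth_gb_s(pin_rate_gbit_s: float, width_bits: int = STACK_INTERFACE_BITS) -> float:
    """Peak bandwidth of one stack in GB/s: width (bits) x rate (Gbit/s per pin) / 8."""
    return width_bits * pin_rate_gbit_s / 8


for generation, rate in PIN_RATE_GBIT_S.items():
    print(f"{generation}: {stack_bandwidth_gb_s(rate):.1f} GB/s per stack")
```

A GPU package carries several stacks, so the totals multiply accordingly. That is how on-package memory reaches terabytes per second while a PCIe x16 link stays in the tens of gigabytes per second.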

HBM Evolution

Bandwidth Unit Analysis

When evaluating GPU‑centric systems, several bandwidth metrics must be considered: PCIe, memory, NVLink, HBM, and network links. Network speeds are expressed in bits per second (bit/s) and usually refer to a single direction (TX or RX). PCIe, memory, NVLink, and HBM figures are usually given in bytes per second (Byte/s) or transfers per second (T/s), and the byte figures often refer to the combined bidirectional total. Converting everything to the same unit and the same direction convention before comparing is essential for understanding the data‑transfer limits that affect large‑scale GPU training performance.
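The unit mismatch is easiest to see with a conversion. This sketch converts a network line rate quoted in Gbit/s per direction into the GB/s bidirectional convention used for the other links, and back again. The 400 Gbit/s NIC is a hypothetical example value, the helper names are made up for this illustration, and encoding and protocol overheads are ignored.

```python
def nic_gbit_to_bidir_gb_s(gbit_per_s_one_way: float) -> float:
    """Convert a per-direction network rate (Gbit/s) to bidirectional GB/s."""
    one_way_gb_s = gbit_per_s_one_way / 8  # 8 bits per byte
    return 2 * one_way_gb_s                # TX and RX counted together


def bidir_gb_s_to_one_way_gbit(gb_s_bidir: float) -> float:
    """Convert a bidirectional GB/s figure to per-direction Gbit/s."""
    return gb_s_bidir / 2 * 8


nic = 400.0  # hypothetical NIC line rate, Gbit/s per direction
print(f"{nic:.0f} Gbit/s NIC = {nic / 8:.0f} GB/s per direction, "
      f"{nic_gbit_to_bidir_gb_s(nic):.0f} GB/s bidirectional")

nvlink4 = 900.0  # NVLink 4.0 total quoted in this article, GB/s bidirectional
print(f"{nvlink4:.0f} GB/s NVLink = "
      f"{bidir_gb_s_to_one_way_gbit(nvlink4):.0f} Gbit/s per direction")
```

On a like-for-like basis, the 900 GB/s NVLink figure is 9 times the 400 Gbit/s NIC (100 GB/s bidirectional), not the 2.25 times that the raw numbers might suggest.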

Source: https://community.fs.com/cn/article/unveiling-the-foundations-of-gpu-computing1.html


