How NVIDIA Builds AI Supercomputers: From H100 to GH200 and GB200 SuperPods
This article analyzes NVIDIA's evolving AI supercomputer architectures—detailing the H100‑based 256‑GPU SuperPod, the GH200‑based 256‑GPU SuperPod with integrated Grace CPU, and the GB200‑based 576‑GPU SuperPod—examining their NVLink and InfiniBand topologies, bandwidth limits, and scalability challenges.
Background
As AI models grow larger, training on a single GPU becomes infeasible, requiring hundreds or thousands of GPUs to work together as a unified system. NVIDIA's DGX SuperPod series addresses this need by providing data‑center‑grade AI infrastructure that supports training, inference, high‑performance computing (HPC), and mixed workloads.
H100‑Based 256‑GPU SuperPod
In the DGX A100 generation, each node contained eight GPUs interconnected by NVLink and NVSwitch, while inter‑node communication relied on a 200 Gbps InfiniBand HDR network (or RoCE as an alternative). With the DGX H100 generation, NVIDIA extended NVLink beyond intra‑node traffic, introducing an NVLink network switch that connects nodes directly. Inside a node, NVSwitch handles local GPU traffic; between nodes, the NVLink network allows up to 256 H100 GPUs to form a single cluster. The design sustains 450 GB/s of per‑GPU reduction bandwidth even when reducing across all 256 GPUs, matching the bandwidth available within a single server.
Despite this breakthrough, the H100 SuperPod still has a bottleneck: each DGX H100 node connects to the rest of the cluster through only 72 NVLink links, so the network is tapered rather than fully non‑blocking. The total bidirectional bandwidth of these links is 3.6 TB/s, while the eight H100 GPUs inside the node can together drive 7.2 TB/s, a 2:1 oversubscription at the node‑to‑network boundary.
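The taper follows directly from the link arithmetic. A minimal sketch of the calculation, assuming 50 GB/s of bidirectional bandwidth per NVLink 4.0 link (18 links per H100):

```python
# Back-of-the-envelope check of the DGX H100 inter-node NVLink taper.
NVLINK4_BW = 50               # bidirectional GB/s per NVLink 4.0 link (assumed)
LINKS_PER_H100 = 18           # 18 links/GPU -> 900 GB/s per GPU
GPUS_PER_NODE = 8
EXTERNAL_LINKS_PER_NODE = 72  # links from a DGX H100 node to the NVLink network

internal_bw = GPUS_PER_NODE * LINKS_PER_H100 * NVLINK4_BW  # 7200 GB/s
external_bw = EXTERNAL_LINKS_PER_NODE * NVLINK4_BW         # 3600 GB/s

print(f"intra-node demand: {internal_bw / 1000:.1f} TB/s")      # 7.2 TB/s
print(f"inter-node supply: {external_bw / 1000:.1f} TB/s")      # 3.6 TB/s
print(f"oversubscription:  {internal_bw / external_bw:.0f}:1")  # 2:1
```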
GH200‑Based 256‑GPU SuperPod
In 2023 NVIDIA announced the DGX GH200, which pairs a Grace CPU with a Hopper GPU (a higher‑memory, higher‑bandwidth variant of the H100). GH200 uses NVLink 4.0 for GPU‑GPU connections and NVLink‑C2C for the CPU‑GPU link, each delivering 900 GB/s of aggregate bidirectional bandwidth, not 900 GB/s per link. Within a node, components are connected by copper cables, while inter‑node communication uses 800 Gbps optical modules, each carrying two NVLink 4.0 links (50 GB/s per link, 100 GB/s per module).
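The unit conversion behind these figures is easy to trip over. A quick sanity check, following the article's bidirectional accounting (8 bits per byte, 50 GB/s per NVLink 4.0 link):

```python
# Sanity-check the 800 Gbps optical module <-> 2x NVLink 4.0 mapping.
MODULE_GBPS = 800                  # optical module rate, gigabits/s
module_gb_s = MODULE_GBPS / 8      # 100 GB/s

NVLINK4_BW = 50                    # bidirectional GB/s per NVLink 4.0 link
links_per_module = module_gb_s / NVLINK4_BW

print(f"module bandwidth: {module_gb_s:.0f} GB/s")             # 100 GB/s
print(f"NVLink 4.0 links per module: {links_per_module:.0f}")  # 2
```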
The GH200 SuperPod replaces the previous mixed NVLink/InfiniBand design with a fully NVLink‑based network. Each node contains eight GH200 GPUs and three first‑tier NVLink switches. Scaling to 256 GPUs (32 nodes, 96 first‑tier switches in total) adds a second tier of 36 NVLink switches, forming a two‑level fat‑tree topology that eliminates the node‑boundary bandwidth bottleneck of the H100 design.
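The switch counts fall out of the per‑node figures. A small sketch using only the numbers quoted above (the 36 second‑tier switches come from NVIDIA's published design rather than a formula):

```python
# Derive the DGX GH200 two-tier NVLink fat-tree switch counts
# from the per-node figures quoted above.
GPUS_TOTAL = 256
GPUS_PER_NODE = 8
L1_SWITCHES_PER_NODE = 3

nodes = GPUS_TOTAL // GPUS_PER_NODE          # 32 nodes
l1_switches = nodes * L1_SWITCHES_PER_NODE   # 96 first-tier switches
l2_switches = 36                             # second tier, per NVIDIA's design

print(f"{nodes} nodes: {l1_switches} first-tier + {l2_switches} second-tier switches")
```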
GB200‑Based 576‑GPU SuperPod
The GB200 module integrates one Grace CPU and two Blackwell GPUs. A GB200 compute tray holds two such modules (four GPUs, two CPUs), and a GB200 NVL72 rack combines 18 compute trays (36 CPUs, 72 GPUs) with nine NVLink switch trays, each providing 144 NVLink ports. Interconnecting the full 576 GPUs requires a two‑level NVLink switch hierarchy: 144 switch trays at the first level and an additional 72 at the second, preserving full non‑blocking connectivity.
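The first‑ and second‑level counts can be re‑derived from port arithmetic. A sketch assuming 18 NVLink 5 links per Blackwell GPU and a non‑blocking fat tree in which each first‑level tray splits its 144 ports half down, half up:

```python
# Derive the two-level NVLink switch counts for a 576-GPU GB200 system.
# Assumes 18 NVLink 5 links per Blackwell GPU and 144 ports per switch
# tray, with first-level trays split half down / half up (fat tree).
GPUS = 576
LINKS_PER_GPU = 18
PORTS_PER_TRAY = 144

gpu_links = GPUS * LINKS_PER_GPU               # 10368 GPU-facing links
l1_trays = gpu_links // (PORTS_PER_TRAY // 2)  # 72 downlinks each -> 144 trays
l1_uplinks = l1_trays * (PORTS_PER_TRAY // 2)  # 10368 uplinks
l2_trays = l1_uplinks // PORTS_PER_TRAY        # all ports down -> 72 trays

print(f"level 1: {l1_trays} trays, level 2: {l2_trays} trays")
```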
According to NVIDIA documentation, a 576‑GPU SuperPod is likely to use a Scale‑Out RDMA network for inter‑tray communication rather than a pure NVLink scale‑up approach, because the required number of NVSwitches would exceed physical rack limits.
Comparative Insights
All three generations aim to provide a unified, high‑bandwidth fabric for AI workloads, but they differ in how node‑to‑node traffic is handled.
The H100 generation relies on a hybrid NVLink + InfiniBand design whose tapered NVLink network creates a bandwidth bottleneck at the node boundary.
GH200 moves to a fully NVLink‑based fabric, eliminating the bottleneck and simplifying the topology.
GB200 scales to 576 GPUs using a two‑tier NVLink switch tree, but may still need an external RDMA network for full scalability.
Conclusion
The evolution from H100 to GH200 and GB200 demonstrates a shift from “spec stacking” to “systemic architecture innovation,” where NVIDIA integrates CPU‑GPU co‑design and high‑speed NVLink fabrics to meet exascale AI compute demands. Understanding these topologies helps architects design future AI supercomputers and evaluate trade‑offs between NVLink‑centric and RDMA‑centric networks.