
How Many Optical Modules Do AI GPU SuperPODs Really Need? A Detailed Calculation

This article analyzes the factors influencing optical‑module requirements for AI GPU clusters, compares four typical network configurations for A100 and H100 SuperPODs, and provides step‑by‑step calculations that reveal the projected market demand for 200G, 400G, and 800G modules in 2023‑2024.


Background

Various methods exist for calculating the ratio of optical modules to GPUs, and they yield inconsistent results. The main cause is that different network topologies require different numbers of optical modules. An accurate estimate depends on several key factors: GPU count, NIC speed, switch model, and the number of scalable units.

Key Network Components

NIC models: ConnectX-6 (200 Gb/s, primarily for A100) and ConnectX-7 (400 Gb/s, primarily for H100). A next-generation ConnectX-8 (800 Gb/s) is expected in 2024.

Switch models: QM-9700 (32 OSFP cages at 2 × 400 Gb/s each, i.e. 64 × 400 Gb/s ports, 51.2 Tb/s total throughput) and QM-8700 (40 QSFP56 ports at 200 Gb/s each, 16 Tb/s total throughput).
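
For use in the calculations that follow, these component specs can be captured in a small lookup. This is a minimal sketch; the field names are illustrative, not any official API:

```python
# Key component specs used in the calculations below (values from the text above).
NICS = {
    "ConnectX-6": {"speed_gbps": 200, "pairs_with": "A100"},
    "ConnectX-7": {"speed_gbps": 400, "pairs_with": "H100"},
    "ConnectX-8": {"speed_gbps": 800, "pairs_with": "H100 (expected 2024)"},
}

SWITCHES = {
    # QM-9700: 32 OSFP cages, each carrying 2 x 400 Gb/s (64 x 400G ports).
    "QM-9700": {"ports": 64, "port_speed_gbps": 400, "throughput_tbps": 51.2},
    # QM-8700: 40 QSFP56 ports at 200 Gb/s each.
    "QM-8700": {"ports": 40, "port_speed_gbps": 200, "throughput_tbps": 16},
}
```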

[Figure: Network diagram]

Scalable Unit Count

The number of scaling units determines the switch architecture: small deployments adopt a two-tier structure, while large ones require a three-tier structure. The platform-specific limits are listed below, with a small code sketch after the list.

H100 SuperPOD: each unit contains 32 nodes (DGX H100 servers), up to 4 units, using a two‑tier architecture.

A100 SuperPOD: each unit contains 20 nodes (DGX A100 servers), up to 7 units; if units exceed 5, a three‑tier architecture is needed.
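
A minimal sketch of that tiering rule, using the node counts and thresholds stated above (the function name and structure are illustrative):

```python
def architecture(platform: str, units: int) -> str:
    """Return the switch architecture implied by the scaling-unit count."""
    if platform == "H100":                # 32 DGX H100 nodes per unit
        assert units <= 4, "H100 SuperPOD tops out at 4 units"
        return "two-tier"
    if platform == "A100":                # 20 DGX A100 nodes per unit
        assert units <= 7, "A100 SuperPOD tops out at 7 units"
        return "three-tier" if units > 5 else "two-tier"
    raise ValueError(f"unknown platform: {platform}")

print(architecture("A100", 7))  # -> three-tier
print(architecture("H100", 4))  # -> two-tier
```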

Four Network Configurations and Optical‑Module Demand

A100 + ConnectX‑6 + QM‑8700 (three‑tier): ratio 1:6, all 200 G optical modules.

A100 + ConnectX‑6 + QM‑9700 (two‑tier): mix of 800 G (ratio 1:0.75) and 200 G (ratio 1:1) modules.

H100 + ConnectX‑7 + QM‑9700 (two‑tier): mix of 800 G (ratio 1:1.5) and 400 G (ratio 1:1) modules.

H100 + ConnectX-8 (future) + QM-9700 (three-tier): all 800 G modules, maintaining the 1:6 GPU-to-module ratio of the three-tier baseline (all four configurations are collected in the sketch below).
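
The per-GPU ratios above can be folded into a small lookup that converts a GPU count into module demand. A sketch using the ratios from this list:

```python
# Optical modules required per GPU, keyed by scenario (ratios from the list above).
RATIOS = {
    "A100+CX6+QM8700 (3-tier)": {"200G": 6.0},
    "A100+CX6+QM9700 (2-tier)": {"800G": 0.75, "200G": 1.0},
    "H100+CX7+QM9700 (2-tier)": {"800G": 1.5, "400G": 1.0},
    "H100+CX8+QM9700 (3-tier)": {"800G": 6.0},
}

def module_demand(scenario: str, gpus: int) -> dict:
    """Scale the per-GPU module ratios by the GPU count."""
    return {speed: ratio * gpus for speed, ratio in RATIOS[scenario].items()}

print(module_demand("H100+CX7+QM9700 (2-tier)", 1024))
# -> {'800G': 1536.0, '400G': 1024.0}
```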

Market Impact Estimates

Assuming 2023 shipments of 300 k H100 GPUs and 900 k A100 GPUs, the total demand would be 3.15 M × 200 G, 0.30 M × 400 G, and 0.7875 M × 800 G modules, corresponding to a market size of roughly US$1.38 billion.

For 2024, with an estimated 1.5 M H100 and 1.5 M A100 units, the demand rises to 0.75 M × 200 G, 0.75 M × 400 G, and 6.75 M × 800 G modules, projecting a market of about US$4.97 billion, roughly the size of the entire optical-module market in 2021.
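
The 2023 totals are reproducible if one assumes the A100 base splits evenly between Scenario 1 and Scenario 2 while all H100s use Scenario 3; that 50/50 split is an inference from the totals, not something stated in the text. A minimal check:

```python
# 2023 shipment assumptions from the text.
H100_2023, A100_2023 = 300_000, 900_000

# Assumed deployment mix (inferred so the totals match): A100s split 50/50
# between Scenario 1 (1:6 in 200G) and Scenario 2 (1:1 in 200G, 1:0.75 in
# 800G); all H100s use Scenario 3 (1:1.5 in 800G, 1:1 in 400G).
demand_200g = A100_2023 * (0.5 * 6 + 0.5 * 1)           # 3,150,000
demand_400g = H100_2023 * 1.0                           # 300,000
demand_800g = H100_2023 * 1.5 + A100_2023 * 0.5 * 0.75  # 787,500

print(f"200G: {demand_200g:,.0f}  400G: {demand_400g:,.0f}  800G: {demand_800g:,.0f}")
```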

Detailed Calculations per Scenario

Scenario 1: A100 + ConnectX‑6 + QM‑8700 (three‑tier)

Each DGX A100 node has eight compute interfaces (four on each side in the topology diagram). Each node connects to eight leaf switches, and 20 nodes form a scaling unit (SU). Across SU scaling units, the first tier requires 8 × SU leaf switches, 8 × 20 × SU cables, and 2 × 8 × 20 × SU × 200 G modules. The second tier carries the same cable count as the first; the spine-switch count follows from dividing the second-tier cable count by the ports available per switch, with adjustments for port limits. When the system scales to seven units, a third tier becomes essential, keeping per-tier cable counts unchanged while adding a layer of core switches.
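
A minimal sketch of those first-tier formulas (the function name and defaults are illustrative):

```python
def tier1_a100(su: int, nodes_per_su: int = 20, ports_per_node: int = 8):
    """First-tier counts for an A100 SuperPOD with `su` scaling units."""
    leaf_switches = 8 * su                       # 8 leaf switches per unit
    cables = ports_per_node * nodes_per_su * su  # one cable per NIC port
    modules_200g = 2 * cables                    # one module at each cable end
    return leaf_switches, cables, modules_200g

print(tier1_a100(7))  # -> (56, 1120, 2240) for the full 7-unit build
```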

[Figure: Three-tier topology diagram]

For a 140‑server deployment (1120 A100 GPUs), the configuration requires 140 QM‑8700 switches, 3360 cables, and 6720 × 200 G optical modules, yielding a GPU‑to‑module ratio of 1:6.
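
Extending the same counting across all three tiers reproduces these totals, since the cable count is equal in every tier of this topology. A quick check:

```python
GPUS = 1120                # 140 DGX A100 servers x 8 GPUs
CABLES_PER_TIER = 8 * 140  # 1120: one 200G link per NIC port, per tier

cables = 3 * CABLES_PER_TIER  # 3360 cables across three tiers
modules_200g = 2 * cables     # 6720: two transceivers per cable
print(cables, modules_200g, modules_200g / GPUS)  # -> 3360 6720 6.0
```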

Scenario 2: A100 + ConnectX‑6 + QM‑9700 (two‑tier)

This configuration (not generally recommended) replaces direct 200 G connections with breakouts: each 800 G twin-port OSFP module on the switch carries two 400 Gb/s channels, and each channel splits into two 200 G links, so one switch-side module fans out to four server connections (a 1-to-4 mapping). For 140 servers (1120 GPUs) across seven units, the first tier needs 280 × 800 G switch-side modules and 1120 × 200 G server-side modules, while the second tier uses 560 × 800 G modules. Total hardware: 21 QM-9700 switches, 840 × 800 G modules, and 1120 × 200 G modules, giving a GPU-to-800 G-module ratio of 1:0.75.
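
The same bookkeeping reproduces the Scenario 2 counts. This sketch uses the standard fat-tree assumption that second-tier uplink bandwidth matches first-tier downlink bandwidth:

```python
SERVERS, GPUS = 140, 1120

tier1_200g = SERVERS * 8      # 1120 server-side 200G modules
tier1_800g = tier1_200g // 4  # 280 switch-side 800G breakout modules (1-to-4)

# Leaf-spine links carry the same aggregate bandwidth, at 800G per link.
tier2_links = (tier1_200g * 200) // 800  # 280 links
tier2_800g = 2 * tier2_links             # 560 modules (one per link end)

total_800g = tier1_800g + tier2_800g     # 840
print(total_800g / GPUS, tier1_200g / GPUS)  # -> 0.75 1.0
```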

Scenario 3: H100 + ConnectX‑7 + QM‑9700 (two‑tier)

Each DGX H100 node exposes eight 400 Gb/s NIC ports that pair into four 800 G OSFP interfaces, driving high 800 G module demand. In a SuperPOD with four units (128 servers, 1024 H100 GPUs), the first tier requires 512 × 800 G server-side modules and 1024 × 400 G switch-side modules. The second tier adds 1024 × 800 G modules, for a total of 1536 × 800 G and 1024 × 400 G modules. The GPU-to-800 G-module ratio is 1:1.5, and the GPU-to-400 G ratio is 1:1.
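
A quick check of the Scenario 3 totals under the same two-transceivers-per-link convention:

```python
SERVERS = 128
GPUS = SERVERS * 8        # 1024 H100 GPUs across 4 units

tier1_800g = SERVERS * 4  # 512 server-side 800G OSFP modules (4 per node)
tier1_400g = GPUS         # 1024 switch-side 400G modules (one per GPU link)

tier2_links = (GPUS * 400) // 800  # 512 leaf-spine links at 800G
tier2_800g = 2 * tier2_links       # 1024 modules (one per link end)

total_800g = tier1_800g + tier2_800g         # 1536
print(total_800g / GPUS, tier1_400g / GPUS)  # -> 1.5 1.0
```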

Scenario 4: H100 + ConnectX‑8 (future) + QM‑9700 (three‑tier)

If H100 NICs upgrade to 800 G, each node would need eight OSFP ports, and all inter-layer connections would use 800 G modules. In the three-tier fat tree, each GPU's port then crosses three links with two transceivers apiece, so the GPU-to-module ratio remains 1:6, identical to the baseline three-tier scenario.

Conclusion

Advances in network technology, such as 400 G multimode modules, AOC, and DAC, are rapidly shaping high-speed interconnect solutions. The projected growth in optical-module demand underscores the critical role of scalable, high-bandwidth interconnects in supporting the AI workloads of the digital era.

Tags: Network Architecture, AI, GPU, hardware design, optical modules, SuperPOD
Written by Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.