Google’s TPU v7: How 1.5 & 2.6 Optical Modules per Chip Power AI Supercomputers
This article explains how Google's TPU v7 supercomputer uses a simple yet powerful networking scheme: 1.5 optical modules per TPU for intra-rack communication, plus an additional 2.6 modules per TPU for inter-rack high-speed links, enabling massive AI model training with a balanced trade-off between cost and performance.
Network bottleneck in large AI supercomputers
The performance of a supercomputer is limited more by the efficiency of its interconnect network than by the raw number of AI chips. Without sufficient bandwidth and low latency between chips, even thousands of TPU processors cannot be fully utilized.
Baseline intra‑rack connectivity (1.5 optical modules per TPU)
Google defines a rack as the minimal physical unit, containing 64 TPU chips. Building a 3-D torus network inside a rack requires 96 optical modules, which yields a fixed ratio:
96 optical modules ÷ 64 TPUs = 1.5 optical modules per TPU
This 1.5-module ratio is mandatory for every rack, regardless of the total cluster size, and guarantees that each TPU has the necessary intra-rack bandwidth.
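As a quick sanity check, the ratio falls directly out of the two per-rack figures. The sketch below simply encodes that arithmetic; the constant names are illustrative, not Google's:

```python
# Intra-rack ratio from the article's per-rack figures:
# 64 TPUs per rack, 96 optical modules to wire the rack's 3-D torus.
TPUS_PER_RACK = 64
TORUS_MODULES_PER_RACK = 96

intra_rack_ratio = TORUS_MODULES_PER_RACK / TPUS_PER_RACK
print(intra_rack_ratio)  # -> 1.5 optical modules per TPU
```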
Scale‑up inter‑rack connectivity (additional 2.6 modules per TPU)
When the total number of TPUs exceeds 9 216, intra‑rack links alone cannot satisfy cross‑rack traffic. Google therefore adds a three‑layer data‑center network (DCN):
ToR (Top‑of‑Rack) switches – entry/exit points for each rack.
Leaf switches – aggregation layer connecting multiple ToRs.
Spine OCS (Optical Circuit Switch) – high‑capacity backbone linking leaf switches.
Using a non‑blocking architecture, Google estimates that each TPU needs an extra 2.6 optical modules to attach to the DCN. The total per‑TPU module count becomes:
1.5 (intra-rack) + 2.6 (inter-rack) = 4.1 optical modules per TPU
Key technical details of the DCN include:
Circulator technology that enables bidirectional transmission on a single fiber, effectively turning a single‑lane link into a duplex lane.
800 G OSFP (Octal Small Form-Factor Pluggable) modules, providing industry-leading throughput.
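Putting the two ratios together, the per-TPU module count is a simple step function of cluster size. The minimal sketch below assumes the article's 9 216-TPU threshold; the function and constant names are this sketch's own, not Google's:

```python
INTRA_RACK_MODULES_PER_TPU = 1.5  # ICI fabric, present in every rack
INTER_RACK_MODULES_PER_TPU = 2.6  # DCN attachment, only beyond the threshold
DCN_THRESHOLD_TPUS = 9_216        # largest scale the intra-rack scheme alone serves

def modules_per_tpu(total_tpus: int) -> float:
    """Per-TPU optical-module count implied by the article's two ratios."""
    if total_tpus > DCN_THRESHOLD_TPUS:
        return INTRA_RACK_MODULES_PER_TPU + INTER_RACK_MODULES_PER_TPU  # 4.1
    return INTRA_RACK_MODULES_PER_TPU  # 1.5

print(modules_per_tpu(1_024))   # 1.5 (intra-rack only)
print(modules_per_tpu(36_864))  # 4.1 (intra-rack + DCN)
```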
Module count calculations for typical deployments
Inference‑only workload (1 024 TPUs)
Only intra‑rack communication is required:
1 024 TPUs × 1.5 modules/TPU = 1 536 optical modules
Training workload (36 864 TPUs)
Both intra‑rack and inter‑rack links are needed:
Intra-rack: 36 864 × 1.5 = 55 296 modules
Inter-rack: 36 864 × 2.6 ≈ 95 846 modules
Total ≈ 151 000 modules (≈4.1 modules per TPU)
Maximum-scale cluster (147 456 TPUs)
Full DCN deployment for a 150 k‑scale system:
Intra-rack: 147 456 × 1.5 = 221 184 modules
Inter-rack: 147 456 × 2.6 ≈ 383 385 modules
Total ≈ 604 500 modules (≈4.1 modules per TPU)
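All three worked examples can be reproduced in a few lines; the small differences from the article's totals (e.g., 151 142 vs. ≈151 000) are just its rounding. The deployment names and the DCN flag below are this sketch's own assumptions:

```python
# (name, TPU count, whether the cluster attaches to the DCN)
deployments = [
    ("inference-only", 1_024, False),
    ("training", 36_864, True),
    ("maximum-scale", 147_456, True),
]

for name, tpus, uses_dcn in deployments:
    intra = tpus * 1.5                      # intra-rack (ICI) modules
    inter = tpus * 2.6 if uses_dcn else 0.0  # inter-rack (DCN) modules
    total = intra + inter
    print(f"{name:14s} {tpus:>7,} TPUs: intra {intra:,.0f} + inter {inter:,.0f} "
          f"= {total:,.0f} modules ({total / tpus:.1f} per TPU)")
```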
Design philosophy
Google reduces the complex networking problem to two core numbers – 1.5 for the basic “community road” (ICI, the inter-chip interconnect) and 2.6 for the optional “highway” (DCN). This modular “basic package + upgrade package” approach lets users start with a minimal-cost configuration and add the high-bandwidth backbone only when the workload demands it, achieving a balanced trade-off among cost, latency, and bandwidth.
Conclusion
In the era of trillion‑parameter models, the efficiency of the interconnect architecture is the decisive factor for overall system performance. Google’s 1.5 + 2.6 optical‑module scheme demonstrates that a standardized, low‑latency, high‑throughput network can unlock the full compute potential of a 150 k‑TPU v7 cluster while keeping hardware costs under control.