Google’s TPU v7: How 1.5 & 2.6 Optical Modules per Chip Power AI Supercomputers
This article explains how Google's TPU v7 supercomputer uses a simple yet powerful networking scheme: 1.5 optical modules per TPU for intra-rack communication, plus an additional 2.6 modules per TPU for inter-rack high-speed links, enabling massive AI model training with a balanced trade-off between cost and performance.
Network bottleneck in large AI supercomputers
The performance of a supercomputer is limited more by the efficiency of its interconnect network than by the raw number of AI chips. Without sufficient bandwidth and low latency between chips, even thousands of TPU processors cannot be fully utilized.
Baseline intra‑rack connectivity (1.5 optical modules per TPU)
Google defines a rack as the minimal physical unit, containing 64 TPU chips. Building a 3-D torus network inside a rack requires 96 optical modules, which yields a fixed ratio:
96 optical modules ÷ 64 TPUs = 1.5 optical modules per TPU
This 1.5-module ratio is mandatory for every rack, regardless of the total cluster size, and guarantees that each TPU has the necessary intra-rack bandwidth.
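As a quick sanity check, the ratio falls directly out of the two per-rack figures. The sketch below simply encodes that arithmetic; the constant names are illustrative, not Google's:

```python
# Intra-rack ratio from the article's per-rack figures:
# 64 TPUs per rack, 96 optical modules to wire the rack's 3-D torus.
TPUS_PER_RACK = 64
TORUS_MODULES_PER_RACK = 96

intra_rack_ratio = TORUS_MODULES_PER_RACK / TPUS_PER_RACK
print(intra_rack_ratio)  # -> 1.5 optical modules per TPU
```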
Scale‑up inter‑rack connectivity (additional 2.6 modules per TPU)
When the total number of TPUs exceeds 9 216, intra‑rack links alone cannot satisfy cross‑rack traffic. Google therefore adds a three‑layer data‑center network (DCN):
ToR (Top‑of‑Rack) switches – entry/exit points for each rack.
Leaf switches – aggregation layer connecting multiple ToRs.
Spine OCS (Optical Circuit Switch) – high‑capacity backbone linking leaf switches.
Using a non‑blocking architecture, Google estimates that each TPU needs an extra 2.6 optical modules to attach to the DCN. The total per‑TPU module count becomes:
1.5 (intra-rack) + 2.6 (inter-rack) = 4.1 optical modules per TPU
Key technical details of the DCN include:
Circulator technology that enables bidirectional transmission on a single fiber, effectively turning a single‑lane link into a duplex lane.
800 G OSFP (Octal Small Form-Factor Pluggable) modules, providing industry-leading throughput.
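Putting the two ratios together, the per-TPU module count is a simple step function of cluster size. The minimal sketch below assumes the article's 9 216-TPU threshold; the function and constant names are this sketch's own, not Google's:

```python
INTRA_RACK_MODULES_PER_TPU = 1.5  # ICI fabric, present in every rack
INTER_RACK_MODULES_PER_TPU = 2.6  # DCN attachment, only beyond the threshold
DCN_THRESHOLD_TPUS = 9_216        # largest scale the intra-rack scheme alone serves

def modules_per_tpu(total_tpus: int) -> float:
    """Per-TPU optical-module count implied by the article's two ratios."""
    if total_tpus > DCN_THRESHOLD_TPUS:
        return INTRA_RACK_MODULES_PER_TPU + INTER_RACK_MODULES_PER_TPU  # 4.1
    return INTRA_RACK_MODULES_PER_TPU  # 1.5

print(modules_per_tpu(1_024))   # 1.5 (intra-rack only)
print(modules_per_tpu(36_864))  # 4.1 (intra-rack + DCN)
```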
Module count calculations for typical deployments
Inference‑only workload (1 024 TPUs)
Only intra‑rack communication is required:
1 024 TPUs × 1.5 modules/TPU = 1 536 optical modules
Training workload (36 864 TPUs)
Both intra‑rack and inter‑rack links are needed:
Intra-rack: 36 864 × 1.5 = 55 296 modules
Inter-rack: 36 864 × 2.6 ≈ 95 846 modules
Total ≈ 151 000 modules (≈4.1 modules per TPU)
Maximum-scale cluster (147 456 TPUs)
Full DCN deployment for a 150 k‑scale system:
Intra-rack: 147 456 × 1.5 = 221 184 modules
Inter-rack: 147 456 × 2.6 ≈ 383 385 modules
Total ≈ 604 500 modules (≈4.1 modules per TPU)
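All three worked examples can be reproduced in a few lines; the small differences from the article's totals (e.g., 151 142 vs. ≈151 000) are just its rounding. The deployment names and the DCN flag below are this sketch's own assumptions:

```python
# (name, TPU count, whether the cluster attaches to the DCN)
deployments = [
    ("inference-only", 1_024, False),
    ("training", 36_864, True),
    ("maximum-scale", 147_456, True),
]

for name, tpus, uses_dcn in deployments:
    intra = tpus * 1.5                      # intra-rack (ICI) modules
    inter = tpus * 2.6 if uses_dcn else 0.0  # inter-rack (DCN) modules
    total = intra + inter
    print(f"{name:14s} {tpus:>7,} TPUs: intra {intra:,.0f} + inter {inter:,.0f} "
          f"= {total:,.0f} modules ({total / tpus:.1f} per TPU)")
```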
Design philosophy
Google reduces the complex networking problem to two core numbers – 1.5 for the basic “community road” (ICI, the inter-chip interconnect) and 2.6 for the optional “highway” (DCN). This modular “basic package + upgrade package” approach lets users start with a minimal-cost configuration and add the high-bandwidth backbone only when the workload demands it, achieving a balanced trade-off among cost, latency, and bandwidth.
Conclusion
In the era of trillion‑parameter models, the efficiency of the interconnect architecture is the decisive factor for overall system performance. Google’s 1.5 + 2.6 optical‑module scheme demonstrates that a standardized, low‑latency, high‑throughput network can unlock the full compute potential of a 150 k‑TPU v7 cluster while keeping hardware costs under control.