Inside Google’s Massive TPU SuperPod: How Scale‑Up and Scale‑Out Build a 9,216‑Chip AI Engine
The article explains Google’s TPU data‑center architecture, detailing the vertical Scale‑Up strategy within a SuperPod, the horizontal Scale‑Out across SuperPods, the 3D Torus topology with Twisted variants, and the multi‑layer network design that enables petabyte‑scale AI training and inference.
Google TPU SuperPod Architecture Overview
Google’s TPU clusters use a two‑layer scaling strategy: Scale‑Up (vertical expansion inside a cabinet or SuperPod) and Scale‑Out (horizontal expansion across SuperPods or data‑center sites). The fabric combines a high‑bandwidth, low‑latency 3D Torus (with optional Twisted variant) interconnect and a Data Center Network (DCN) + Optical Circuit Switch (OCS) for elastic, non‑blocking connectivity.
Scale‑Up (Vertical Expansion)
Within a single computing domain, each TPU chip provides six high‑speed I/O ports, each delivering 800 Gbps. The bidirectional chip‑to‑chip (ICI) bandwidth is therefore:
800 Gbps × 2 (directions) × 6 (ports) ÷ 8 = 1.2 TB/s per chip
The 3D Torus topology connects every chip to six neighbours (±X, ±Y, ±Z) with wrap‑around links, resulting in a small network diameter, high bisection bandwidth, and simple routing. The Twisted 3D Torus variant adds long‑distance jumper links for additional bisection bandwidth.
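The neighbour rule and the bandwidth arithmetic can be sketched in a few lines of Python. This is an illustrative model only: the torus dimensions and function names are assumptions, and Google's actual routing logic is not public.

```python
# Sketch: wrap-around neighbour lookup in a 3D Torus, plus per-chip ICI bandwidth.
# Dimensions (4 x 4 x 4) are illustrative, not Google's actual layout.

def torus_neighbors(coord, dims):
    """Return the six wrap-around neighbours (±X, ±Y, ±Z) of a chip."""
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]

# Per-chip bandwidth: 6 ports x 800 Gbps x 2 directions, converted to bytes.
PORTS_PER_CHIP = 6
GBPS_PER_PORT = 800
per_chip_tbytes = GBPS_PER_PORT * 2 * PORTS_PER_CHIP / 8 / 1000  # 1.2 TB/s

neighbors = torus_neighbors((0, 0, 0), (4, 4, 4))
assert len(neighbors) == 6          # every chip sees exactly six peers
assert (3, 0, 0) in neighbors       # -X wraps around to the far edge
```

The wrap-around (modulo) links are what keep the network diameter small: a corner chip is still one hop from the opposite edge in each dimension.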
POD Building Blocks
Each POD consists of 144 cabinets, each housing 64 TPU chips (16 trays × 4 chips per tray), for a total of 9,216 chips per POD (the ICI POD).
Per cabinet I/O port distribution:
64 ports internal to the cabinet (a 64‑chip cabinet terminates 384 port ends in total: 64 chips × 6 ports).
96 ports via optical modules + OCS for inter‑cabinet links.
128 ports on PCB internal routing.
160 copper/AOC ports for external rack connections.
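A quick consistency check on the cabinet and POD counts above; the constants simply restate the figures from the text.

```python
# Headline counts for one ICI POD, derived from the figures quoted above.
CABINETS_PER_POD = 144
TRAYS_PER_CABINET = 16
CHIPS_PER_TRAY = 4
PORTS_PER_CHIP = 6

chips_per_cabinet = TRAYS_PER_CABINET * CHIPS_PER_TRAY   # 64 chips
chips_per_pod = CABINETS_PER_POD * chips_per_cabinet     # 9,216 chips

# Each chip exposes 6 ports, so a 64-chip cabinet terminates 384 port ends.
ports_per_cabinet = chips_per_cabinet * PORTS_PER_CHIP   # 384 ports

assert chips_per_cabinet == 64
assert chips_per_pod == 9216
```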
Network Hierarchy
Access Layer
Each 64‑chip cabinet is built from compute trays (4 chips per tray). Each compute tray links to a CPU tray over a CDFP PCIe connection; the CPU's NIC then attaches to Top‑of‑Rack (TOR) switches.
Aggregation Layer
Each ICI POD aggregates traffic through 288 TOR switches (12.8 Tb/s each). Half of each TOR's ports connect downstream to cabinets, and half connect upstream to leaf switches.
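The aggregate TOR capacity implied by those numbers is easy to work out; a small sketch, using only the switch count and per-switch capacity quoted above.

```python
# Aggregate TOR switching capacity of one ICI POD.
TOR_COUNT = 288
TOR_TBPS = 12.8                           # per-switch capacity, Tb/s

total_tor_tbps = TOR_COUNT * TOR_TBPS     # ~3,686.4 Tb/s for the POD
downstream_tbps = total_tor_tbps / 2      # half the ports face the cabinets
upstream_tbps = total_tor_tbps / 2        # half face the leaf layer

assert downstream_tbps == upstream_tbps   # the 50/50 split keeps the layer non-blocking
```

The even split means the TOR layer offers as much uplink capacity toward the leaves as it accepts from the cabinets, which is what makes the fabric non-blocking at this tier.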
Core Layer
Four ICI PODs form a core aggregation module. Each module contains 25.6 Tb/s spine/leaf switches (4 spines and 4 leaves), providing high‑capacity routing within the module.
Cluster Interconnect Layer
Four aggregation modules interconnect the full cluster of 147,456 TPU chips via OCS. The OCS fabric uses 64 × 300 × 300 ports, linking 2,304 spine switches (each with 288 input and 288 output ports) to achieve a full‑mesh, low‑latency global network.
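The cluster-wide chip count follows directly from the hierarchy: four modules, four PODs per module, 9,216 chips per POD. A sketch of that arithmetic:

```python
# Cluster-wide chip count from the hierarchy described above.
MODULES = 4               # aggregation modules in the cluster
PODS_PER_MODULE = 4       # ICI PODs per aggregation module
CHIPS_PER_POD = 9216      # 144 cabinets x 64 chips

total_pods = MODULES * PODS_PER_MODULE        # 16 ICI PODs
total_chips = total_pods * CHIPS_PER_POD      # 147,456 chips

assert total_chips == 147456
```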
Key Metrics
Total chips per ICI POD: 9,216.
I/O ports per chip: 6 × 800 Gbps.
Per‑chip ICI bandwidth: 1.2 TB/s bidirectional (9.6 Tb/s).
OCS interconnect: 64 × 300 × 300 ports, supporting dynamic reconfiguration across the super‑scale cluster.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.