Inside Google’s Massive TPU SuperPod: How Scale‑Up and Scale‑Out Build a 9,216‑Chip AI Engine

This article explains Google’s TPU data‑center architecture: the vertical Scale‑Up strategy inside a SuperPod, the horizontal Scale‑Out across SuperPods, the 3D Torus topology and its Twisted variant, and the multi‑layer network design that carries AI training and inference traffic across a cluster of more than 147,000 chips.

Google TPU SuperPod Architecture Overview

Google’s TPU clusters use a two‑layer scaling strategy: Scale‑Up (vertical expansion inside a cabinet or SuperPod) and Scale‑Out (horizontal expansion across SuperPods or data‑center sites). The fabric pairs a high‑bandwidth, low‑latency 3D Torus interconnect (with an optional Twisted variant) for Scale‑Up with a Data Center Network (DCN) plus Optical Circuit Switches (OCS) for elastic, non‑blocking Scale‑Out connectivity.

Scale‑Up (Vertical Expansion)

Within a single computing domain, each TPU chip provides six high‑speed I/O ports, one per torus direction. Each port delivers 800 Gbps, giving each chip an aggregate bidirectional chip‑to‑chip bandwidth of:

800 Gbps × 2 (directions) × 6 (ports) ÷ 8 (bits → bytes) = 1.2 TB/s per chip (4.8 Tbps of one‑way bandwidth)
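A quick sanity check of that figure, as a minimal Python sketch (illustrative only, not Google tooling):

```python
# Per-chip ICI bandwidth from the port specs above.
ports_per_chip = 6      # one link per torus direction (±X, ±Y, ±Z)
gbps_per_port = 800     # per direction
directions = 2          # each link is bidirectional

total_gbps = gbps_per_port * directions * ports_per_chip  # 9,600 Gbps
total_tb_per_s = total_gbps / 8 / 1000                    # bits -> bytes, G -> T

print(f"{total_tb_per_s} TB/s per chip")  # 1.2 TB/s
```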

The 3D Torus topology connects every chip to six neighbours (±X, ±Y, ±Z) with wrap‑around links, resulting in a small network diameter, high bisection bandwidth, and simple routing. The Twisted 3D Torus variant re‑routes some wrap‑around links as long‑distance jumpers, shortening worst‑case paths and adding bisection bandwidth.
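To make the neighbour rule concrete, here is a minimal sketch of torus addressing. The 4 × 4 × 4 dimensions are an assumption chosen so the torus holds exactly the 64 chips of one cabinet; the article implies but never states this arrangement.

```python
# Minimal sketch of 3D-torus neighbour addressing (illustrative, not Google's code).
# Each chip at (x, y, z) links to six neighbours, one per axis direction,
# with coordinates wrapping modulo the torus dimensions.

def torus_neighbors(x, y, z, dims=(4, 4, 4)):
    """Return the six wrap-around neighbours of the chip at (x, y, z)."""
    X, Y, Z = dims
    return [
        ((x + 1) % X, y, z), ((x - 1) % X, y, z),  # ±X
        (x, (y + 1) % Y, z), (x, (y - 1) % Y, z),  # ±Y
        (x, y, (z + 1) % Z), (x, y, (z - 1) % Z),  # ±Z
    ]

# Even a corner chip has six neighbours, thanks to the wrap-around links:
print(torus_neighbors(0, 0, 0))
# [(1, 0, 0), (3, 0, 0), (0, 1, 0), (0, 3, 0), (0, 0, 1), (0, 0, 3)]
```

The wrap‑around links are what keep the diameter small: the longest shortest path in a torus is half of each dimension summed, rather than the full dimension as in a plain mesh.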

POD Building Blocks

Each POD consists of 144 cabinets, each housing 64 TPU chips (16 trays × 4 chips per tray) → 9,216 chips per POD (the ICI POD).

Per‑cabinet I/O port distribution (64 chips × 6 ports = 384 ports per cabinet; a sanity check follows the list):

96 ports via optical modules + OCS for inter‑cabinet links.

128 ports on PCB‑internal routing within the trays.

160 copper/AOC ports for external rack connections.
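The three link classes partition the cabinet’s port budget exactly, which is worth verifying (a back‑of‑the‑envelope check under the reading above):

```python
# Per-cabinet port budget from the figures above.
chips_per_cabinet = 64
ports_per_chip = 6
total_ports = chips_per_cabinet * ports_per_chip  # 384

optical_ocs = 96    # inter-cabinet links via optical modules + OCS
pcb_internal = 128  # PCB routing inside the trays
copper_aoc = 160    # copper/AOC links to other racks

assert optical_ocs + pcb_internal + copper_aoc == total_ports  # 96+128+160 = 384

# And the POD-level chip count:
cabinets_per_pod = 144
print(f"{chips_per_cabinet * cabinets_per_pod} chips per ICI POD")  # 9,216
```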

Network Hierarchy

Access Layer

Each 64‑chip cabinet is organised into compute trays (4 chips per tray). Every compute tray uses a CDFP PCIe port to link to a CPU tray; the CPU’s NIC then attaches to Top‑of‑Rack (TOR) switches.

Aggregation Layer

Each ICI POD aggregates traffic through 288 TOR switches (12.8 Tbps each). 50% of each TOR’s ports connect downstream to cabinets and 50% connect upstream to leaf switches, so downstream and upstream capacity match.
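A rough capacity tally for this layer, treating the 12.8 Tbps figure as each TOR’s total switching capacity (an assumption about how the article counts):

```python
# Aggregation-layer capacity per ICI POD, from the figures above.
tor_switches = 288
tor_capacity_tbps = 12.8     # assumed total switching capacity per TOR
downstream_fraction = 0.5    # half the ports face the cabinets

total_tbps = tor_switches * tor_capacity_tbps  # ~3,686 Tbps
downstream = total_tbps * downstream_fraction
upstream = total_tbps - downstream

# The 50/50 split means the TOR layer is not oversubscribed: every Tbps of
# cabinet-facing capacity has a matching Tbps toward the leaf switches.
print(f"{downstream:.1f} Tbps down / {upstream:.1f} Tbps up")
```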

Core Layer

Four ICI PODs form a core aggregation module. Each module contains eight 25.6 Tbps spine/leaf switches (4 spines and 4 leaves), which provide high‑capacity inter‑POD routing within the module.
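The per‑module scale these figures imply, as plain arithmetic:

```python
# Scale of one core aggregation module, from the figures above.
pods_per_module = 4
chips_per_pod = 9_216
spines, spine_tbps = 4, 25.6

chips_per_module = pods_per_module * chips_per_pod  # 36,864 chips
spine_capacity = spines * spine_tbps                # 102.4 Tbps of spine capacity

print(f"{chips_per_module} chips behind {spine_capacity} Tbps of spines")
```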

Cluster Interconnect Layer

Four aggregation modules interconnect the full cluster of 147,456 TPU chips (4 modules × 4 PODs × 9,216 chips) via OCS. The OCS fabric uses 64 × 300 × 300 ports, linking 2,304 spine switches (each with 288 input and 288 output ports) to achieve a full‑mesh, low‑latency global network.
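Cluster‑level totals, assuming “64 × 300 × 300” means 64 OCS units of 300 × 300 ports each (one plausible reading of the article’s figure):

```python
# Cluster-scale arithmetic from the figures above.
chips_per_pod = 9_216
pods_per_module = 4
modules = 4

total_chips = chips_per_pod * pods_per_module * modules
assert total_chips == 147_456  # matches the article's cluster size

# Assumed reading of the OCS fabric: 64 units, each a 300x300 optical cross-connect.
ocs_units, ocs_ports = 64, 300
print(f"{total_chips} chips; {ocs_units} OCS units of {ocs_ports}x{ocs_ports} ports")
```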

Key Metrics

Total chips per ICI POD: 9,216.

I/O ports per chip: 6 × 800 Gbps.

Per‑chip ICI bandwidth: 1.2 TB/s bidirectional (4.8 Tbps in each direction).

OCS interconnect: 64 × 300 × 300 ports, supporting dynamic reconfiguration across the super‑scale cluster.

Figure: 3D Torus topology diagram
Figure: POD composition diagram
Tags: Network Architecture, Data Center, Scale‑Out, Scale‑Up, AI hardware, TPU, SuperPod
Written by Architects' Tech Alliance

Sharing project experiences and insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, and industry practices and solutions.
