Why Google’s Split 8th‑Gen TPU Could Out‑Earn General‑Purpose GPUs
At Cloud Next 2026, Google revealed that its 8th‑generation TPU will split into training‑focused Sunfish and inference‑focused Zebrafish. The announcement also showcased Ironwood’s record‑breaking performance, a multi‑vendor supply chain, Anthropic’s multi‑gigawatt order, and a broader industry shift toward custom AI chips that promise far higher profit margins than general‑purpose GPUs.
Google announced at Cloud Next 2026 that its 8th‑generation TPU family will be divided into two dedicated accelerators: Sunfish for training and Zebrafish for inference.
Ironwood, the 7th‑generation TPU, already ships with a 4.6 petaFLOPS FP8 peak, 192 GB of HBM3e, 7.37 TB/s of memory bandwidth, and 9,216‑chip clusters delivering 42.5 exaFLOPS, roughly 24× El Capitan, the current top‑ranked supercomputer. Its performance per watt is roughly 2× Trillium and 2.8× Nvidia’s H100.
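As a back‑of‑envelope check on those cluster numbers, the sketch below multiplies the quoted per‑chip FP8 peak by the pod size. The figures are the ones cited above; the script is illustrative arithmetic, not a benchmark.

```python
# Back-of-envelope check: per-chip FP8 peak x pod size ~= quoted pod peak.
# Figures are the publicly quoted Ironwood numbers; this is arithmetic, not a benchmark.

CHIP_FP8_PFLOPS = 4.6   # Ironwood per-chip FP8 peak, petaFLOPS
POD_CHIPS = 9_216       # chips per Ironwood cluster ("pod")

pod_exaflops = CHIP_FP8_PFLOPS * POD_CHIPS / 1_000  # 1 exaFLOPS = 1,000 petaFLOPS
print(f"Pod peak: {pod_exaflops:.1f} exaFLOPS")     # ~42.4, matching the quoted 42.5

# The ~24x El Capitan comparison divides the FP8 pod peak by El Capitan's peak;
# note it sets low-precision FP8 against a machine ranked on FP64.
implied_baseline = 42.5 / 24                        # ~1.77 exaFLOPS
print(f"Implied El Capitan baseline: {implied_baseline:.2f} exaFLOPS")
```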
In a direct comparison, Ironwood and Nvidia’s Blackwell B200 offer similar FP8 compute (≈4.6 petaFLOPS) and HBM capacity, but Blackwell has higher interconnect bandwidth (14.4 Tbps of NVLink vs Ironwood’s 9.6 Tbps ICI) and supports FP4, which can double throughput for quantized models, an advantage Ironwood lacks.
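To make the FP4 point concrete, here is a minimal sketch of why a narrower format roughly halves memory traffic and doubles effective throughput for quantized inference. The 70B‑parameter model size is a hypothetical example, not a reference to any specific model.

```python
# Why FP4 support matters for quantized inference: half the bytes per value
# means roughly twice as many values moved per unit of memory bandwidth,
# and Blackwell's FP4 pipes also double the arithmetic rate over FP8.
# The 70B-parameter model below is a hypothetical example.

BITS = {"fp8": 8, "fp4": 4}
PARAMS = 70e9  # hypothetical model size

for fmt, bits in BITS.items():
    gib = PARAMS * bits / 8 / 2**30
    print(f"{fmt}: weights occupy ~{gib:.0f} GiB")
# fp8: ~65 GiB, fp4: ~33 GiB. At the same 7.37 TB/s of bandwidth, an FP4
# copy of the weights streams in roughly half the time, which is the
# advantage Ironwood's format support does not offer.
```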
Sunfish (TPU 8t) integrates two compute dies, one I/O die, and eight stacks of 12‑layer HBM3e, giving about 30 % more memory bandwidth than Ironwood’s 8‑layer stacks. Zebrafish (TPU 8i) uses a single compute die, a single I/O die, and six HBM3e stacks, targeting a 20‑30 % lower cost than the training chip while maintaining high inference performance.
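A rough capacity estimate follows from the stack counts quoted above: Ironwood’s 192 GB across eight 8‑layer stacks implies 3 GB per DRAM die, and the sketch below extrapolates from that. Zebrafish’s layer count is not stated, so both variants are shown as explicit assumptions.

```python
# Rough HBM capacity estimates from the quoted stack configurations.
# Ironwood: 192 GB over 8 stacks of 8-layer HBM3e -> 3 GB per DRAM die.
GB_PER_DIE = 192 / (8 * 8)  # inferred from Ironwood's published spec

def capacity_gb(stacks: int, layers: int) -> float:
    return stacks * layers * GB_PER_DIE

print(f"Ironwood  (8 x  8-layer): {capacity_gb(8, 8):.0f} GB")   # 192 GB, sanity check
print(f"Sunfish   (8 x 12-layer): {capacity_gb(8, 12):.0f} GB")  # 288 GB if die density holds
# Zebrafish's layer count is unpublished; both cases are assumptions:
print(f"Zebrafish (6 x 12-layer): {capacity_gb(6, 12):.0f} GB")  # 216 GB
print(f"Zebrafish (6 x  8-layer): {capacity_gb(6, 8):.0f} GB")   # 144 GB
```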
Both chips will be fabricated on TSMC’s 2 nm process and are slated for release in the second half of 2027.
The split reflects a strategic move away from “one‑size‑fits‑all” chips: training demands extreme compute density and bandwidth for trillion‑parameter models, whereas inference prioritises cost efficiency and low latency for billions of daily queries.
Google’s supply‑chain strategy spreads the designs across vendors: Broadcom designs Ironwood and Sunfish, MediaTek handles Zebrafish, and potential third‑party partners such as Marvell are in the frame for a memory‑processing unit and an additional inference TPU. The arrangement gives Google bargaining power, redundancy, and the freedom to assign each workload to the most suitable partner.
Anthropic has placed a 3.5 GW order for Ironwood, initially purchasing roughly 400,000 units in a deal valued at about $100 billion and later expanding to 1 million TPU chips by 2027, underscoring the economic appeal of specialised inference hardware.
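For a sense of scale, dividing the order’s quoted power by the eventual chip count gives an all‑in power budget per deployed TPU. This is a crude ratio that lumps chip, host, networking, cooling, and facility overhead together.

```python
# Crude scale check: all-in power per deployed chip implied by the order.
# The ratio bundles chip, host, networking, cooling, and facility overhead.
ORDER_GW = 3.5
CHIPS_BY_2027 = 1_000_000

watts_per_chip = ORDER_GW * 1e9 / CHIPS_BY_2027
print(f"~{watts_per_chip:,.0f} W of data-center power per deployed TPU")  # ~3,500 W
```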
Industry‑wide, cloud providers are accelerating custom‑chip programs, including Amazon’s Trainium, Microsoft’s Maia 200, and Meta’s MTIA, while Nvidia has answered with NVLink Fusion to keep custom silicon attached to its ecosystem. Forecasts predict custom ASICs will capture 45 % of the AI‑chip market by 2028, while Nvidia’s share of inference could fall from more than 90 % to 20‑30 %.
The overall narrative is that large cloud vendors are betting on the profit‑margin advantages of dedicated inference silicon rather than competing with Nvidia GPUs on every performance metric.