Fujitsu A64FX: ARM‑Based SoC Powering Japan’s Fugaku Supercomputer and Its Architectural Heritage from BlueGene/L
The article provides a detailed technical overview of Fujitsu's ARM‑based A64FX processor, its design origins, specifications, network architecture, and how it enables the Fugaku supercomputer to achieve world‑leading performance while illustrating broader trends in high‑performance computing hardware.
Fujitsu’s A64FX is an Armv8.2‑A‑based system‑on‑chip (SoC), first unveiled at IEEE Hot Chips 2018, that now powers Japan’s Fugaku supercomputer, which has topped both the TOP500 and Green500 lists.
The chip is manufactured in TSMC’s 7 nm FinFET process, integrates four 8 GB HBM2 memory stacks via a 2.5D CoWoS package, and contains 48 compute cores plus four assistant cores for I/O and OS tasks. The cores are organized into four Core Memory Groups (CMGs), each pairing 13 cores (12 compute plus one assistant) with 8 MB of shared L2 cache and one 8 GB HBM2 stack.
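The per-CMG figures can be sanity-checked by tallying the chip-level totals. A minimal sketch (the variable names are illustrative, not Fujitsu’s terminology):

```python
# Core Memory Group (CMG) composition of one A64FX chip,
# per the figures quoted above.
CMGS_PER_CHIP = 4
COMPUTE_CORES_PER_CMG = 12    # plus 1 assistant core per CMG
ASSISTANT_CORES_PER_CMG = 1
L2_MB_PER_CMG = 8             # MB of shared L2 cache per CMG
HBM2_GB_PER_CMG = 8           # GB: one HBM2 stack per CMG

compute_cores = CMGS_PER_CHIP * COMPUTE_CORES_PER_CMG      # 48 compute cores
assistant_cores = CMGS_PER_CHIP * ASSISTANT_CORES_PER_CMG  # 4 assistant cores
l2_total_mb = CMGS_PER_CHIP * L2_MB_PER_CMG                # 32 MB L2 on chip
hbm2_total_gb = CMGS_PER_CHIP * HBM2_GB_PER_CMG            # 32 GB HBM2 on package

print(compute_cores, assistant_cores, l2_total_mb, hbm2_total_gb)
```

The totals (48 + 4 cores, 32 MB L2, 32 GB HBM2) line up with the chip-level description.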
Its SIMD engine uses Arm’s Scalable Vector Extension (SVE), delivering up to 2.7 TFLOPS (FP64) per chip and supporting FP16 and INT16/INT8 data types useful for AI workloads; the architecture also inherits ECC and parity protection from Fujitsu’s mainframe‑class SPARC64fx designs.
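The 2.7 TFLOPS figure can be reproduced with back-of-envelope arithmetic. The sketch below assumes a 512-bit SVE datapath, two FMA pipelines per core, and a 1.8 GHz clock; none of these parameters are stated in the article, so treat them as assumptions:

```python
# Back-of-envelope FP64 peak for one A64FX chip.
# Assumed (not stated above): 512-bit SVE vectors, 2 FMA pipes/core, 1.8 GHz.
cores = 48
fma_pipes = 2
fp64_lanes = 512 // 64   # 8 doubles per 512-bit SVE vector
flops_per_fma = 2        # fused multiply-add counts as 2 FLOPs

clock_ghz = 1.8
peak_tflops = cores * fma_pipes * fp64_lanes * flops_per_fma * clock_ghz / 1000
print(f"{peak_tflops:.3f} TFLOPS")  # 2.765, matching the quoted ~2.7 TFLOPS
```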
Compared with IBM’s BlueGene/L, A64FX follows the same “big picture, small steps” philosophy: BlueGene/L’s highly modular design, with its five network fabrics (3D torus, collective, global barrier, Gigabit Ethernet, and control network), enabled low‑power, high‑density compute nodes and rapid system assembly.
In BlueGene/L, each compute card housed two dual‑core PowerPC 440 chips and 512 MB of DDR memory while consuming only ~20 W; 16 cards formed a node board (32 chips, 64 cores) delivering 180 GFLOPS, and 32 node boards in a rack provided 5.7 TFLOPS and 256 GB of memory. By comparison, a full Fugaku rack hosts 384 A64FX chips (18,432 cores) and exceeds 1 PFLOPS.
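The Fugaku rack figures follow directly from the per-chip specs; a quick check:

```python
# One Fugaku rack, built from the per-chip figures quoted earlier.
chips_per_rack = 384
cores_per_chip = 48       # compute cores per A64FX
tflops_per_chip = 2.7     # FP64 peak per chip

rack_cores = chips_per_rack * cores_per_chip            # 18,432 cores
rack_pflops = chips_per_rack * tflops_per_chip / 1000   # ~1.04 PFLOPS

print(rack_cores, round(rack_pflops, 3))
```

384 × 2.7 TFLOPS ≈ 1.04 PFLOPS, which is the “exceeding 1 PFLOPS” claim.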
Fugaku’s configuration of 152,064 A64FX chips across 396 racks achieved a measured performance of 415 PFLOPS, illustrating how a commodity‑grade Arm ecosystem, a mature TSMC process, and open‑source resources can replace fully custom ASIC development.
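These system-level numbers are consistent with the rack and chip figures above. The sketch below checks the chips-per-rack ratio and the nominal aggregate peak; note the measured 415 PFLOPS slightly exceeds the nominal 2.7 TFLOPS-per-chip total, consistent with the chips running above the clock implied by that rating:

```python
# Fugaku system-level totals from the quoted configuration.
chips = 152_064
racks = 396
per_chip_tflops = 2.7   # nominal FP64 peak per A64FX

chips_per_rack = chips / racks                     # 384.0, matching the rack spec
nominal_pflops = chips * per_chip_tflops / 1000    # ~410.6 PFLOPS nominal

print(chips_per_rack, round(nominal_pflops, 1))
```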
The article also notes the broader industry trend of moving from proprietary designs toward open‑spec ARM solutions, and speculates on Intel’s future competitiveness as process nodes shrink.
Architects' Tech Alliance
Sharing project experiences and insights into cutting-edge architectures, with a focus on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, and industry practices and solutions.