
Inside Fujitsu A64FX: Architecture that Drives Japan’s Fugaku Supercomputer

The article examines Fujitsu’s A64FX ARM‑compatible processor, detailing its 7 nm FinFET design, HBM2 memory, SVE vector extensions, and system‑on‑chip integration, while comparing its engineering philosophy and performance to IBM’s BlueGene/L and explaining how these choices enabled the Fugaku supercomputer to top the Top500 list.

Architects' Tech Alliance

Background and Significance

In recent years ARM‑compatible processors have attracted major attention, from Apple Silicon to Japan’s flagship supercomputer Fugaku, which is built around Fujitsu’s A64FX processor. The A64FX is widely regarded as the most advanced ARM‑compatible chip for high‑performance computing (HPC).

Technical Origins of A64FX

Fujitsu first disclosed A64FX details at the IEEE Hot Chips symposium in 2018, describing a transition from the SPARC V9 instruction set to Armv8.2‑A with the Scalable Vector Extension (SVE) while retaining many SPARC64‑derived features such as mainframe‑grade data reliability. The chip is fabricated in TSMC’s 7 nm FinFET process and integrates four 8 GB HBM2 memory stacks in a 2.5D CoWoS package, eliminating external memory modules.

Design Philosophy: From Supercomputers to Commodity

A64FX is a system‑on‑chip (SoC) specifically tuned for supercomputing. Its architecture follows the same “big picture, small steps” approach pioneered by IBM’s BlueGene/L, emphasizing high integration, low power, and rapid system assembly. By leveraging a mature 7 nm process, a rich ARM ecosystem, and high‑bandwidth HBM2, Fujitsu achieved a design that is both cost‑effective and high‑performance.

Comparison with IBM BlueGene/L

BlueGene/L, introduced in 2004, demonstrated that a highly integrated, low‑power node could dominate the Top500 rankings. Its key features included:

Ultra‑low power consumption (≈1/28 of NEC Earth Simulator for the same workload).

Simplified system architecture with minimal cabling.

Rapid product design by reusing existing R&D and integrating additional functions on a single chip.

BlueGene/L’s node comprised a 700 MHz dual‑core PowerPC 440, 4 MB L3 cache, and a 3D Torus network plus several auxiliary networks for control, collective operations, interrupts, Gigabit Ethernet I/O, and system management.
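A 3D torus connects each node to six nearest neighbours, with links that wrap around at the edges of the grid. The following is a minimal sketch of that addressing scheme, not IBM's routing implementation; the function names and the 8×8×8 grid in the usage lines are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbours of `node` in a 3D torus of size `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coords = list(node)
            # The modulo implements the wrap-around link that turns a mesh into a torus.
            coords[axis] = (coords[axis] + step) % dims[axis]
            neighbors.append((coords[0], coords[1], coords[2]))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum hop count between two nodes, taking the shorter way round on each axis."""
    hops = 0
    for axis in range(3):
        direct = abs(a[axis] - b[axis])
        hops += min(direct, dims[axis] - direct)
    return hops


if __name__ == "__main__":
    dims = (8, 8, 8)  # hypothetical grid size, for illustration only
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (7, 4, 1), dims))
```

Because of the wrap-around links, a node on the edge of the grid has the same number of neighbours as one in the middle, and the worst-case distance along any axis is half the axis length rather than the full length.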

Key Architectural Features of A64FX

TSMC 7 nm FinFET process with 8.786 billion transistors.

Four 8 GB HBM2 stacks (total 32 GB) providing high bandwidth and compact form factor.

48 compute cores plus 4 I/O‑assist cores, organized into four Core Memory Groups (CMGs). Each CMG contains 13 cores (12 compute + 1 assist), 8 MB L2 cache, and 8 GB HBM2.

Scalable Vector Extension (SVE) for SIMD floating‑point operations, delivering more than 2.7 TFLOPS of peak double‑precision performance per chip (about 2.5× the predecessor SPARC64 XIfx) and supporting FP16 and INT16/INT8 data types for AI workloads.

Advanced MOVPRFX + FMA3 instruction pairing that effectively implements FMA4 functionality.

Full mainframe‑grade reliability: ECC on registers, parity checks, and extensive error‑correction mechanisms.

Inter‑chip interconnect based on the third‑generation “Tofu” 6‑D mesh/torus network, derived from the one used in the earlier K computer.

Modular rack design: 384 A64FX chips per rack (18432 cores) yielding >1 PFLOPS theoretical performance per rack.
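The per‑chip and per‑rack figures in the list above can be cross‑checked with simple arithmetic. The sketch below uses the numbers quoted in the list; the 512‑bit vector width, two FMA pipelines per core, and 1.8 GHz clock are assumptions that the article does not state, included only to show how a peak figure near 2.7 TFLOPS could arise. All function and constant names are hypothetical.

```python
# Figures quoted in the feature list.
CMGS_PER_CHIP = 4
COMPUTE_CORES_PER_CMG = 12
ASSIST_CORES_PER_CMG = 1
L2_MB_PER_CMG = 8
HBM2_GB_PER_CMG = 8
CHIPS_PER_RACK = 384
QUOTED_PEAK_TFLOPS_PER_CHIP = 2.7

# Assumptions for illustration only (not stated in the article).
SVE_VECTOR_BITS = 512
FMA_PIPES_PER_CORE = 2
CLOCK_GHZ = 1.8


def chip_totals() -> dict:
    """Sum the per-CMG resources over the four CMGs of one chip."""
    return {
        "compute_cores": CMGS_PER_CHIP * COMPUTE_CORES_PER_CMG,
        "assist_cores": CMGS_PER_CHIP * ASSIST_CORES_PER_CMG,
        "l2_mb": CMGS_PER_CHIP * L2_MB_PER_CMG,
        "hbm2_gb": CMGS_PER_CHIP * HBM2_GB_PER_CMG,
    }


def estimated_peak_tflops() -> float:
    """Double-precision peak under the assumed vector width, pipe count, and clock."""
    lanes = SVE_VECTOR_BITS // 64  # 64-bit lanes per vector
    flops_per_cycle = FMA_PIPES_PER_CORE * lanes * 2  # one FMA counts as 2 FLOPs
    cores = chip_totals()["compute_cores"]
    return cores * flops_per_cycle * CLOCK_GHZ / 1000.0


def rack_totals() -> dict:
    """Scale the quoted per-chip figures up to one rack."""
    return {
        "compute_cores": CHIPS_PER_RACK * chip_totals()["compute_cores"],
        "peak_pflops": CHIPS_PER_RACK * QUOTED_PEAK_TFLOPS_PER_CHIP / 1000.0,
    }


if __name__ == "__main__":
    print(chip_totals())
    print(round(estimated_peak_tflops(), 2))
    print(rack_totals())
```

Under these assumptions the per‑chip estimate lands slightly above 2.7 TFLOPS, and 384 chips at the quoted 2.7 TFLOPS give just over 1 PFLOPS per rack, which matches the list.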

Performance in Fugaku

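The Fugaku configuration measured for the Top500 list comprised 396 racks (152,064 A64FX chips) and achieved 415 PFLOPS on the High‑Performance Linpack benchmark, securing the #1 spot on the June 2020 Top500 list. On energy efficiency, an A64FX prototype system had earlier topped the Green500 list in November 2019.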
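The chip count follows directly from the rack figures given earlier. As a quick consistency check, the sketch below multiplies out the numbers quoted in this article (396 racks, 384 chips per rack, 48 compute cores and 32 GB of HBM2 per chip); the function name is hypothetical.

```python
RACKS = 396
CHIPS_PER_RACK = 384
COMPUTE_CORES_PER_CHIP = 48
HBM2_GB_PER_CHIP = 32


def system_totals(racks: int = RACKS) -> dict:
    """Scale the per-rack and per-chip figures up to the whole system."""
    chips = racks * CHIPS_PER_RACK
    return {
        "chips": chips,
        "compute_cores": chips * COMPUTE_CORES_PER_CHIP,
        "hbm2_gb": chips * HBM2_GB_PER_CHIP,
    }


if __name__ == "__main__":
    print(system_totals())
```

The result reproduces the 152,064 chips stated above and puts the compute core count at roughly 7.3 million.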

Strategic Implications

While many server vendors have shifted toward open‑standard, commodity‑based designs, Fujitsu continues to develop high‑end custom processors, integrating mature process technology, ARM’s ecosystem, and proven SPARC‑derived reliability. This strategy balances the benefits of open resources with the performance advantages of a purpose‑built SoC.

Future Outlook

The article concludes by questioning whether Intel can regain competitiveness if TSMC maintains its process lead, noting Intel’s roadmap toward 7 nm, 5 nm, 3 nm, and beyond, and suggesting that the semiconductor landscape may undergo significant shifts.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
