What’s New in Arm’s X925 and A725 CPUs? Deep Dive into 3nm Architecture
Arm’s 2024 X925 and A725 cores target a 3 nm process and anchor a 2+4+2 reference configuration. A doubled fetch buffer, a larger ROB, higher clock speeds, expanded cache options, and incremental micro‑architectural tweaks together boost performance and efficiency amid growing competition from Apple and Qualcomm.
1. Introduction
In May 2024 Arm released its fifth‑generation X‑series core, now named X925, and the A‑series core A725. Both target the 3 nm process and implement the Armv9.2 ISA. The X925 (code‑named Blackhawk) and A725 (code‑named Chaberton) are expected to appear first in MediaTek SoCs, while Qualcomm plans its own custom design.
2. Arm’s CSS (Compute Subsystem) Solution
Arm introduced the Compute Subsystem (CSS) at TCS 2023 to help foundries quickly adopt 3 nm and Armv9 technologies for Android and AI workloads. CSS integrates CPU and GPU, pushes clock speeds above 3.6 GHz (compared with 3.35 GHz on the 4 nm Dimensity 9200+), and enables power, performance, and area (PPA) optimisation.
3. 3 nm Process Migration
The Android flagship market is moving to 3 nm, following Apple’s A17 success. While 3 nm promises higher performance, it also brings higher cost and longer ramp‑up time. Arm’s CSS is partly designed to mitigate these challenges.
4. Overall Arm Reference Design (2+4+2)
Arm’s 2024 reference design adopts a 2 X925 + 4 A725 + 2 A520 configuration, replacing the previous 1+3+4 layout. Arm claims a 36 % performance uplift for X925 at 3.6 GHz, a 35 % efficiency gain for A725, and a 20 % power saving for A520 on 3 nm.
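As a rough illustration of what the layout change means, the sketch below models both configurations and compares how many of the eight cores are prime or performance cores. The class name and the tier labels (prime, performance, efficiency) are assumptions made for this example; the article only gives the counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Core counts per tier; the tier names are illustrative labels."""
    prime: int
    performance: int
    efficiency: int

    @property
    def total(self) -> int:
        return self.prime + self.performance + self.efficiency

    @property
    def big_core_share(self) -> float:
        # Fraction of cores that are prime or performance cores.
        return (self.prime + self.performance) / self.total


# 2 X925 + 4 A725 + 2 A520, as described above.
new_layout = ClusterLayout(prime=2, performance=4, efficiency=2)
# The earlier 1+3+4 layout; the article does not name its cores.
previous_layout = ClusterLayout(prime=1, performance=3, efficiency=4)

for name, layout in (("2+4+2", new_layout), ("1+3+4", previous_layout)):
    print(f"{name}: {layout.total} cores, {layout.big_core_share:.0%} prime/performance")
```

Both layouts total eight cores; the difference is that the share of prime and performance cores rises from half to three quarters.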
5. X925 Micro‑architecture Analysis
5‑1 Front‑end
Pre‑decode fetch buffer doubled from 32 B to 64 B, so more instruction bytes are available to the decoder each cycle.
Branch predictor improvements, including “fold‑out unconditional direct branches” to reduce stalls.
L1 instruction‑cache bandwidth doubled from 32 B to 64 B per cycle; iTLB capacity doubled.
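To put the front‑end widths in instruction terms: A64 instructions are a fixed 4 bytes, so a bytes‑per‑cycle figure converts directly into an upper bound on instructions covered per cycle. A minimal sketch, assuming the 32 B and 64 B figures above are per‑cycle widths (the function name is illustrative):

```python
A64_INSTRUCTION_BYTES = 4  # AArch64 (A64) instructions are fixed-width, 4 bytes each


def max_instructions_per_cycle(bytes_per_cycle: int) -> int:
    """Upper bound on A64 instructions covered by a fetch width in bytes per cycle."""
    return bytes_per_cycle // A64_INSTRUCTION_BYTES


for label, width in (("previous (32 B)", 32), ("X925 (64 B)", 64)):
    print(f"{label}: up to {max_instructions_per_cycle(width)} instructions per cycle")
```

This is only a ceiling: taken branches, cache‑line boundaries, and decode width all limit how many of those instructions are actually usable in a given cycle.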
5‑2 Back‑end
Added one LD‑AGU unit (now 2 ST + 4 LD versus 2 ST + 3 LD in X4).
L1 data bandwidth doubled to 64 B per cycle; the L2 cache option grew from 2 MiB to 3 MiB.
Reorder buffer (ROB) doubled from 384 to 768 entries, surpassing the ROB of Apple’s A17 and giving out‑of‑order execution a reported 25‑40 % boost.
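One rough way to read the ROB figures: if instructions keep entering the window at N per cycle while the oldest one is stalled (for example on a cache miss), a window of W entries fills after about W / N cycles. The sketch below applies that to 384 and 768 entries. The sustained rate of 4 instructions per cycle and the helper name are assumptions chosen for illustration, not figures from Arm.

```python
def cycles_until_window_full(rob_entries: int, sustained_ipc: float) -> float:
    """Approximate cycles of latency an out-of-order window can cover
    while instructions keep entering at `sustained_ipc` per cycle."""
    if sustained_ipc <= 0:
        raise ValueError("sustained_ipc must be positive")
    return rob_entries / sustained_ipc


ASSUMED_IPC = 4.0  # hypothetical sustained rate, for illustration only

for entries in (384, 768):
    cycles = cycles_until_window_full(entries, ASSUMED_IPC)
    print(f"{entries}-entry ROB: ~{cycles:.0f} cycles covered at IPC {ASSUMED_IPC}")
```

Under this simplified model, doubling the ROB doubles the stall latency the core can hide before dispatch has to stop. Real gains are smaller because other structures (issue queues, load/store queues, physical registers) can fill up first.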
5‑3 Execution Units
SIMD/FP pipelines increased from 4 to 6.
Integer ALU now supports more complex two‑cycle operations.
Integer multiply units rose from 2 to 4; FP compare units from 1 to 2.
5‑4 Performance
Running at 3.8 GHz, X925 shows up to 36 % performance improvement over its predecessor, with roughly 25 % attributed to higher clock and ~11 % to micro‑architectural changes. Geekbench 6 data suggests an IPC gain of about 15 % from architecture alone.
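Because single‑thread performance scales roughly as IPC times clock frequency, clock and architectural gains compound multiplicatively rather than adding up. The sketch below works through that arithmetic with the percentages quoted above. The function names are illustrative, and the model ignores effects such as memory latency that do not scale with clock.

```python
def combined_uplift(clock_gain: float, ipc_gain: float) -> float:
    """Total speedup when clock and IPC gains compound (performance ~ IPC x frequency)."""
    return (1 + clock_gain) * (1 + ipc_gain) - 1


def implied_clock_gain(total_gain: float, ipc_gain: float) -> float:
    """Clock gain needed to reach total_gain given a known IPC gain."""
    return (1 + total_gain) / (1 + ipc_gain) - 1


# Percentages taken from the paragraph above, expressed as fractions.
print(f"25% clock x 11% micro-architecture -> {combined_uplift(0.25, 0.11):.2%} total")
print(f"36% total with 15% IPC -> {implied_clock_gain(0.36, 0.15):.2%} from clock")
```

Under this model a 25 % clock gain and an 11 % architectural gain compound to just under 39 %, while a 36 % total with a 15 % IPC gain implies a clock contribution of roughly 18 %. The two breakdowns quoted above therefore cannot both be exact.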
6. A725 and A520 Highlights
A725’s ROB size is larger than the previous 192 entries (exact size undisclosed).
L2 cache options expanded to 1 MiB, up from a maximum of 512 KiB in A720.
Both cores benefit from 3 nm‑specific power‑efficiency optimisations; A520 sees ~15 % efficiency improvement despite unchanged architecture.
7. Conclusion
Arm’s 2024 X925 and A725 cores demonstrate incremental but meaningful gains on the 3 nm node, especially in clock speed, ROB size, and cache bandwidth. However, competition from Apple’s custom A‑series and Qualcomm’s Oryon will shape market outcomes.
OPPO Kernel Craftsman
