Industry Insights 12 min read

What’s New in Arm’s X925 and A725 CPUs? Deep Dive into 3nm Architecture

Arm’s 2024 release of the X925 and A725 cores brings a 2+4+2 configuration on a 3 nm process, featuring a doubled fetch buffer, larger ROB, higher clock speeds, expanded cache options, and incremental micro‑architectural tweaks that together boost performance and efficiency amid growing competition from Apple and Qualcomm.

OPPO Kernel Craftsman

1. Introduction

In May 2024, Arm released its fifth‑generation X‑series core, now named X925, and the A‑series core A725, both targeting the 3 nm process and implementing the Armv9.2 ISA. The X925 (code‑named Blackhawk) and A725 (code‑named Chaberton) are expected to appear first in MediaTek SoCs, while Qualcomm plans its own custom design.

2. Arm’s CSS (Compute Subsystem) Solution

Arm introduced the Compute Subsystem (CSS) at TCS 2023 to help foundries quickly adapt 3 nm and Armv9 technologies for Android and AI workloads. CSS integrates CPU and GPU, pushes clock speeds above 3.6 GHz (compared with 3.35 GHz on the 4 nm Dimensity 9200+), and enables PPA optimisation.

3. 3 nm Process Migration

The Android flagship market is moving to 3 nm, following Apple’s A17 success. While 3 nm promises higher performance, it also brings higher cost and longer ramp‑up time. Arm’s CSS is partly designed to mitigate these challenges.

4. Overall Arm Reference Design (2+4+2)

Arm’s 2024 reference design adopts a 2 X925 + 4 A725 + 2 A520 configuration, replacing the previous 1+3+4 layout. Arm claims a 36 % performance uplift for X925 at 3.6 GHz, a 35 % efficiency gain for A725, and a 20 % power saving for A520 on 3 nm.

5. X925 Micro‑architecture Analysis

5‑1 Front‑end

Pre‑decode fetch buffer doubled from 32 B to 64 B, increasing instruction availability.

Branch predictor improvements, including “fold‑out unconditional direct branches” to reduce stalls.

L1 instruction bandwidth increased from 32 B to 64 B; iTLB capacity doubled.
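To put the front‑end widths above in concrete terms, the sketch below converts a fetch width in bytes into a count of A64 instructions (which are fixed at 4 bytes each) and into a peak byte rate. It assumes the 32 B and 64 B figures are per‑cycle widths, and the 3.6 GHz clock is borrowed from Arm's headline figure purely for illustration; the function names are hypothetical.

```python
# Illustrative arithmetic only. Assumptions: the quoted widths are per cycle,
# and the clock value below is chosen for the example, not measured.
A64_INSTRUCTION_BYTES = 4  # A64 instructions are fixed-width (32 bits)


def instructions_per_fetch(fetch_bytes: int) -> int:
    """Whole A64 instructions that fit in one fetch window of the given size."""
    return fetch_bytes // A64_INSTRUCTION_BYTES


def peak_fetch_bandwidth_gb_s(fetch_bytes: int, clock_ghz: float) -> float:
    """Upper bound on fetched bytes per second, in GB/s (1 GB = 1e9 bytes)."""
    # bytes/cycle * 1e9 cycles/s = 1e9 bytes/s, so the GHz value maps to GB/s.
    return fetch_bytes * clock_ghz


if __name__ == "__main__":
    for width in (32, 64):
        print(f"{width} B window holds {instructions_per_fetch(width)} instructions")
    print(f"64 B/cycle at 3.6 GHz: {peak_fetch_bandwidth_gb_s(64, 3.6):.1f} GB/s peak")
```

These are ceilings rather than sustained rates: taken branches, cache misses and decode width all limit how much of a wider window is actually used.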

5‑2 Back‑end

Added one LD‑AGU unit (now 2 ST + 4 LD versus 2 ST + 3 LD in X4).

L1 data bandwidth doubled to 64 B; L2 cache grew from 2 MiB to 3 MiB.

Reorder Buffer (ROB) size doubled from 384 to 768 entries, surpassing Apple’s A17 ROB and boosting out‑of‑order execution by 25‑40 %.

5‑3 Execution Units

SIMD/FP execution pipes increased from 4 to 6.

Integer ALU now supports more complex two‑cycle operations.

Integer multiply units rose from 2 to 4; FP compare units from 1 to 2.
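The back‑end and execution‑unit changes listed in 5‑2 and 5‑3 are easier to compare as growth ratios. The following sketch simply tabulates the before/after numbers quoted above; it assumes the "before" column is the X4 in every row (the text states this explicitly only for the load/store units), and the dictionary and function names are made up for the example.

```python
# (previous generation, X925) pairs copied from the lists above.
# Assumption: the previous-generation value refers to the X4 in every row.
RESOURCES = {
    "load AGUs": (3, 4),
    "L2 cache (MiB)": (2, 3),
    "ROB entries": (384, 768),
    "SIMD/FP units": (4, 6),
    "integer multiply units": (2, 4),
    "FP compare units": (1, 2),
}


def growth_ratio(old: float, new: float) -> float:
    """Ratio of the new resource count to the old one."""
    return new / old


if __name__ == "__main__":
    for name, (old, new) in RESOURCES.items():
        print(f"{name:24s} {old:>4} -> {new:<4} x{growth_ratio(old, new):.2f}")
```

Resource counts do not translate one‑to‑one into speed; the ratios only show where Arm spent the extra transistor budget.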

5‑4 Performance

Running at 3.8 GHz, the X925 shows up to a 36 % performance improvement over its predecessor, with roughly 25 % attributed to the higher clock and ~11 % to micro‑architectural (IPC) changes. The two factors compound rather than add, so this split is approximate. Geekbench 6 data suggests about 15 % gain from architecture alone.
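The split between clock and architecture can be sanity‑checked with a simple model in which performance is the product of clock frequency and IPC. The sketch below is a rough check under that assumption only (it ignores memory effects and process differences), and the helper name is hypothetical.

```python
def implied_uplift(total_uplift: float, known_uplift: float) -> float:
    """Return the remaining factor's fractional uplift.

    Assumes performance = clock * IPC, so the uplifts multiply:
    (1 + total) = (1 + known) * (1 + remaining).
    """
    return (1.0 + total_uplift) / (1.0 + known_uplift) - 1.0


if __name__ == "__main__":
    total = 0.36  # headline uplift quoted above

    # If 25 % comes from clock, what is left for micro-architecture?
    ipc_given_clock = implied_uplift(total, 0.25)

    # If Geekbench 6 attributes 15 % to architecture, what clock gain is implied?
    clock_given_ipc = implied_uplift(total, 0.15)

    print(f"IPC uplift implied by a 25% clock gain: {ipc_given_clock:.1%}")
    print(f"Clock uplift implied by a 15% IPC gain: {clock_given_ipc:.1%}")
```

Under this model a 25 % clock gain leaves about 8.8 % for the architecture, while a 15 % architectural gain implies roughly an 18.3 % clock gain, which illustrates why the two quoted breakdowns do not line up exactly.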

6. A725 and A520 Highlights

A725’s ROB size is larger than the previous 192 entries (exact size undisclosed).

L2 cache options expanded to 1 MiB, up from a maximum of 512 KiB in A720.

Both cores benefit from 3 nm‑specific power‑efficiency optimisations; A520 sees ~15 % efficiency improvement despite unchanged architecture.

7. Conclusion

Arm’s 2024 X925 and A725 cores demonstrate incremental but meaningful gains on the 3 nm node, especially in clock speed, ROB size, and cache bandwidth. However, competition from Apple’s custom A‑series and Qualcomm’s Oryon will shape market outcomes.
