How Arm’s Cortex‑X Series Redefines Mobile CPU Performance: A Deep Dive

This article examines Arm’s Cortex‑X processor lineup—from the first‑generation X1 to the latest X3—detailing their micro‑architectural innovations, performance gains, power trade‑offs, and how they compare with Apple’s A‑series, while also looking ahead to the upcoming Cortex‑X4.

OPPO Kernel Craftsman

May 19, 2023

1. Introduction

This article introduces Arm’s Cortex‑X series, a high‑performance line that sits above the established Cortex‑A family. While Cortex‑A cores receive incremental yearly upgrades, the Cortex‑X program adopts more aggressive architectural changes to deliver “super‑core” performance for premium mobile, cloud, edge, and HPC devices.

2. Origin of the X Plan

The X program traces back to the 2016 “Built on Arm Cortex Technology” licence, which let customers such as Qualcomm customise Cortex cores (e.g., cache size). In 2020 Arm announced the Cortex‑X Custom (CXC) program, inviting early‑stage partners to co‑design cores and target high‑end platforms.

3. Cortex‑X1: First Generation

Released in May 2020 alongside the Armv8.2‑based Cortex‑A78, Cortex‑X1 offers ~30% higher peak performance and ~23% better energy efficiency than the preceding Cortex‑A77, at the cost of higher peak power. Its large caches and single‑thread focus led flagship SoCs to adopt a 1+3+4 core configuration: one X1 prime core, three big cores, and four efficiency cores.

Key micro‑architectural upgrades over A78 include:

BPU L0 BTB increased from 64 to 96 entries (+50%).

Frontend decode widened from 4‑way to 5‑way.

MOP (macro‑op) cache delivery width raised from 6 to 8 MOPs per cycle.

MOP cache doubled from 1.5 K to 3 K entries.

ROB size grew from ~160 to 224 entries.

L1/L2/L3 caches enlarged (64 KB L1, up to 1 MB L2, up to 8 MB shared L3).

NEON SIMD units doubled, providing four 128‑bit units.

Load/store buffers increased by 33%.
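
The before/after figures in the list above can be turned into relative changes with a few lines of Python. This is only a minimal sketch: the numbers are the ones quoted in this article (the ROB baseline of 160 is approximate), the NEON baseline of two units is inferred from "doubled", and the names `X1_CHANGES`, `pct_change`, and `summarize` are illustrative, not part of any Arm tool.

```python
# Before/after values as quoted in the list above (Cortex-A78 -> Cortex-X1).
# Assumption: the ROB baseline (~160) is approximate, and the NEON baseline
# of 2 units is inferred from the word "doubled".
X1_CHANGES = {
    "L0 BTB entries": (64, 96),
    "Decode width": (4, 5),
    "MOP delivery width": (6, 8),
    "MOP cache capacity": (1.5, 3.0),
    "ROB entries": (160, 224),
    "128-bit NEON units": (2, 4),
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def summarize(changes: dict) -> dict:
    """Map each parameter name to its percentage change, rounded to 0.1."""
    return {
        name: round(pct_change(old, new), 1)
        for name, (old, new) in changes.items()
    }


if __name__ == "__main__":
    for name, delta in summarize(X1_CHANGES).items():
        print(f"{name:22s} {delta:+6.1f}%")
```

Seen this way, the ROB (+40%) and the MOP delivery width (about +33%) grow faster than the decode width (+25%), which fits the description of X1 as a wider, deeper out‑of‑order design rather than a simple frequency bump.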

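Despite its performance, the X1’s large die area and power draw limited its adoption. Its launch design wins were Samsung’s Exynos 2100 and Qualcomm’s Snapdragon 888, both built on Samsung’s 5 nm process and both affected by sub‑optimal yields; Google’s first Tensor SoC also adopted the core later in 2021.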

4. Cortex‑X2: Second Generation

Launched in May 2021, Cortex‑X2 moves to the Armv9 architecture with SVE2 support and 64‑bit‑only execution. Codenamed “Matterhorn‑ELP,” it continues the trend of offering a single high‑performance core alongside efficiency cores.

Micro‑architectural changes over X1 include:

Branch prediction and fetch decoupled, improving parallelism.

Pipeline depth reduced from 11 to 10 stages; dispatch latency cut from 2 to 1 cycle.

ROB enlarged from 224 to 288 entries (about +29%, which Arm rounds to 30%).

SVE2 SIMD instructions added.

Bfloat16 support for ML workloads.

AArch32 support removed (64‑bit‑only execution).

Load/store buffers increased by 33%.

d‑TLB entries grew from 40 to 48 (+20%).

Arm claims a 16% integer performance uplift and a 2× ML capability increase versus X1. The core first shipped in Qualcomm’s Snapdragon 8 Gen 1 on Samsung 4 nm. The 2022 Snapdragon 8+ Gen 1 moved the same X2 design to TSMC 4 nm, where Qualcomm claimed a power reduction of about 30%, showing how much the process node had been holding the core back.

5. Cortex‑X3: Third Generation

Released in June 2022, Cortex‑X3 (code‑named “Makalu‑ELP”) stays on Armv9 but introduces more substantial micro‑architectural refinements. Arm reports an 11% IPC gain over X2 and a 22% overall performance increase (including process improvements).
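
The two figures above can be related with a simple multiplicative model. The sketch below assumes that overall performance is the product of IPC and everything else (clock frequency, process, memory system); that model and the helper name `residual_gain` are simplifying assumptions for illustration, not Arm's own breakdown.

```python
def residual_gain(total_gain: float, ipc_gain: float) -> float:
    """Gain left over after factoring out IPC.

    Assumes performance = IPC * (other factors), so gains multiply
    rather than add. Inputs and output are fractions (0.22 == 22%).
    """
    return (1.0 + total_gain) / (1.0 + ipc_gain) - 1.0


# Figures quoted in the paragraph above.
TOTAL_GAIN = 0.22  # overall performance increase, X3 vs X2
IPC_GAIN = 0.11    # IPC gain, X3 vs X2

if __name__ == "__main__":
    other = residual_gain(TOTAL_GAIN, IPC_GAIN)
    print(f"Implied non-IPC contribution: {other:.1%}")
```

Under that assumption, roughly 10% of the quoted uplift would come from clock speed and process improvements rather than from the core design itself.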

Key enhancements:

MOP cache reduced from 3 K to 1.5 K entries to save SRAM area on advanced nodes.

Fetch‑decode width increased from 5‑way to 6‑way (+20% fetch capacity).

ROB entries raised to ~320 (+11%).

Branch predictor L1 BTB expanded to 96 entries and L2 BTB to 24 K entries, with decoupled design reducing misprediction latency by 12.2% and stall cycles by 3%.

Added a dedicated indirect‑branch predictor, cutting conditional‑branch error rate by 6.1%.

Pipeline depth shortened from 10 to 9 stages, mainly by optimizing MOP cache latency.

Integer ALUs increased from 4 to 6, boosting integer throughput.

Load bandwidth grew from 24 B to 32 B per cycle, plus two extra prefetch units.
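
To see the trend across all three generations, the ROB sizes and decode widths quoted in sections 3 to 5 can be lined up in one place. This is a rough sketch using only numbers from this article: the A78 and X3 ROB sizes are approximate, the X2 decode width of 5 is inferred from the "5‑way to 6‑way" bullet above, and the function names are illustrative.

```python
GENERATIONS = ["A78", "X1", "X2", "X3"]

# Values as quoted in this article; ~160 (A78) and ~320 (X3) are approximate.
ROB_ENTRIES = [160, 224, 288, 320]
# Assumption: X2 kept X1's 5-wide decode, inferred from the X3 list.
DECODE_WIDTH = [4, 5, 5, 6]


def step_growth(values: list) -> list:
    """Generation-over-generation change in percent, rounded to 0.1."""
    return [
        round((new - old) / old * 100.0, 1)
        for old, new in zip(values, values[1:])
    ]


def cumulative_growth(values: list) -> float:
    """Total change from the first to the last entry, in percent."""
    return round((values[-1] - values[0]) / values[0] * 100.0, 1)


if __name__ == "__main__":
    for label, values in (("ROB entries", ROB_ENTRIES),
                          ("Decode width", DECODE_WIDTH)):
        steps = step_growth(values)
        pairs = zip(GENERATIONS, GENERATIONS[1:], steps)
        detail = ", ".join(f"{a}->{b} {s:+.1f}%" for a, b, s in pairs)
        print(f"{label}: {detail}; total {cumulative_growth(values):+.1f}%")
```

On these numbers the ROB doubles between A78 and X3 while the decode width grows by half, with the per‑generation ROB steps shrinking from +40% to about +11%.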

6. Comparison with Apple’s A‑Series

The article contrasts Cortex‑X cores with Apple’s custom A‑series (e.g., A14 Firestorm). Apple’s designs feature larger L1/L2 caches and a much bigger ROB, yielding higher single‑core performance but at higher power and area costs. Cortex‑X3 narrows the gap but still trails Apple’s latest A‑cores.

7. Outlook for Cortex‑X4

Arm’s upcoming Cortex‑X4 (code‑named “Hunter‑ELP”) is expected in 2023. Anticipated improvements include further performance scaling, possibly new micro‑architectural tricks, and continued focus on 64‑bit‑only designs. The community looks forward to seeing whether multiple X cores can coexist in a single SoC to challenge Apple’s multi‑core configurations.

References

https://www.anandtech.com/show/15813/arm-cortex-a78-cortex-x1-cpu-ip-diverging

https://fuse.wikichip.org/news/3543/arm-cortex-x1-the-first-from-the-cortex-x-custom-program/

https://en.wikipedia.org/wiki/ARM_Cortex-X1

https://en.wikipedia.org/wiki/ARM_Cortex-X2

https://fuse.wikichip.org/news/6855/arm-unveils-next-gen-flagship-core-cortex-x3/

https://www.techinsights.com/blog/cortex-x3-powers

https://www.hwcooling.net/en/cortex-x3-the-new-fastest-arm-core-architecture-analysis/
