
Overview of Huawei Kunpeng 920 Processor Architecture and Subsystems

The article provides a detailed technical overview of Huawei's Kunpeng 920 processor, describing its ARM‑based RISC architecture, chip organization, core and cluster layout, security features, IMU management, and the various subsystems such as IO, interrupt, network, SAS, and PCIe.

Architects' Tech Alliance

1. Organization of Kunpeng Processor

Chip: a packaged large-scale integrated circuit built from one or more silicon dies; the usual physical form of a CPU.

DIE: the smallest physical unit inside a chip package; Kunpeng 920 packages three DIEs, two for compute and one for IO.

Die (bare die): an unpackaged block of semiconductor cut from a processed wafer; each die is an individual integrated circuit.

Core: the actual compute unit, seen as a "core" by the operating system.

Cluster: a group of cores; Kunpeng 920 groups four cores into one cluster, with eight clusters per DIE.

SoC (System on Chip): integrates CPU, RoCE NIC, SAS controller, southbridge, etc., forming a complete system on a single chip.

2. Kunpeng 920 Chip Architecture

One SoC contains three DIEs: two compute DIEs and one IO DIE.

Each compute DIE has 8 clusters and each cluster contains 4 cores, so a Kunpeng 920 chip has 2 × 8 × 4 = 64 cores.

Each core in the compute DIE has private L1 and L2 caches, while all cores share an L3 cache.

The IO DIE integrates network and PCIe modules, and the DIEs are interconnected via a high‑speed internal bus.
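The DIE, cluster, and core hierarchy described above can be sketched as a small Python model. The counts come from the text; the flat core numbering used by `locate` is a hypothetical scheme chosen for illustration and is not necessarily how firmware or the operating system enumerates cores.

```python
from dataclasses import dataclass

# Counts taken from the article text.
COMPUTE_DIES = 2
CLUSTERS_PER_DIE = 8
CORES_PER_CLUSTER = 4


@dataclass(frozen=True)
class CoreLocation:
    die: int      # compute DIE index, 0..1
    cluster: int  # cluster index within the DIE, 0..7
    core: int     # core index within the cluster, 0..3


def total_cores() -> int:
    """Number of cores on one chip: DIEs x clusters x cores."""
    return COMPUTE_DIES * CLUSTERS_PER_DIE * CORES_PER_CLUSTER


def locate(core_id: int) -> CoreLocation:
    """Map a flat core number to (die, cluster, core).

    Assumption: cores are numbered linearly, DIE by DIE and
    cluster by cluster. Real enumeration order may differ.
    """
    if not 0 <= core_id < total_cores():
        raise ValueError(f"core_id must be in 0..{total_cores() - 1}")
    cores_per_die = CLUSTERS_PER_DIE * CORES_PER_CLUSTER
    die, rest = divmod(core_id, cores_per_die)
    cluster, core = divmod(rest, CORES_PER_CLUSTER)
    return CoreLocation(die, cluster, core)
```

Under this numbering, `total_cores()` returns 64 and `locate(37)` lands on DIE 1, cluster 1, core 1.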

3. System Security & IMU

Security: supports Secure Boot and a Trusted Execution Environment (TEE), built on ARM TrustZone combined with hardware mechanisms.

IMU (Intelligent Management Unit) is an on-chip management unit that works with the BMC (baseboard management controller) to provide data-center node monitoring, fault pre-processing, a root of trust, energy management, and other management functions.

4. Other Subsystems of Kunpeng 920

The processor includes compute, storage, device IO, interrupt, and virtualization subsystems.

Kunpeng 920 contains two CPU DIEs, one IO DIE, and eight DDR4 channels, interconnected by an AMBA bus.
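The eight DDR4 channels put an upper bound on memory bandwidth. The sketch below computes that theoretical peak. The 64-bit (8-byte) data bus per channel is standard for DDR4, but the transfer rate is a parameter: the article does not state it, so the 2933 MT/s used in the usage note is an assumption, and sustained bandwidth in practice is lower than this peak.

```python
def peak_ddr_bandwidth_gb_per_s(channels: int,
                                mega_transfers_per_s: int,
                                bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes).

    channels             -- number of memory channels (8 per the article)
    mega_transfers_per_s -- DDR4 data rate in MT/s (assumed, not from the article)
    bus_bytes            -- data bus width per channel in bytes (64 bits = 8)
    """
    if channels <= 0 or mega_transfers_per_s <= 0 or bus_bytes <= 0:
        raise ValueError("all arguments must be positive")
    # MT/s x bytes per transfer = MB/s per channel; divide by 1000 for GB/s.
    return channels * mega_transfers_per_s * bus_bytes / 1000
```

For example, `peak_ddr_bandwidth_gb_per_s(8, 2933)` gives roughly 187.7 GB/s for eight channels at the assumed 2933 MT/s.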

5. IO Subsystem

The IO DIE extends the processor with on‑chip accelerators such as 100 GbE NICs and SAS controllers, and supports PCIe 4.0 devices like NICs and GPUs.

High‑speed devices on the SoC are also PCIe‑based and can be configured via PCIe configuration space.
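Because the on-chip devices present a PCIe configuration space, standard tooling can identify them the same way as any plug-in card. The sketch below decodes the first 16 bytes of a standard PCI configuration header. The sample bytes are synthetic: the vendor ID 0x19E5 is the one listed for Huawei in the public PCI ID database, while the device ID, revision, and class values are made up for illustration. On Linux, real bytes can be read from `/sys/bus/pci/devices/<BDF>/config`.

```python
import struct


def parse_config_header(cfg: bytes) -> dict:
    """Decode the common fields in the first 16 bytes of PCI config space."""
    if len(cfg) < 16:
        raise ValueError("need at least 16 bytes of configuration space")
    # Offsets 0x00..0x07: four little-endian 16-bit registers.
    vendor_id, device_id, command, status = struct.unpack_from("<HHHH", cfg, 0)
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "command": command,
        "status": status,
        "revision": cfg[0x08],
        "prog_if": cfg[0x09],
        "subclass": cfg[0x0A],
        "base_class": cfg[0x0B],
        # Bit 7 of the header-type byte flags a multi-function device.
        "header_type": cfg[0x0E] & 0x7F,
        "multifunction": bool(cfg[0x0E] & 0x80),
    }


# Synthetic header: Huawei vendor ID, hypothetical device ID 0x1234,
# base class 0x02 (network controller), subclass 0x00 (Ethernet).
EXAMPLE_HEADER = struct.pack(
    "<HHHHBBBBBBBB",
    0x19E5, 0x1234, 0x0006, 0x0010,  # vendor, device, command, status
    0x21, 0x00, 0x00, 0x02,          # revision, prog IF, subclass, base class
    0x00, 0x00, 0x80, 0x00,          # cache line, latency, header type, BIST
)

info = parse_config_header(EXAMPLE_HEADER)
print(f"vendor={info['vendor_id']:#06x} class={info['base_class']:#04x}")
```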

Subsystems (PCIe, CCIX, Hydra, Network, Storage, HAC, ME) follow industry standards and open‑source compatibility requirements.

6. Interrupt Subsystem

Implements wired (line-based) and message-based interrupts, compatible with the ARM GIC specification.

The GIC (Generic Interrupt Controller) enables and disables interrupts and provides the following capabilities:

Supports SGI, PPI, SPI, and LPI interrupts.

Allows routing of interrupts to any CPU core.

Offers interrupt priority settings.

Includes AArch64 security and virtualization extensions.

GICv3 introduces message-based interrupts (LPI), handled through the ITS (Interrupt Translation Service), which allows them to be routed dynamically.

Kunpeng also adopts the MBIGEN (Message‑Based Interrupt Generator) technology.
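The four interrupt types listed above are distinguished by their interrupt ID (INTID). As a minimal sketch, the function below classifies an INTID using the base GICv3 ranges; it ignores the extended PPI and SPI ranges added in GICv3.1, and the function name is hypothetical.

```python
def classify_intid(intid: int) -> str:
    """Return the GICv3 interrupt type for an INTID (base ranges only)."""
    if intid < 0:
        raise ValueError("INTID must be non-negative")
    if intid <= 15:
        return "SGI"       # software-generated, used for inter-core signalling
    if intid <= 31:
        return "PPI"       # private peripheral, local to one core
    if intid <= 1019:
        return "SPI"       # shared peripheral, routable to any core
    if intid <= 1023:
        return "special"   # special-purpose IDs such as the spurious interrupt
    if intid < 8192:
        return "reserved"  # unused in base GICv3
    return "LPI"           # message-based, routed via the ITS


for sample in (3, 27, 100, 8192):
    print(sample, classify_intid(sample))
```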

7. Network Subsystem

Consists of the Network ICL and the RoCE engine.

Network ICL provides multiple 1 Gbps‑100 Gbps Ethernet controllers, DCB, MAC tables, VLAN filtering, flow tables, and PCIe integration.

RoCE (RDMA over Converged Ethernet) offers low-latency, low-CPU-utilization remote memory access by carrying the InfiniBand transport protocol over Ethernet (RoCE v2).

8. SAS Subsystem

Provides two x8 SAS 3.0 controllers, backward compatible with SAS 2.0/1.0 and supporting SATA 3.0/2.0/1.0.

SAS supports 12, 6, 3, and 1.5 Gbps link rates; SATA supports 6, 3, and 1.5 Gbps with auto-negotiation.
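The outcome of rate negotiation can be approximated as "the fastest rate both ends support". The sketch below models only that result, not the actual SAS or SATA speed-negotiation handshake. It also converts a line rate to payload bandwidth, assuming the 8b/10b line encoding used by SAS up to 12 Gbps and by SATA; the function names are hypothetical.

```python
from typing import Iterable, Optional

# Line rates in Gbps, as listed in the article.
SAS_RATES_GBPS = (12.0, 6.0, 3.0, 1.5)
SATA_RATES_GBPS = (6.0, 3.0, 1.5)


def negotiate(controller_rates: Iterable[float],
              device_rates: Iterable[float]) -> Optional[float]:
    """Return the highest rate supported by both ends, or None if no overlap."""
    common = set(controller_rates) & set(device_rates)
    return max(common) if common else None


def payload_mb_per_s(line_rate_gbps: float) -> float:
    """Payload bandwidth per lane in MB/s, assuming 8b/10b encoding.

    Every 10 bits on the wire carry 8 data bits, so one payload byte
    costs 10 line bits.
    """
    return line_rate_gbps * 1000 / 10


# A 6 Gbps SATA drive attached to the SAS controller in SATA mode.
rate = negotiate(SATA_RATES_GBPS, (6.0, 3.0, 1.5))
print(rate, payload_mb_per_s(rate))
```

With 8b/10b encoding, a 12 Gbps SAS lane carries at most 1200 MB/s of payload and a 6 Gbps lane at most 600 MB/s, before protocol overhead.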

Directly connects up to eight SAS or SATA drives, with optional expander for more disks.

Direct connection: PHY of SAS controller connects straight to device.

Expander connection: devices connect via an expander.

Also includes NOR flash (4 chip‑selects, up to 512 KB), SPI flash (2 chip‑selects, up to 32 MB), and NAND flash (4 chip‑selects).

9. PCIe Subsystem

Supports PCIe Gen1 through Gen4 with up to 40 lanes, spread across three PCIe cores (Core0: 16 lanes, Core1: 16 lanes, Core2: 8 lanes).

Each core can act as a Root Port; only Core1 can function as an Endpoint.
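To put the lane counts in perspective, the sketch below estimates raw per-direction link bandwidth from the PCIe generation and lane count. The per-generation transfer rates and line encodings are the standard PCIe values; the lane split per core comes from the text. The result is a theoretical ceiling before packet and protocol overhead, and the function name is hypothetical.

```python
# generation -> (GT/s per lane, payload bits, line bits) per the PCIe encodings
GENERATIONS = {
    1: (2.5, 8, 10),      # 8b/10b
    2: (5.0, 8, 10),      # 8b/10b
    3: (8.0, 128, 130),   # 128b/130b
    4: (16.0, 128, 130),  # 128b/130b
}

# Lane allocation per PCIe core, as given in the article.
CORE_LANES = {"Core0": 16, "Core1": 16, "Core2": 8}


def link_bandwidth_gb_per_s(gen: int, lanes: int) -> float:
    """Raw bandwidth of one link direction in GB/s, after line encoding."""
    if gen not in GENERATIONS:
        raise ValueError(f"unsupported PCIe generation: {gen}")
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    gt_per_s, payload_bits, line_bits = GENERATIONS[gen]
    # GT/s x lanes = Gbit/s on the wire; scale by encoding, then bits to bytes.
    return gt_per_s * lanes * payload_bits / line_bits / 8


print("total lanes:", sum(CORE_LANES.values()))
for name, lanes in CORE_LANES.items():
    print(name, f"{link_bandwidth_gb_per_s(4, lanes):.2f} GB/s at Gen4")
```

A Gen4 x16 link works out to about 31.5 GB/s per direction, and the three cores together account for the 40 lanes.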

Features embedded DMA engines.

Supports SRIS, SR‑IOV, shared virtual memory, CCIX, and Peer‑to‑Peer traffic.

Source: Huawei Cloud BBS
