Inside Huawei’s Ascend AI Processor: Architecture, Performance, and Design Secrets
This article provides a technical overview of Huawei's Ascend AI processors: the Da Vinci architecture, core components such as the AI Core and the Digital Vision Pre‑Processing (DVPP) module, the chiplet‑based Ascend 910 and the edge‑oriented Ascend 310, and the optimizations behind high‑performance convolution for cloud and edge workloads.
Ascend AI Processor
Huawei introduced the Ascend AI processor in 2018 as the first product built on its proprietary Da Vinci architecture, aiming to deliver a full‑stack, full‑scenario solution spanning cloud, edge, and device. The processor emphasizes high energy efficiency and features a 3D Cube matrix compute unit capable of mixed‑precision operations.
AI Processor Architecture
The SoC integrates several specialized blocks: the AI Core, the AI CPU, multi‑level on‑chip caches and buffers, and a Digital Vision Pre‑Processing (DVPP) module. These components communicate over a CHI‑protocol ring bus, which maintains data coherence and provides high‑bandwidth sharing. Convolution acceleration comes from tight hardware‑software co‑design: matrix compute units are paired with large on‑chip buffers to shorten data paths and reduce latency.
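To see why a matrix compute unit accelerates convolution, it helps to recall the standard im2col lowering, which rewrites a convolution as one large matrix multiply that an engine like the 3D Cube can execute directly. The NumPy sketch below is illustrative only; the function name and shapes are assumptions, not Ascend APIs:

```python
import numpy as np

def conv2d_via_matmul(x, w):
    """Lower a 2D convolution (no padding, stride 1) to a matrix multiply.

    x: input feature map, shape (C_in, H, W)
    w: filters, shape (C_out, C_in, KH, KW)
    Returns an output of shape (C_out, H-KH+1, W-KW+1).
    """
    c_in, h, wd = x.shape
    c_out, _, kh, kw = w.shape
    oh, ow = h - kh + 1, wd - kw + 1

    # im2col: each output position becomes one column of input patches.
    cols = np.empty((c_in * kh * kw, oh * ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[:, i:i + kh, j:j + kw].ravel()

    # The convolution is now a single GEMM -- the operation a
    # 3D-Cube-style matrix engine is built to execute in hardware.
    out = w.reshape(c_out, -1) @ cols
    return out.reshape(c_out, oh, ow)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8)).astype(np.float32)
w = rng.standard_normal((4, 3, 3, 3)).astype(np.float32)
y = conv2d_via_matmul(x, w)
print(y.shape)  # (4, 6, 6)
```

The same lowering is why a large on‑chip buffer matters: the `cols` matrix reuses each input element up to `KH * KW` times, and keeping it close to the matrix unit avoids refetching from external memory.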
Ascend 910
The Ascend 910 targets cloud‑side training and inference. Its architecture consists of six chiplets: one compute chiplet (32 Da Vinci Cores, 16 CPU Cores, 4 DVPP units), one I/O chiplet, and four HBM chiplets delivering a total memory bandwidth of 1.2 TB/s.
High compute density: each Da Vinci core's 3D Cube executes a 16×16×16 FP16 matrix multiply‑accumulate per cycle, i.e., 4096 MAC operations per instruction.
High load/store bandwidth: HBM memory complements DDR to satisfy the massive data movement of back‑propagation.
100 Gbps NIC with RoCE v2 support enables multi‑node, multi‑card clusters.
DVPP can decode up to 128 simultaneous 1080p video streams, catering to inference workloads.
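The cube's throughput figure follows directly from its geometry: 16×16×16 = 4096 MACs per cycle per core. A quick back‑of‑the‑envelope script turns that into peak FLOPS; note that the 1 GHz clock below is an assumed round number for illustration, not a published specification:

```python
# Back-of-the-envelope peak throughput of the 3D Cube matrix unit.
# The 16x16x16 geometry and 32-core count come from the article;
# the 1 GHz clock is an ASSUMPTION for illustration only.
CUBE_DIM = 16
MACS_PER_CYCLE = CUBE_DIM ** 3   # 16*16*16 = 4096 MACs per core per cycle
OPS_PER_MAC = 2                  # one multiply + one add
CORES = 32                       # Da Vinci cores on the Ascend 910
CLOCK_HZ = 1e9                   # assumed 1 GHz clock

per_core_flops = MACS_PER_CYCLE * OPS_PER_MAC * CLOCK_HZ
chip_flops = per_core_flops * CORES
print(f"{MACS_PER_CYCLE} MACs/cycle per core")
print(f"{per_core_flops / 1e12:.1f} TFLOPS FP16 per core")
print(f"{chip_flops / 1e12:.1f} TFLOPS FP16 per chip")
```

Under these assumptions the 32‑core chip lands in the low hundreds of FP16 TFLOPS, which is the order of magnitude the design targets.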
Ascend 310
The Ascend 310 is designed for edge inference scenarios such as smart cities, retail, robotics, and industrial automation. Its architecture mirrors the 910's, but with fewer custom IP blocks and richer peripheral interfaces, integrating a Da Vinci Core, DVPP, and LPDDR4 memory.
Summary and Reflections
Innovation: Ascend AI processors leverage the Da Vinci architecture to provide a unified cloud‑edge solution with high energy‑efficiency and a powerful 3D Cube matrix unit.
Architecture: The SoC integrates AI Core, AI CPU, multi‑level caches, and DVPP, all linked by the CHI ring bus for coherent data sharing.
Convolution acceleration: Hardware‑software co‑optimization using matrix units and flexible data paths enables fast, low‑latency convolution for diverse neural network structures.
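The "flexible data paths" point can be made concrete with tiling: weights and activations are staged through a small on‑chip buffer in blocks, so the matrix unit reuses each loaded tile many times before returning to slow external memory. A schematic Python sketch (the tile size matches the 16×16×16 cube step; everything else is an illustrative assumption, not Ascend code):

```python
import numpy as np

TILE = 16  # tile edge matching a 16x16x16 cube step (illustrative)

def tiled_matmul(a, b):
    """Blocked matrix multiply: each TILE x TILE block of A and B is
    'loaded' once per output tile and consumed in a full tile-level MAC,
    mimicking how an on-chip buffer shortens the memory-to-matrix-unit
    data path."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and m % TILE == 0 and n % TILE == 0 and k % TILE == 0
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, TILE):
        for j in range(0, n, TILE):
            acc = np.zeros((TILE, TILE), dtype=a.dtype)
            for p in range(0, k, TILE):
                # One cube-style step: a 16x16x16 tile multiply-accumulate.
                acc += a[i:i+TILE, p:p+TILE] @ b[p:p+TILE, j:j+TILE]
            c[i:i+TILE, j:j+TILE] = acc
    return c

a = np.random.default_rng(1).standard_normal((32, 48)).astype(np.float32)
b = np.random.default_rng(2).standard_normal((48, 32)).astype(np.float32)
c = tiled_matmul(a, b)
```

Each accumulator tile stays resident for the whole inner loop, which is the software analogue of keeping partial sums inside the matrix unit rather than spilling them to memory.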
Architects' Tech Alliance
Sharing project experience and insights into cutting‑edge architectures, with a focus on cloud computing, microservices, big data, hyper‑convergence, storage, data protection, artificial intelligence, and industry practices and solutions.