How Huawei’s Da Vinci Architecture Powers Next‑Gen AI on the Kirin 810
Huawei’s Da Vinci AI architecture, introduced with the Kirin 810 SoC, combines a 3D Cube matrix‑multiply engine, vector and scalar units, and flexible scaling to deliver high‑performance, energy‑efficient AI compute across devices from low‑power IoT to high‑end cloud servers.
Huawei introduced the Kirin 810 SoC featuring its self‑designed Da Vinci NPU, which achieved top‑3 results on the AI Benchmark published by ETH Zurich, highlighting the chip’s leading‑edge AI performance.
Why the Da Vinci Architecture?
Huawei forecasts 400 billion smart terminals by 2025, with AI assistants reaching 90% penetration, making AI a universal technology that will dramatically boost productivity across all industries.
Design of the Da Vinci Architecture
The Da Vinci architecture is a purpose‑built AI compute framework that delivers high compute density, energy efficiency, and flexibly scalable resources. It features a 3D Cube matrix‑multiply engine, a Vector unit for diverse operations, and a Scalar unit for control‑flow tasks.
Fundamental AI Data Types
Scalar – a single number.
Vector – a one‑dimensional ordered array.
Matrix – a two‑dimensional ordered array.
Tensor – an n‑dimensional ordered array.
Matrix multiplication is the core of AI workloads; accelerating it directly improves overall AI throughput.
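The four data types and the central role of matrix multiplication can be illustrated with a short NumPy sketch (NumPy is used purely for illustration; it is not part of the Da Vinci toolchain):

```python
import numpy as np

# The four fundamental AI data types, expressed as NumPy arrays.
scalar = np.float16(3.0)                           # a single number (0-D)
vector = np.arange(4, dtype=np.float16)            # 1-D ordered array
matrix = np.ones((4, 4), dtype=np.float16)         # 2-D ordered array
tensor = np.zeros((2, 3, 4, 4), dtype=np.float16)  # n-D ordered array (here n = 4)

# Matrix multiplication is the core AI workload: a dense layer, or a
# convolution after im2col lowering, reduces to C = A @ B.
A = np.random.rand(16, 16).astype(np.float16)
B = np.random.rand(16, 16).astype(np.float16)
C = A @ B  # 16 x 16 x 16 = 4096 multiply-accumulate (MAC) operations
```

Accelerating that one `A @ B` kernel is exactly what the 3D Cube unit described below is built for.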
Core Units of Da Vinci
3D Cube Matrix‑Multiply Unit – performs thousands of multiply‑accumulate (MAC) operations per cycle, staging operands and results through the L0A, L0B, and L0C buffers.
Vector Unit – handles a wide range of vector‑type calculations beyond matrix multiplication.
Scalar Unit – acts as a small CPU for loop control, branching, address calculation, and basic arithmetic.
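How the three units cooperate on a dense layer can be sketched in plain Python. This is a functional model only, under the assumption of a 16×16×16 cube tile (the size commonly reported for Da Vinci); the function and buffer names are illustrative, not Huawei APIs:

```python
import numpy as np

TILE = 16  # assumed cube dimension: one 16x16 @ 16x16 tile per "cycle"

def cube_matmul(a_tile, b_tile):
    """Stand-in for the 3D Cube unit: one tile-sized matrix multiply."""
    return a_tile @ b_tile

def vector_relu(x):
    """Stand-in for the Vector unit: elementwise post-processing."""
    return np.maximum(x, 0)

def npu_dense_layer(A, B):
    """Stand-in for the Scalar unit's role: loop control, address
    calculation, and dispatching tiles to the cube and vector units."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % TILE == 0 and K % TILE == 0 and N % TILE == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, TILE):              # scalar unit: loop control
        for j in range(0, N, TILE):
            acc = np.zeros((TILE, TILE), dtype=A.dtype)
            for k in range(0, K, TILE):      # accumulate partial tiles (L0C role)
                acc += cube_matmul(A[i:i+TILE, k:k+TILE],
                                   B[k:k+TILE, j:j+TILE])
            C[i:i+TILE, j:j+TILE] = acc
    return vector_relu(C)                    # vector unit: activation
```

The division of labor, not the Python code, is the point: control flow stays on the small scalar core while the bulk of the arithmetic lands on the cube.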
Advantages of the 3D Cube
For an N × N matrix multiplication (N³ MAC operations in total), a traditional 1‑D MAC array with N units needs N² cycles, a 2‑D array with N² units needs N cycles, while the 3D Cube with N³ units completes the operation in a single cycle, dramatically reducing latency and increasing utilization.
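The cycle counts follow directly from dividing the N³ MACs by the number of MAC units a fully utilized array can fire per cycle. A quick sanity check for N = 16, the cube size widely reported for Da Vinci:

```python
def matmul_cycles(n):
    """Cycles to multiply two n x n matrices (n**3 MACs total),
    assuming a fully utilized MAC array of each dimensionality."""
    macs = n ** 3
    return {
        "1D array (n MACs/cycle)":    macs // n,       # n**2 cycles
        "2D array (n^2 MACs/cycle)":  macs // n ** 2,  # n cycles
        "3D Cube (n^3 MACs/cycle)":   macs // n ** 3,  # 1 cycle
    }

print(matmul_cycles(16))  # 256 cycles vs 16 cycles vs 1 cycle
```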
Impact on Kirin 810
The Kirin 810, the first SoC to integrate the Da Vinci NPU, delivers industry‑leading FP16 and INT8 performance, enabling rich AI features on devices such as the Nova 5, Nova 5i Pro, and Honor 9X.
It also supports Huawei’s HiAI ecosystem with an open IR format and over 240 operators, facilitating rapid model conversion and deployment across cloud, edge, and mobile platforms.
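INT8 inference matters because integer MACs are cheaper and denser than FP16 ones. A minimal sketch of why INT8 can approximate a floating-point matmul, using generic symmetric per-tensor quantization (an illustrative scheme, not the specific quantization used by HiAI or the Kirin 810):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x is approximated by scale * q."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(A, B):
    """Multiply in integer arithmetic, then rescale back to float.
    INT8 products are accumulated in wider INT32 registers, as on NPU
    MAC arrays, to avoid overflow."""
    qa, sa = quantize_int8(A)
    qb, sb = quantize_int8(B)
    C_int = qa.astype(np.int32) @ qb.astype(np.int32)
    return C_int * (sa * sb)
```

The result is close to the FP reference but not bit-identical; the accuracy/throughput trade-off is what model conversion tooling manages when lowering a trained model to INT8.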
Scalable Across Scenarios
Thanks to its modular design, Da Vinci can be deployed from tens of milliwatts on IoT devices (Ascend‑Nano) to hundreds of watts for data‑center training (Ascend‑Max), covering edge, server, and cloud workloads.
Unified Development Benefits
Developers can write operators once and run them on any Da Vinci‑based platform, reducing migration costs and ensuring consistent performance across devices.
Future Outlook
With its high performance and flexibility, Da Vinci is expected to power AI in smart cities, autonomous driving, retail, robotics, industrial manufacturing, and cloud AI services, making AI ubiquitous.
Architects' Tech Alliance
Sharing project experience and insights into cutting‑edge architectures, with a focus on cloud computing, microservices, big data, hyper‑convergence, storage, data protection, artificial intelligence, and industry practices and solutions.