How Huawei’s Da Vinci Architecture Powers Next‑Gen AI on the Kirin 810

Huawei’s Da Vinci AI architecture, introduced with the Kirin 810 SoC, combines a 3D Cube matrix‑multiply engine, vector and scalar units, and flexible scaling to deliver high‑performance, energy‑efficient AI compute across devices from low‑power IoT to high‑end cloud servers.

Architects' Tech Alliance

Huawei introduced the Kirin 810 SoC featuring its self‑designed Da Vinci NPU, which achieved top‑3 results on the AI Benchmark published by ETH Zurich, highlighting the chip’s leading‑edge AI performance.

Why the Da Vinci Architecture?

Huawei forecasts 400 billion smart terminals by 2025, with AI assistants reaching 90% penetration, making AI a universal technology that will dramatically boost productivity across all industries.

Design of the Da Vinci Architecture

The Da Vinci architecture is a purpose‑built AI compute framework that delivers high compute density, energy efficiency, and flexible, tailorable compute resources that can be scaled up or trimmed down per product. It features a 3D Cube matrix‑multiply engine, a Vector unit for diverse element‑wise operations, and a Scalar unit for control‑flow tasks.

Fundamental AI Data Types

Scalar – a single number.

Vector – a one‑dimensional ordered array.

Matrix – a two‑dimensional ordered array.

Tensor – an n‑dimensional ordered array.

Matrix multiplication is the core of AI workloads; accelerating it directly improves overall AI throughput.
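The four data types and the central role of matrix multiplication can be illustrated with a short NumPy sketch (NumPy here stands in for what the NPU performs in hardware):

```python
import numpy as np

# The four fundamental AI data types, expressed as NumPy arrays:
scalar = np.float32(3.0)                            # a single number (rank 0)
vector = np.arange(4, dtype=np.float32)             # 1-D ordered array (rank 1)
matrix = np.ones((4, 4), dtype=np.float32)          # 2-D ordered array (rank 2)
tensor = np.zeros((2, 3, 4, 4), dtype=np.float32)   # n-D ordered array (rank 4)

# Matrix multiplication dominates neural-network workloads: fully
# connected layers, attention, and convolution (via im2col) all
# reduce to it, which is why hardware accelerates it first.
out = matrix @ vector   # (4, 4) x (4,) -> (4,)
print(out.shape)        # (4,)
```

Accelerating this single primitive therefore lifts almost every layer type at once.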

Core Units of Da Vinci

3D Cube Matrix‑Multiply Unit – performs massive MAC operations in a single cycle, using buffers L0A/B/C for data staging.

Vector Unit – handles a wide range of vector‑type calculations beyond matrix multiplication.

Scalar Unit – acts as a small CPU for loop control, branching, address calculation, and basic arithmetic.
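As an illustrative sketch only (this is not Huawei's actual programming model or toolchain), here is how one dense layer's work conceptually splits across the three units, with NumPy standing in for the hardware:

```python
import numpy as np

def dense_layer(x, w, b):
    """Conceptual mapping of a dense layer onto Da Vinci's units.
    Real NPU kernels are written against Huawei's toolchain, not NumPy."""
    # Cube unit: the bulk matrix multiply (the vast majority of MACs)
    acc = x @ w
    # Vector unit: element-wise bias add and ReLU activation
    out = np.maximum(acc + b, 0.0)
    # Scalar unit: loop control and address arithmetic, which NumPy
    # handles implicitly here, run on the small CPU-like scalar core
    return out

x = np.random.randn(1, 16).astype(np.float32)
w = np.random.randn(16, 8).astype(np.float32)
b = np.zeros(8, dtype=np.float32)
y = dense_layer(x, w, b)
print(y.shape)  # (1, 8)
```

The split mirrors the architecture's division of labor: heavy matrix math in the Cube, everything element‑wise in the Vector unit, and bookkeeping in the Scalar unit.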

Advantages of the 3D Cube

For an N × N matrix multiplication (N³ multiply‑accumulates in total), a traditional 1‑D MAC array with N units needs N² cycles, a 2‑D array with N² units needs N cycles, while the 3D Cube's N³ MAC units complete the operation in a single cycle, dramatically reducing latency and increasing utilization.
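The cycle counts follow directly from dividing total MACs by MAC units. A back‑of‑the‑envelope check, assuming N = 16 (a cube dimension commonly cited for Da Vinci, though not stated in this article):

```python
# Idealized cycle counts for an N x N matrix multiply (N^3 MACs total),
# assuming every layout keeps all of its MAC units busy each cycle.
N = 16  # assumed cube dimension for illustration

macs_total = N ** 3  # total multiply-accumulates required

cycles_1d = macs_total // N       # N   MAC units -> N^2 cycles
cycles_2d = macs_total // N ** 2  # N^2 MAC units -> N   cycles
cycles_3d = macs_total // N ** 3  # N^3 MAC units -> 1   cycle

print(cycles_1d, cycles_2d, cycles_3d)  # 256 16 1
```

These are peak numbers; real utilization depends on the L0A/B/C buffers keeping the MAC array fed with data.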

Impact on Kirin 810

The Kirin 810, the first SoC to integrate the Da Vinci NPU, delivers industry‑leading FP16 and INT8 performance, enabling rich AI features on devices such as the Nova 5, Nova 5i Pro, and Honor 9X.

It also supports Huawei’s HiAI ecosystem with an open IR format and over 240 operators, facilitating rapid model conversion and deployment across cloud, edge, and mobile platforms.

Scalable Across Scenarios

Thanks to its modular design, Da Vinci can be deployed from tens of milliwatts on IoT devices (Ascend‑Nano) to hundreds of watts for data‑center training (Ascend‑Max), covering edge, server, and cloud workloads.

Unified Development Benefits

Developers can write operators once and run them on any Da Vinci‑based platform, reducing migration costs and ensuring consistent performance across devices.

Future Outlook

With its high performance and flexibility, Da Vinci is expected to power AI in smart cities, autonomous driving, retail, robotics, industrial manufacturing, and cloud AI services, making AI ubiquitous.

Tags: AI · matrix multiplication · NPU · Da Vinci architecture · 3D Cube · Kirin 810
Written by Architects' Tech Alliance

Sharing project experience and insights into cutting-edge architectures, with a focus on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, and industry practices and solutions.
