Overview of Huawei Ascend AI Full‑Stack Architecture, CANN, and AscendCL

This article introduces Huawei's Domain‑Specific Ascend AI architecture, detailing its four‑layer full‑stack design, the five‑layer abstract and three‑layer logical structures of the CANN heterogeneous computing framework, and the AscendCL programming interface with its advantages and application scenarios.


Unlike general‑purpose CPUs and GPUs, the DaVinci‑based Ascend AI chips implement a Domain‑Specific Architecture (DSA) optimized for common AI workloads, with AI Cores handling scalar, vector, and tensor operations.

1. Ascend AI Full‑Stack Architecture

The stack consists of four major parts:

Application enablement layer – software that enables model deployment, such as APIs, SDKs, platforms, and model libraries.

AI framework layer – training frameworks like MindSpore, TensorFlow, and PyTorch.

Heterogeneous computing architecture – a low‑level, general‑purpose compute framework (CANN) that accelerates the AI frameworks above it and allows multiple frameworks to run on the same hardware.

Compute hardware layer – the physical AI cores and devices that provide the raw computational power.

2. Heterogeneous Computing Architecture CANN

2.1 Five‑Layer Abstract Architecture

CANN (Compute Architecture for Neural Networks) is built on the DaVinci architecture and offers a multi‑level programming interface that lowers the barrier to entry while delivering high‑performance AI development across the vision, NLP, recommendation, and robotics domains.

2.2 Three‑Layer Logical Architecture

1. Application Layer

Includes various Ascend‑based applications and developer tools for algorithm development and optimization.

Inference Applications

Develop inference apps using the AscendCL API.

AI Frameworks

Support for TensorFlow, Caffe, MindSpore, and third‑party frameworks.

Model Miniaturization Tools

Quantize models to accelerate inference.
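Model miniaturization usually centers on post‑training quantization: mapping FP32 weights to INT8 with a per‑tensor scale so inference runs on cheaper integer arithmetic. The standalone sketch below only illustrates the arithmetic of symmetric per‑tensor quantization; the function names are hypothetical, and CANN's own tooling handles this in production.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: the scale maps the
    largest absolute weight onto 127."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate FP32 values from the INT8 codes.
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.635, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

The round‑trip error is bounded by half a quantization step (scale / 2), which is why quantization preserves accuracy well when the weight distribution has no extreme outliers.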

AutoML Tools

MindSpore‑based AutoML searches for hardware‑aware networks to fully exploit Ascend performance.

Acceleration Libraries

AscendCL‑based libraries (currently BLAS) for custom acceleration.

MindStudio

Integrated IDE for offline model conversion, debugging, custom operator development, performance tuning, and system diagnostics.

2. Chip Enablement Layer

Exposes chip capabilities to the upper layers and provides graph‑driven workflow control.

AscendCL Computing Language Library

Open programming framework offering Device/Context/Stream management, memory handling, model/operator loading and execution, media processing, and graph management APIs.
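The management APIs above follow a fixed lifecycle: acquire the Device, then a Context, then a Stream, and release them in reverse order. The sketch below mimics that ordering with recording stubs. The real entry points (aclInit, aclrtSetDevice, aclrtCreateContext, aclrtCreateStream, and their teardown counterparts) come from the CANN SDK; hardware‑free stand‑ins are used here so the sketch runs anywhere.

```python
CALLS = []

def _stub(name):
    # Stand-in for an AscendCL call: records the call name and
    # returns 0 (the AscendCL success code).
    def call(*args):
        CALLS.append(name)
        return 0
    return call

# Hypothetical stand-ins named after the real AscendCL functions.
acl_init            = _stub("aclInit")
acl_set_device      = _stub("aclrtSetDevice")
acl_create_context  = _stub("aclrtCreateContext")
acl_create_stream   = _stub("aclrtCreateStream")
acl_destroy_stream  = _stub("aclrtDestroyStream")
acl_destroy_context = _stub("aclrtDestroyContext")
acl_reset_device    = _stub("aclrtResetDevice")
acl_finalize        = _stub("aclFinalize")

def run_on_device(device_id=0):
    # Acquire resources top-down: init, device, context, stream ...
    assert acl_init() == 0
    assert acl_set_device(device_id) == 0
    assert acl_create_context(device_id) == 0
    assert acl_create_stream() == 0
    # ... submit work to the stream here ...
    # ... then release everything in strict reverse order.
    acl_destroy_stream()
    acl_destroy_context()
    acl_reset_device(device_id)
    acl_finalize()

run_on_device()
```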

Graph Optimization and Compilation

Unified IR interfaces for TensorFlow, Caffe, MindSpore, etc., with Graph Engine, Fusion Engine, AICPU Engine, and HCCL support.

Graph Engine – central control of graph compilation and execution.
Fusion Engine – manages operator fusion rules.
AICPU Engine – handles AICPU operator information.
HCCL – manages HCCL (collective‑communication) operator information.

Operator Compilation and Libraries

TBE (Tensor Boost Engine) – toolchain for compiling and developing custom operators.
Operator Libraries – neural‑network acceleration libraries.

Digital Vision Pre‑Processing

Provides video encode/decode (VENC/VDEC), JPEG encode/decode, PNG decode, and VPC preprocessing.
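The VPC component handles on‑chip resizing, cropping, and format conversion before data reaches the AI Core. As a software stand‑in, the function below performs a nearest‑neighbour resize over a row‑major grayscale image; the real operation runs on dedicated DVPP hardware through CANN's APIs, and this sketch only illustrates the transformation.

```python
def vpc_resize(pixels, src_w, src_h, dst_w, dst_h):
    """Nearest-neighbour resize of a row-major grayscale image;
    a pure-Python stand-in for a hardware VPC resize."""
    out = []
    for y in range(dst_h):
        src_y = y * src_h // dst_h          # map output row to source row
        for x in range(dst_w):
            src_x = x * src_w // dst_w      # map output col to source col
            out.append(pixels[src_y * src_w + src_x])
    return out

# 2x2 image scaled up to 4x4: each source pixel becomes a 2x2 block.
img = [10, 20,
       30, 40]
big = vpc_resize(img, 2, 2, 4, 4)
```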

Execution Engine

Runtime – resource management for neural‑network tasks.
Task Scheduler – manages and schedules graph task sequences.

3. AscendCL Computing Language Interface

3.1 AscendCL Overview

AscendCL (Ascend Computing Language) is an open programming framework that wraps low‑level Ascend services, providing unified APIs for device and memory management, model loading/execution, operator execution, and image/video preprocessing, enabling deep‑learning inference, graph processing, and custom operator acceleration on the CANN platform.
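A typical AscendCL inference pass allocates device buffers, copies the input from host to device, executes the loaded OM model, and copies the result back. The toy pipeline below mirrors that data flow with a dictionary standing in for device memory and a trivial "model" that doubles its input; the stand‑in names echo the real aclrtMalloc / aclrtMemcpy / aclmdlExecute calls from the CANN SDK but nothing here touches actual hardware.

```python
# Toy "device memory", standing in for buffers from aclrtMalloc.
DEVICE_MEM = {}

def rt_malloc(name, size):
    # Allocate a zeroed device buffer (returns 0 for success).
    DEVICE_MEM[name] = [0.0] * size
    return 0

def rt_memcpy(name, host_data):
    # Host -> device copy into an existing buffer.
    DEVICE_MEM[name][:len(host_data)] = host_data
    return 0

def mdl_execute(in_buf, out_buf):
    # Toy "model": doubles every element. Stands in for the compiled
    # OM model's real computation on the AI Core.
    DEVICE_MEM[out_buf] = [2 * v for v in DEVICE_MEM[in_buf]]
    return 0

def infer(host_input):
    rt_malloc("in", len(host_input))
    rt_malloc("out", len(host_input))
    rt_memcpy("in", host_input)      # host -> device
    mdl_execute("in", "out")         # run the loaded model
    return list(DEVICE_MEM["out"])   # device -> host

result = infer([1.0, 2.0, 3.0])
```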

3.2 Advantages

1. Highly abstract – consolidates operator compilation, loading, and execution into a small set of APIs, reducing complexity.
2. Backward compatible – programs compiled against older versions continue to run on newer releases.
3. Chip‑agnostic – a single AscendCL interface works uniformly across all Ascend processors, so applications need no awareness of the specific chip.

3.3 Main Application Scenarios

1. Application development – direct use of AscendCL to build image classification, object detection, and other AI apps.
2. Third‑party framework integration – frameworks can call AscendCL to leverage Ascend hardware.
3. Third‑party library creation – developers can wrap AscendCL to provide resource management, model execution, and other services.

3.4 Layered Capability Exposure

Model Loading Capability – loads OM models via AscendCL.
Operator Capability – operator functions are implemented in CANN but exposed through AscendCL.
Runtime Capability – abstracts device, memory, and event resources for applications.
Tags: Architecture, AI, Deep Learning, Hardware, Ascend, CANN, Programming Interface
Written by Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
