Overview of Huawei Ascend AI Full‑Stack Architecture, CANN, and AscendCL
This article introduces Huawei's domain‑specific Ascend AI architecture, detailing its four‑layer full‑stack design, the five‑layer abstract and three‑layer logical structures of the CANN heterogeneous computing architecture, and the AscendCL programming interface together with its advantages and application scenarios.
Unlike general‑purpose CPUs and GPUs, the DaVinci‑based Ascend AI chips implement a Domain‑Specific Architecture (DSA) optimized for common AI workloads, with AI Cores handling scalar, vector, and tensor operations.
1. Ascend AI Full‑Stack Architecture
The stack consists of four major parts:
Application enablement layer – deployment‑oriented software such as APIs, SDKs, platforms, and model libraries.
AI framework layer – training frameworks like MindSpore, TensorFlow, and PyTorch.
Heterogeneous computing architecture – a low‑level, general‑purpose compute layer (CANN) that accelerates the AI frameworks above it and lets multiple frameworks run on the same hardware.
Compute hardware layer – the physical AI cores and devices that provide the raw computational power.
2. Heterogeneous Computing Architecture CANN
2.1 Five‑Layer Abstract Architecture
CANN (Compute Architecture for Neural Networks) is built on the DaVinci architecture and offers multi‑level programming interfaces that combine a low barrier to entry with high performance, supporting AI development across the vision, NLP, recommendation, and robotics domains.
2.2 Three‑Layer Logical Architecture
1. Application Layer
Includes various Ascend‑based applications and developer tools for algorithm development and optimization.
Inference Applications
Develop inference apps using the AscendCL API.
AI Frameworks
Support for TensorFlow, Caffe, MindSpore, and third‑party frameworks.
Model Miniaturization Tools
Quantize models to accelerate inference.
AutoML Tools
MindSpore‑based AutoML searches for hardware‑aware networks to fully exploit Ascend performance.
Acceleration Libraries
AscendCL‑based libraries (currently BLAS) for custom acceleration.
MindStudio
Integrated IDE for offline model conversion, debugging, custom operator development, performance tuning, and system diagnostics.
2. Chip Enablement Layer
Exposes the chip's capabilities to upper layers and drives the compilation and execution workflow for computational graphs.
AscendCL Computing Language Library
Open programming framework offering Device/Context/Stream management, memory handling, model/operator loading and execution, media processing, and graph management APIs.
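A minimal sketch of how these resource‑management APIs typically chain together, assuming a recent CANN release (where the success code is spelled ACL_SUCCESS); return‑value checks are abbreviated, and a real application should verify every aclError:

```c
/* Minimal AscendCL resource-management sketch: init, Device/Context/
 * Stream setup, device memory, and teardown in reverse order.
 * Error handling is abbreviated for space. */
#include <stdio.h>
#include "acl/acl.h"

int main(void) {
    int32_t device_id = 0;
    aclrtContext context = NULL;
    aclrtStream stream = NULL;
    void *dev_buf = NULL;

    /* Initialize AscendCL (NULL = use the default configuration). */
    if (aclInit(NULL) != ACL_SUCCESS) return 1;

    /* Claim a device and create an execution context and stream on it. */
    aclrtSetDevice(device_id);
    aclrtCreateContext(&context, device_id);
    aclrtCreateStream(&stream);

    /* Allocate 1 MiB of device memory. */
    aclrtMalloc(&dev_buf, 1024 * 1024, ACL_MEM_MALLOC_HUGE_FIRST);
    printf("device buffer at %p\n", dev_buf);

    /* Release resources in reverse order of acquisition. */
    aclrtFree(dev_buf);
    aclrtDestroyStream(stream);
    aclrtDestroyContext(context);
    aclrtResetDevice(device_id);
    aclFinalize();
    return 0;
}
```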
Graph Optimization and Compilation
Unified intermediate representation (IR) interfaces for TensorFlow, Caffe, MindSpore, and other frameworks, supported by four components:
Graph Engine – central control of graph compilation and execution.
Fusion Engine – manages operator fusion rules.
AICPU Engine – handles AI CPU operator information.
HCCL – manages HCCL (collective communication) operator information.
Operator Compilation and Libraries
TBE (Tensor Boost Engine) – toolchain for compiling and developing custom operators.
Operator Libraries – neural‑network acceleration libraries.
Digital Vision Pre‑Processing
Provides video encode/decode (VENC/VDEC), JPEG encode/decode, PNG decode, and VPC preprocessing.
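As a hedged sketch of the JPEG‑decode path, the fragment below turns a JPEG image already resident in device memory into a YUV420SP picture. The function decode_jpeg and its parameters are illustrative; device, context, and stream setup as in the earlier sketch is assumed, and DVPP's stride‑alignment rules are only noted in a comment:

```c
/* Hedged DVPP sketch: decode a JPEG held in device memory into
 * YUV420 semi-planar output. Real code must also align the output
 * strides (commonly width to 128 bytes, height to 16 rows). */
#include <stdint.h>
#include "acl/acl.h"
#include "acl/ops/acl_dvpp.h"

int decode_jpeg(const void *jpeg_dev, uint32_t jpeg_size,
                uint32_t width, uint32_t height, aclrtStream stream) {
    /* A DVPP channel owns the hardware decode session. */
    acldvppChannelDesc *chan = acldvppCreateChannelDesc();
    acldvppCreateChannel(chan);

    /* The output buffer must come from the DVPP allocator. */
    uint32_t out_size = width * height * 3 / 2;  /* YUV420SP, unaligned */
    void *out_buf = NULL;
    acldvppMalloc(&out_buf, out_size);

    /* Describe the output picture. */
    acldvppPicDesc *pic = acldvppCreatePicDesc();
    acldvppSetPicDescData(pic, out_buf);
    acldvppSetPicDescSize(pic, out_size);
    acldvppSetPicDescFormat(pic, PIXEL_FORMAT_YUV_SEMIPLANAR_420);
    acldvppSetPicDescWidth(pic, width);
    acldvppSetPicDescHeight(pic, height);

    /* Submit the asynchronous decode and wait for completion. */
    aclError ret = acldvppJpegDecodeAsync(chan, jpeg_dev, jpeg_size, pic, stream);
    if (ret == ACL_SUCCESS) ret = aclrtSynchronizeStream(stream);

    acldvppDestroyPicDesc(pic);
    acldvppDestroyChannel(chan);
    acldvppDestroyChannelDesc(chan);
    acldvppFree(out_buf);
    return ret;
}
```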
Execution Engine
Runtime – resource management for neural‑network tasks.
Task Scheduler – manages and schedules sequences of graph tasks.
3. AscendCL Computing Language Interface
3.1 AscendCL Overview
AscendCL (Ascend Computing Language) is an open programming framework that wraps the low‑level services of the Ascend stack behind unified APIs for device and memory management, model loading and execution, operator execution, and image/video preprocessing. It enables deep‑learning inference, graph processing, and custom operator acceleration on the CANN platform.
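A hedged sketch of that end‑to‑end inference pattern follows, assuming device, context, and stream setup as in the earlier resource‑management sketch; the file name resnet50.om and the pre‑allocated device buffers passed in are placeholders:

```c
/* Hedged AscendCL inference sketch: load an offline (.om) model,
 * wrap pre-allocated device buffers into datasets, execute, unload.
 * "resnet50.om" and the buffer arguments are placeholders. */
#include "acl/acl.h"

int run_inference(void *in_dev, size_t in_size,
                  void *out_dev, size_t out_size) {
    uint32_t model_id = 0;
    if (aclmdlLoadFromFile("resnet50.om", &model_id) != ACL_SUCCESS)
        return 1;

    /* Datasets are containers of data buffers, one per model input/output. */
    aclmdlDataset *inputs = aclmdlCreateDataset();
    aclDataBuffer *in_buf = aclCreateDataBuffer(in_dev, in_size);
    aclmdlAddDatasetBuffer(inputs, in_buf);

    aclmdlDataset *outputs = aclmdlCreateDataset();
    aclDataBuffer *out_buf = aclCreateDataBuffer(out_dev, out_size);
    aclmdlAddDatasetBuffer(outputs, out_buf);

    /* Synchronous execution; results land in out_dev on the device. */
    aclError ret = aclmdlExecute(model_id, inputs, outputs);

    aclDestroyDataBuffer(in_buf);
    aclDestroyDataBuffer(out_buf);
    aclmdlDestroyDataset(inputs);
    aclmdlDestroyDataset(outputs);
    aclmdlUnload(model_id);
    return ret == ACL_SUCCESS ? 0 : 1;
}
```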
3.2 Advantages
1. Highly abstract – operator compilation, loading, and execution are consolidated into a small set of APIs, reducing programming complexity.
2. Backward compatible – programs compiled against older versions continue to run on newer releases.
3. Chip‑agnostic – a single AscendCL interface works uniformly across all Ascend processors, so applications need no chip‑specific code.
3.3 Main Application Scenarios
1. Application development – use AscendCL directly to build image classification, object detection, and other AI applications.
2. Third‑party framework integration – AI frameworks can call AscendCL to leverage Ascend hardware for computation.
3. Third‑party library creation – developers can wrap AscendCL to provide resource management, model execution, and other higher‑level services.
3.4 Layered Capability Exposure
Model Loading Capability – loads offline (OM) models for execution via AscendCL.
Operator Capability – operator functions are implemented inside CANN but exposed to applications through AscendCL.
Runtime Capability – abstracts device, memory, and event resources for applications.
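As an illustration of the model‑loading capability, the hedged sketch below queries a loaded model's description to discover how large its input and output buffers must be; model_id is assumed to come from aclmdlLoadFromFile as in the previous sketch:

```c
/* Hedged sketch: inspect a loaded OM model's I/O requirements so the
 * application can size its device buffers before inference. */
#include <stdio.h>
#include "acl/acl.h"

void print_model_io(uint32_t model_id) {
    aclmdlDesc *desc = aclmdlCreateDesc();
    aclmdlGetDesc(desc, model_id);

    size_t n_in = aclmdlGetNumInputs(desc);
    size_t n_out = aclmdlGetNumOutputs(desc);
    for (size_t i = 0; i < n_in; ++i)
        printf("input %zu: %zu bytes\n", i, aclmdlGetInputSizeByIndex(desc, i));
    for (size_t i = 0; i < n_out; ++i)
        printf("output %zu: %zu bytes\n", i, aclmdlGetOutputSizeByIndex(desc, i));

    aclmdlDestroyDesc(desc);
}
```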