Inside Huawei Ascend 910: Architecture, Performance, and Future Roadmap
This article provides a detailed technical analysis of Huawei's Ascend 910 AI processor, covering its Da Vinci architecture, hardware specifications, benchmark results, software ecosystem, application scenarios, and product roadmap, and closes with a short glossary of key terms.
Product Overview
Huawei Ascend 910 is a high‑performance AI processor based on the self‑designed Da Vinci architecture, fabricated with 7nm+ EUV. It targets data‑center AI training, large‑scale distributed training, HPC‑AI convergence, and cloud AI acceleration.
Key Features
32 Da Vinci cores delivering 256 TFLOPS FP16 (512 TOPS INT8)
Actual power consumption 310 W (design 350 W)
7nm+ EUV process for high transistor density
Built‑in model protection and privacy‑preserving computation
Deep integration with MindSpore for end‑edge‑cloud unified stack
Technical Specifications
Architecture: Da Vinci (3D‑Cube)
Process: 7nm+ EUV
Compute precision: FP16 256 TFLOPS / INT8 512 TOPS
Cores: 32 Da Vinci cores
Power: Design 350 W, measured 310 W
Video decode: 128‑channel full‑HD (H.264/H.265)
Interconnect: HCCS 240 Gbps, PCIe, RoCE
Compute Architecture
3D‑Cube Matrix‑Multiply Unit
Per‑cycle 4096 multiply‑add operations
32 Cube engines work in parallel, delivering 256 TFLOPS
Up to two orders of magnitude improvement over general-purpose CPU/GPU execution on dense matrix operations (vendor claim)
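The headline FP16 figure follows directly from the cube dimensions: each core's 3D cube is a 16×16×16 multiply-accumulate array, i.e. 4096 MACs per cycle. Assuming a clock of roughly 1 GHz (the clock is an assumption here, not a published spec), 32 cores yield about 262 TFLOPS, in line with the advertised 256 TFLOPS:

```python
# Back-of-the-envelope peak FP16 throughput for Ascend 910.
# The ~1 GHz clock is an assumption for illustration; Huawei quotes 256 TFLOPS.
CUBE_DIM = 16                    # 16x16x16 MAC cube per Da Vinci core
MACS_PER_CYCLE = CUBE_DIM ** 3   # 4096 multiply-accumulates per cycle
FLOPS_PER_MAC = 2                # one multiply + one add
CORES = 32
CLOCK_HZ = 1.0e9                 # assumed ~1 GHz clock

peak_tflops = CORES * MACS_PER_CYCLE * FLOPS_PER_MAC * CLOCK_HZ / 1e12
print(f"~{peak_tflops:.0f} TFLOPS FP16")  # ~262, consistent with the quoted 256
```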
Vector Unit
Custom compute instructions for element‑wise and non‑matrix workloads
Scalar Unit
Lightweight CPU‑like core for control flow and basic arithmetic
Performance
Benchmark Results
ResNet‑50 training: ~1802 images/s, roughly 1.9× a mainstream GPU + TensorFlow baseline (965 images/s)
Sustained throughput approaches the advertised 256 TFLOPS FP16 peak while staying under the 350 W design power budget
Compute density exceeds NVIDIA Tesla V100 and Google TPU v3, per Huawei's launch comparisons
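The quoted ResNet-50 numbers imply the speedup and per-watt efficiency directly:

```python
# Derived figures from the quoted ResNet-50 benchmark numbers.
ascend_imgs_per_s = 1802
gpu_imgs_per_s = 965      # "mainstream GPU + TensorFlow" baseline from the article
ascend_power_w = 310      # measured power from the spec table

speedup = ascend_imgs_per_s / gpu_imgs_per_s
imgs_per_s_per_w = ascend_imgs_per_s / ascend_power_w
print(f"speedup: {speedup:.2f}x")                       # ~1.87x
print(f"efficiency: {imgs_per_s_per_w:.1f} images/s/W")  # ~5.8
```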
Cluster Performance
One Ascend cluster contains 1024 Ascend 910 chips
Total compute reaches 256 PFLOPS (peta‑FLOPS)
Outperforms NVIDIA DGX‑2 and Google TPU clusters in throughput and energy efficiency, per Huawei's published comparisons
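A quick sanity check on the cluster figure: 1024 chips at 256 TFLOPS each gives about 262 PFLOPS, so the quoted 256 PFLOPS reads as a slightly conservative round number:

```python
# Aggregate cluster compute from the per-chip figure.
chips = 1024
tflops_per_chip = 256     # FP16 peak per Ascend 910

cluster_pflops = chips * tflops_per_chip / 1000
print(f"{cluster_pflops:.1f} PFLOPS")  # 262.1, quoted as 256 PFLOPS
```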
Software Ecosystem
Full‑Stack AI Framework
Deep integration with MindSpore; Huawei reports ~20 % less developer code and ~50 % higher overall development efficiency
Automatic source‑to‑source differentiation and distributed training support
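MindSpore's differentiation is source-to-source: it transforms the program itself rather than recording a runtime tape. As a loose conceptual stand-in, the core idea of automatic differentiation can be sketched with forward-mode dual numbers in plain Python. This is an illustration of autodiff in general, not MindSpore's actual mechanism:

```python
# Minimal forward-mode automatic differentiation via dual numbers.
# Illustrative only: MindSpore itself uses source-to-source transformation.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot  # value and derivative w.r.t. the input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__

def grad(f):
    """Return a function computing df/dx at x."""
    return lambda x: f(Dual(x, 1.0)).dot

f = lambda x: 3 * x * x + 2 * x + 1  # f(x) = 3x^2 + 2x + 1
print(grad(f)(2.0))                  # f'(x) = 6x + 2, so 14.0 at x = 2
```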
Operator Library & Toolchain
CANN operator library provides high‑performance AI operators (Huawei cites a ~3× operator‑development productivity gain)
TensorEngine offers a unified DSL for automatic operator optimization and generation
ModelArts PaaS platform handles >4 000 daily training jobs
Application Scenarios
Large‑scale model training (trillion‑parameter models, NLP, CV)
Cloud AI services (Huawei Cloud EI platform, 59 AI services, 159 functions)
Industry AI such as medical imaging analysis, financial risk modeling, industrial quality inspection
Scientific computing (molecular dynamics, climate prediction, other HPC workloads)
Product Roadmap
First generation (2018‑2020): Ascend 310 (edge inference, 12 nm, 16 TOPS INT8, 8 W) and Ascend 910 (data‑center training, 7 nm, 256 TFLOPS FP16, 310 W)
Second generation (2021‑2023): Ascend 910B (7 nm+ EUV, 376 TFLOPS FP16) and Ascend 310B (multimodal edge inference, MindSpore Lite)
Third generation (2024‑2025): Ascend 910C (384‑chip node, >3 TB/s memory bandwidth, supports trillion‑parameter models) and Ascend 320 (next‑gen edge chip, 5 nm, 50 % better energy efficiency)
Future (2026+): Ascend 920 (3 nm, target >1 PFLOPS FP16, FP8 support, dynamic sparsity, MoE‑friendly)
Technical Advantages Summary
Leading compute density: 256 TFLOPS FP16
Best‑in‑class energy efficiency: 310 W for full performance
Innovative 3D‑Cube architecture delivering ultra‑high matrix‑multiply throughput
Full‑stack software co‑optimization with MindSpore
Comprehensive scenario coverage from cloud to edge
Terminology
Da Vinci architecture: Huawei’s heterogeneous AI compute architecture
3D Cube: Dedicated 3‑dimensional matrix‑multiply engine
MindSpore: Huawei’s full‑stack AI framework
CANN: Huawei's Compute Architecture for Neural Networks, the operator library and compute stack for Ascend
Architects' Tech Alliance