How Huawei’s Ascend 910D Stacks Up Against Global AI Chip Leaders

The article examines Huawei's Ascend 910D AI processor, highlighting its architectural upgrades, liquid‑cooling power efficiency, and 4 TB/s inter‑chip bandwidth, then compares its performance, cost and ecosystem advantages against domestic rivals such as Cambricon and Kunlun and against foreign powerhouses like NVIDIA H100, AMD MI300 and Google TPU v4.

Architects' Tech Alliance

Introduction

In the artificial‑intelligence chip field, Huawei’s Ascend 910D has drawn significant attention as the newest member of the Ascend series, offering distinct technical advantages over earlier 910 models and competitive positioning against both domestic and international AI processors.

1. Advantages of Ascend 910D over the 910 series

(1) Architecture and compute upgrade

The 910D uses an optimized self‑designed architecture that reduces redundant circuitry by about 30%, boosting half‑precision performance to 320 TFLOPS—far above the earlier 910B—enabling faster large‑matrix and complex neural‑network training.
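To put 320 TFLOPS in context, a rough back-of-envelope sketch can estimate the time for one training step using the widely cited ~6·N·D FLOPs rule for transformer training (N parameters, D tokens). The model size, batch size, and 40 % utilization figure below are illustrative assumptions, not Huawei data:

```python
# Back-of-envelope: seconds per training step at the 910D's reported
# 320 TFLOPS half-precision peak, using the common ~6*N*D FLOPs rule
# for transformer training (fwd + bwd). Model size, batch size, and
# utilization are illustrative assumptions, not vendor figures.

def step_time_seconds(params, tokens_per_batch, peak_tflops, utilization=0.4):
    """Estimated wall-clock seconds for one training step on one chip."""
    flops_needed = 6 * params * tokens_per_batch        # fwd + bwd FLOPs
    effective_flops = peak_tflops * 1e12 * utilization  # sustained rate
    return flops_needed / effective_flops

# Assumed setup: 7B-parameter model, 4096-token batch, 40% utilization
t = step_time_seconds(7e9, 4096, 320)
print(f"{t:.2f} s per step")  # ~1.34 s under these assumptions
```

The same function makes it easy to see how sustained utilization, not just peak TFLOPS, dominates real training throughput.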

(2) Advanced cooling and power management

Equipped with liquid‑cooling technology, the 910D can run at full speed at 45 °C while consuming only 350 W, whereas the 910C relies on traditional cooling and exhibits higher power draw under load.

(3) Cluster interconnect performance boost

The chip can move 4 TB of data per second between units, raising multi‑chip cluster compute density by five times and shortening training cycles for large models such as Wenxin Yiyan (ERNIE Bot).
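To see what 4 TB/s buys in practice, the standard ring all-reduce cost model (each chip sends roughly 2·(n−1)/n of the gradient payload across its link) gives a quick estimate of per-step gradient-synchronization time. The model size and chip count below are illustrative assumptions:

```python
# Rough estimate of gradient-synchronization time at the 910D's
# reported 4 TB/s inter-chip bandwidth, using the standard ring
# all-reduce cost model: ~2*(n-1)/n of the payload crosses each link.
# Model size and chip count are illustrative assumptions.

def allreduce_seconds(param_bytes, n_chips, link_bandwidth_bps):
    """Estimated seconds for one ring all-reduce over the cluster."""
    traffic = 2 * (n_chips - 1) / n_chips * param_bytes
    return traffic / link_bandwidth_bps

# Assumed setup: 7B params in BF16 (2 bytes each), 8 chips, 4 TB/s links
t = allreduce_seconds(7e9 * 2, 8, 4e12)
print(f"{t * 1000:.1f} ms per all-reduce")  # ~6.1 ms under these assumptions
```

At millisecond-scale synchronization cost, compute rather than communication stays the bottleneck, which is the point of the high interconnect bandwidth.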

2. Comparison with domestic counterparts

Cambricon Siyuan 370 (MLU370) : 256 TOPS (INT8) peak, MLUarch03 architecture, mature software ecosystem; 910D targets 2000 BF16 TFLOPS, surpassing the Siyuan 370 in half‑precision.

CloudSui T10 (Suiyuan Tech) : Designed for cloud inference with low latency; 910D focuses on large‑model training and high‑efficiency inference.

Baidu Kunlunxin R200 : Built for deep‑learning workloads in the cloud and at the edge; 910D offers larger‑scale compute and AI‑specific optimizations.

Hygon K100 AI Version : 49 TFLOPS (FP32) and 192 TFLOPS (BF16/FP16) peak; 910D aims for higher BF16 performance, comparable to NVIDIA H100.

Moore Threads MTT S4000 : Lower performance (≈1/3 of the 910B) and higher power draw; 910D provides better performance per watt.

Biren Technology BR106B/BR106C : 300 W/150 W peak power; 910D’s chiplet design and liquid cooling deliver superior power efficiency.

Alibaba Pingtouge Yitian 710 : 5 nm ARM‑based server chip for cloud inference; 910D is specialized for AI training and inference with dedicated AI architecture.


3. Comparison with foreign mainstream products

(1) Performance parameters

Against NVIDIA H100, the 910D’s 320 TFLOPS half‑precision surpasses H100’s 256 TFLOPS, while its 350 W power draw is half of H100’s 700 W. Multi‑chip integration raises compute density fivefold, cutting training time for large language models by roughly 27 %.
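The efficiency claim follows directly from the article's own figures: dividing the reported TFLOPS by the reported wattage gives performance per watt. The numbers below are taken from the text above, not independently verified:

```python
# Perf-per-watt check of the article's reported figures:
# 910D at 320 TFLOPS / 350 W vs H100 at 256 TFLOPS / 700 W.
# Figures come from the text, not independent measurement.

chips = {
    "Ascend 910D": (320, 350),  # (half-precision TFLOPS, watts)
    "H100":        (256, 700),
}
for name, (tflops, watts) in chips.items():
    print(f"{name}: {tflops / watts:.2f} TFLOPS/W")
# 910D works out to ~0.91 TFLOPS/W vs ~0.37 for the H100, a ~2.5x gap
```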

(2) Cost advantage

At about ¥145,000, the 910D is roughly 40 % cheaper than the H100 (≈¥240,000). Its liquid‑cooling system also reduces hardware procurement costs by around 20 %.
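A quick check of the pricing arithmetic, again using only the figures quoted in this section (list prices and the half-precision TFLOPS from above):

```python
# Sanity-check the cost claim: ¥145,000 vs ¥240,000, and the implied
# cost per half-precision TFLOPS. All inputs are the article's figures.

price_910d, price_h100 = 145_000, 240_000
discount = 1 - price_910d / price_h100
print(f"discount: {discount:.0%}")                  # ~40%, matching the claim
print(f"910D: {price_910d / 320:,.0f} CNY/TFLOPS")
print(f"H100: {price_h100 / 256:,.0f} CNY/TFLOPS")
```

On a cost-per-TFLOPS basis the gap is even wider than the headline discount, since the 910D pairs the lower price with the higher reported throughput.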

(3) Local optimization and adaptability

The chip is tuned for Chinese NLP tasks, delivering a 12 % higher accuracy on classical Chinese text translation than H100, and benefits from domestic supply‑chain security and compliance with local data‑security requirements.

(4) Overall foreign AI‑chip capability comparison

NVIDIA H100 : Hopper architecture, 2000 BF16 TFLOPS, mature CUDA ecosystem; 910D matches BF16 performance with chiplet design and CloudMatrix 384 super‑node technology.

NVIDIA A100 : Ampere architecture, strong AI training capability; 910D’s custom architecture focuses on AI‑specific optimizations and power efficiency.

AMD Instinct MI300 series : CPU‑GPU heterogeneous chip for generative AI; 910D offers comparable AI‑focused compute with a self‑designed architecture.

Intel Habana Gaudi 2 : ASIC for AI training, targeting NVIDIA A100 performance; 910D provides similar training power with advanced cooling and chiplet interconnect.

Google TPU v4 : ASIC for large‑scale matrix ops in cloud AI services; 910D is deployed in Chinese data centers for large‑model training.

Cerebras WSE‑3 : Wafer‑scale engine delivering ultra‑high FLOPS; 910D achieves comparable AI training performance through multi‑chip integration.

Graphcore Bow IPU : 3D‑stacked architecture for parallel AI workloads; 910D’s chiplet and interconnect design serve similar high‑performance AI scenarios.

Tenstorrent Grayskull/Wormhole : RISC‑V based scalable AI chips; 910D’s self‑designed architecture provides comparable scalability for large‑model training.

Overall, Huawei’s Ascend 910D leverages architectural innovation, efficient cooling, high‑speed interconnect, and localized ecosystem support to position itself as a strong competitor in both domestic and global AI‑chip markets.

Tags: Performance comparison · AI chip · Huawei · AI hardware · Ascend 910D
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
