How Huawei’s Ascend 910D Stacks Up Against Global AI Chip Rivals
Huawei’s Ascend 910D AI chip features a revamped architecture, 320 TFLOPS of half‑precision performance, liquid cooling at only 350 W, and 4 TB/s of inter‑chip bandwidth. This article compares it with earlier 910‑series chips, domestic competitors, and leading foreign parts such as Nvidia’s H100, highlighting claimed performance, cost, and ecosystem advantages.
Overview
Huawei’s Ascend 910D has attracted attention in the AI‑chip market for its architectural upgrades, higher half‑precision performance (320 TFLOPS), liquid cooling that sustains full speed at 45 °C on only 350 W, and 4 TB/s of inter‑chip bandwidth that is said to boost cluster compute density five‑fold.
Advantages over previous 910 models
Architecture and compute
The 910D uses an optimized in‑house architecture that reduces redundant circuitry by about 30 %, raising half‑precision throughput to 320 TFLOPS, far above the earlier 910B.
Cooling and power
It adopts advanced liquid cooling, sustaining full speed at 45 °C while drawing only 350 W, whereas the 910C relies on conventional cooling and draws more power under load.
Cluster interconnect
Each 910D can move 4 TB of data per second, enabling a five‑fold increase in compute density for multi‑chip clusters and shortening training cycles for large language models such as Wenxin Yiyan (ERNIE Bot).
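As a rough sanity check on what 4 TB/s of link bandwidth buys, the sketch below estimates the time to move one full copy of a model’s gradients over a single link. The model size and precision are illustrative assumptions, not figures from the article.

```python
# Back-of-the-envelope estimate of inter-chip gradient-transfer time
# at the quoted 4 TB/s link bandwidth. Model size and gradient
# precision below are illustrative assumptions, not article figures.

def transfer_time_seconds(num_params: float,
                          bytes_per_param: int,
                          bandwidth_bytes_per_s: float) -> float:
    """Time to move one full copy of the gradients over one link."""
    return num_params * bytes_per_param / bandwidth_bytes_per_s

params = 100e9       # assume a 100B-parameter model
bf16_bytes = 2       # BF16 gradients: 2 bytes per parameter
bandwidth = 4e12     # 4 TB/s, as quoted for the 910D

t = transfer_time_seconds(params, bf16_bytes, bandwidth)
print(f"{t * 1000:.0f} ms per full gradient exchange")  # 50 ms
```

At these assumed sizes a full gradient exchange takes tens of milliseconds, which is the kind of headroom that makes denser multi‑chip clusters practical.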
Comparison with domestic competitors
Cambricon Siyuan 370 (MLU370) : 256 TOPS INT8, lower BF16 performance; the 910D targets 2,000 BF16 TFLOPS, surpassing it.
Enflame (Suiyuan) Yunsui T10 : Optimized for cloud inference; the 910D focuses on large‑model training and high‑efficiency inference.
Baidu Kunlunxin R200 : Designed for deep‑learning workloads; the 910D offers larger scale and AI‑specific optimizations.
Iluvatar CoreX (Tianshu Zhixin) Tiangai 100 : General‑purpose GPGPU; the 910D provides higher AI‑training performance.
Moore Threads MTT S4000 : Lower performance and higher power; the 910D delivers better performance per watt.
Biren Technology BR106B/BR106C : 300 W/150 W peak power; the 910D’s liquid cooling and chiplet design give superior power efficiency and bandwidth.
Alibaba T‑Head (Pingtouge) Yitian 710 : ARM‑based server CPU for cloud workloads; the 910D is specialized for AI training and inference.
Hygon (Haiguang) K100 AI version : 49 TFLOPS FP32, 192 TFLOPS BF16/FP16; the 910D aims at BF16 performance comparable to Nvidia’s H100.
Comparison with foreign mainstream products
Performance
Against Nvidia’s H100, the 910D’s 320 TFLOPS of half‑precision compute exceeds the 256 TFLOPS quoted for the H100, while its 350 W power draw is half of the H100’s 700 W. Its five‑chip “supernode” configuration raises compute density five‑fold, reportedly cutting training time for models like Wenxin Yiyan (ERNIE Bot) by 27 % and speeding up autonomous‑driving model iteration by 1.8×.
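The efficiency claim can be checked directly from the numbers above. Using the article’s own figures (320 TFLOPS at 350 W for the 910D versus the 256 TFLOPS at 700 W it quotes for the H100, which are not independently verified benchmarks), the performance‑per‑watt gap works out as follows:

```python
# Performance-per-watt comparison using the figures quoted in this
# article (not independently verified benchmarks).

def tflops_per_watt(tflops: float, watts: float) -> float:
    """Half-precision throughput per watt of board power."""
    return tflops / watts

ascend_910d = tflops_per_watt(320, 350)  # ~0.914 TFLOPS/W
h100_quoted = tflops_per_watt(256, 700)  # ~0.366 TFLOPS/W

print(f"910D:          {ascend_910d:.3f} TFLOPS/W")
print(f"H100 (quoted): {h100_quoted:.3f} TFLOPS/W")
print(f"Ratio:         {ascend_910d / h100_quoted:.1f}x")  # 2.5x
```

On these quoted figures, the 910D would deliver exactly 2.5× the half‑precision throughput per watt.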
Cost
At roughly ¥145,000, the 910D costs about 40 % less than the H100 (≈¥240,000), and its liquid‑cooling design is said to cut related hardware procurement costs by roughly 20 %.
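The “about 40 % cheaper” figure follows directly from the two quoted prices:

```python
# Price-difference arithmetic behind the "about 40% cheaper" claim,
# using the prices quoted in the article.
price_910d = 145_000  # CNY, as quoted
price_h100 = 240_000  # CNY, as quoted

discount = (price_h100 - price_910d) / price_h100
print(f"{discount:.1%}")  # 39.6%
```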
Domestic optimization
The chip is tuned for Chinese NLP workloads, reportedly achieving 12 % higher accuracy than the H100 on classical Chinese text translation, and it benefits from local supply chains and compliance with domestic security requirements.
Specific foreign chip comparisons
Nvidia H100 : Higher raw BF16 performance but a larger power draw; the 910D competes via its CloudMatrix 384 super‑node technology.
Nvidia A100 : Strong performance; 910D offers AI‑specific architecture and chiplet integration.
AMD Instinct MI300 : Heterogeneous CPU‑GPU design; 910D focuses on AI‑centric workloads.
Intel Habana Gaudi2 : ASIC for AI training; 910D targets larger scale with chiplet and liquid‑cooling.
Google TPU v4 : ASIC for massive matrix ops; 910D provides comparable AI training performance within Chinese data centers.
Cerebras Wafer Scale Engine (WSE‑3) : Wafer‑scale chip; 910D achieves high performance through multi‑chip integration.
Graphcore Bow IPU : 3‑D stacked IPU; 910D offers similar parallelism via chiplet design.
Tenstorrent Grayskull/Elden : RISC‑V based AI chips; 910D provides comparable performance for large‑model training.
Overall, the Ascend 910D’s architectural upgrades, efficient cooling, high‑speed interconnect, cost advantage, and ecosystem integration position it as a strong contender in both domestic and global AI‑chip markets.
Architects' Tech Alliance
Sharing project experience and insights into cutting‑edge architectures, with a focus on cloud computing, microservices, big data, hyper‑convergence, storage, data protection, artificial intelligence, and industry practices and solutions.