How DeepSeek V4 and Huawei Ascend 950 Redefined China’s AI Chip Landscape

The article details how DeepSeek V4 became the first top‑level large model to run on Huawei's Ascend 950 PR chip, delivering up to 2.87× the performance of Nvidia H20, cutting inference cost by up to 90%, and spurring a booming domestic AI‑chip ecosystem and supply‑chain surge.

Architects' Tech Alliance

DeepSeek V4 on Ascend 950 PR – a breakthrough

On 24 April, DeepSeek V4 announced full support for the Huawei Ascend 950 PR, making it the world's first top-tier large model to run on this domestic chip and breaking the reliance on foreign GPUs.

Ascend 950 series roadmap and specifications

2026 Q1: Ascend 950 PR – first chip with Huawei‑designed HBM, FP8 precision, 1 PFLOPS (1000 TFLOPS) FP8 compute, 2 TB/s interconnect bandwidth.

2026 Q4: Ascend 950 DT – same core architecture with 144 GB memory and 4 TB/s bandwidth for training and decode workloads.

2027 Q4: Ascend 960 – doubles compute to 2 PFLOPS (FP8), 288 GB memory, 9.6 TB/s bandwidth, targeting the Nvidia H200.

2028 Q4: Ascend 970 – compute reaches 4 PFLOPS (FP8) / 8 PFLOPS (FP4), N+3 process, >30 % efficiency gain over 910C.

Performance versus Nvidia H20

FP4 compute peaks at 1.56 PFLOPS (1560 TFLOPS).

Self‑developed HBM memory eliminates foreign dependency.

Multimodal inference speed improves by 60 %.

Overall performance reaches 2.87× that of the Nvidia H20.

TPOT (time per output token) drops to 10–20 ms, single-card throughput reaches 4,700 TPS, and inference cost falls by up to 90%.
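The latency and throughput figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only (the function names are invented here, not from any vendor source): it converts TPOT into a per-stream decode rate and asks how many concurrent request streams the quoted 4,700 TPS per card would imply.

```python
def per_stream_tps(tpot_ms: float) -> float:
    """Decode rate of a single request stream at a given TPOT (ms)."""
    return 1000.0 / tpot_ms

def implied_concurrency(card_tps: float, tpot_ms: float) -> float:
    """Concurrent streams needed for one card to sustain card_tps."""
    return card_tps / per_stream_tps(tpot_ms)

# At TPOT = 10 ms each stream decodes 100 tok/s; at 20 ms, 50 tok/s.
# Sustaining 4,700 TPS on one card then implies roughly 47-94
# concurrent streams, depending on where in the 10-20 ms range TPOT lands.
for tpot in (10.0, 20.0):
    print(f"TPOT {tpot:.0f} ms: {per_stream_tps(tpot):.0f} tok/s per stream, "
          f"~{implied_concurrency(4700, tpot):.0f} streams at 4,700 TPS")
```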

Key technical innovations

Complete migration from CUDA to Huawei CANN Next, including full‑stack code port, operator rewrite, and automatic operator fusion.
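To illustrate what operator fusion buys in general (a generic toy, not CANN's actual mechanism, which fuses at the graph/kernel level): an unfused pipeline runs one kernel per operator and materializes every intermediate array, while a fused kernel makes a single pass per element with no intermediates.

```python
import numpy as np

def unfused(x, w, b):
    """Three separate 'kernels', each writing a full intermediate array."""
    t1 = x * w                  # kernel 1: multiply, materialize t1
    t2 = t1 + b                 # kernel 2: add, materialize t2
    return np.maximum(t2, 0.0)  # kernel 3: ReLU

def fused(x, w, b):
    """Fused analogue: one explicit pass, no intermediates materialized."""
    out = np.empty_like(x)
    for i in range(x.size):
        v = x.flat[i] * w.flat[i] + b.flat[i]
        out.flat[i] = v if v > 0.0 else 0.0
    return out

x = np.array([1.0, -2.0, 3.0])
w = np.array([2.0, 2.0, 2.0])
b = np.array([-1.0, 0.0, 1.0])
# Both paths compute the same result; fusion changes memory traffic, not math.
assert np.allclose(unfused(x, w, b), fused(x, w, b))
```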

Chiplet‑based dual‑chip stacking with 2.5D/3D advanced packaging lifts performance to flagship levels without 5 nm/3 nm processes.

Self‑developed HBM solutions: HiBL 1.0 (1.4 TB/s) for PR and HiZQ 2.0 (4 TB/s) for DT.
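One way to read those bandwidth numbers: in memory-bound decode, a rough ceiling on tokens per second is HBM bandwidth divided by the bytes of active weights streamed per token. The sketch below assumes a hypothetical 37 GB of active FP8 weights (an illustrative figure, not a published DeepSeek V4 number) and ignores KV-cache traffic.

```python
def max_decode_tps(hbm_tb_per_s: float, active_weight_gb: float) -> float:
    """Bandwidth-bound ceiling: every decoded token streams the active
    weights from HBM once (batch size 1, KV-cache traffic ignored)."""
    return hbm_tb_per_s * 1000.0 / active_weight_gb

ACTIVE_GB = 37.0  # hypothetical active FP8 weights; illustrative only
hibl = max_decode_tps(1.4, ACTIVE_GB)  # HiBL 1.0 (950 PR)
hizq = max_decode_tps(4.0, ACTIVE_GB)  # HiZQ 2.0 (950 DT)
print(f"HiBL 1.0 ceiling: ~{hibl:.0f} tok/s; HiZQ 2.0 ceiling: ~{hizq:.0f} tok/s")
```

Larger batches raise aggregate throughput well past this single-stream ceiling, since one pass over the weights serves many tokens.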

Lingqu 2.0 interconnect provides 2 TB/s bandwidth, supports clusters of up to 8,192 cards, and achieves 80–85% energy efficiency.

CANN Next enables one‑click PyTorch adaptation and dedicated MoE model optimizations.

Industry chain impact

The Ascend 950 surge has energized the entire supply chain:

Upstream: SMIC (foundry), Changdian and Tongfu Microelectronics (advanced packaging), DeepSouth Circuits (PCBs and substrates), Huahai Chengke (HBM packaging).

Midstream: Huafeng Technology (224 G high‑speed connectors), Yihua Co. (800 G connectors), Guangxun/Huagong (400‑800 G optical modules), liquid‑cooling providers (Shenling, GaoLan, Chuanrun).

Downstream: Huawei‑Kun Server (market leader), Tuowei Information (200,000 units/year), Digital China (multi‑billion procurement), and ecosystem partners such as iFlytek.

Orders exploded after the announcement, with production starting in March and revenue projected to rise at least 60 % for the year.

Conclusion – a perfect AI‑chip and model pairing

DeepSeek V4 and the Ascend 950 form a tightly coupled model-chip pairing, offering full FP4/FP8 support, KV-Cache sliding-window compression, 10 ms latency with V4-Flash, and massive context handling, positioning China as a leader in AI compute performance, cost, and supply-chain independence.
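The KV-Cache sliding-window compression mentioned above can be sketched in miniature: retain key/value entries only for the most recent W tokens, so decode memory stays bounded no matter how long the context grows. This toy uses strings in place of tensors; real implementations evict on-device tensor blocks, and the class name here is invented for illustration.

```python
from collections import deque

class SlidingWindowKVCache:
    """Toy sliding-window KV cache: only the most recent `window`
    tokens' key/value entries are retained."""

    def __init__(self, window: int):
        # deque with maxlen evicts the oldest entry automatically on append
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

cache = SlidingWindowKVCache(window=4)
for t in range(10):
    cache.append(f"k{t}", f"v{t}")
# After 10 tokens, only the last 4 (tokens 6..9) remain cached.
assert list(cache.keys) == ["k6", "k7", "k8", "k9"]
```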

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: AI inference, industry analysis, FP8, DeepSeek V4, AI chip performance, CANN Next, Huawei Ascend 950
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
