How Huawei’s Ascend AI Chip Roadmap and Supernode Strategy Aim to Challenge Nvidia
At the 2025 Connect conference Huawei unveiled its Ascend AI chip roadmap and supernode strategy, detailing architectural innovations, ultra‑high‑bandwidth interconnects, open ecosystem initiatives and performance gains that together aim to rival Nvidia’s dominance in AI compute.
1. Innovations in Ascend Chip Architecture
Huawei announced that the Ascend AI chip series will evolve from the 950 series, due in 2026, to the 970 series planned for 2028, with compute power rising at each step. The 950 series targets inference and recommendation workloads using Huawei's lower-cost HiBL 1.0 HBM, while the 950DT variant targets training and decoding with HiZQ 2.0 HBM, delivering 4 TB/s of memory bandwidth and supporting low-precision formats such as FP8 and MXFP4. The future 960 and 970 chips are expected to roughly double performance again, reaching up to 2 PFLOPS (FP8) and 4 PFLOPS (FP4), with memory access granularity reduced from 512 bytes to 128 bytes.
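To make the low-precision formats concrete, the sketch below shows block-scaled 4-bit quantization, the general idea behind microscaling formats such as MXFP4: a block of values shares one scale, and each value is stored in 4 bits. This is a simplified illustration (a signed-integer grid rather than the real FP4 element grid), not Huawei's implementation.

```python
# Simplified sketch of block-scaled 4-bit quantization, in the spirit of
# microscaling formats like MXFP4. Illustrative only: real MXFP4 stores
# elements on an FP4 (E2M1) grid, not the signed-integer grid used here.

BLOCK = 32   # MX formats share one scale per 32-element block
QMAX = 7     # 4-bit signed integer grid: -8..7 (simplification)

def quantize_block(block):
    """Scale the block so its largest magnitude maps to QMAX, then round."""
    scale = max(abs(x) for x in block) / QMAX or 1.0
    q = [round(x / scale) for x in block]
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

data = [0.01 * i for i in range(BLOCK)]
q, s = quantize_block(data)
restored = dequantize_block(q, s)

# Each value now occupies 4 bits plus a share of one per-block scale,
# roughly 8x smaller than FP32; rounding error is bounded by scale/2.
max_err = max(abs(a - b) for a, b in zip(data, restored))
```

The payoff is exactly the one the chip roadmap is chasing: smaller values mean proportionally less HBM capacity and bandwidth per tensor, which is why the 950DT pairs 4 TB/s memory with FP8/MXFP4 support.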
2. Supernode Technology Breakthroughs
The Atlas 950 SuperPoD supernode interconnects 8,192 Ascend cards via Huawei's self-developed "Lingqu" (UnifiedBus) protocol, providing 16.3 PB/s of aggregate bandwidth, nanosecond-level latency and liquid cooling to manage power consumption above 100 kW. This architecture enables unified resource scheduling across thousands of cards, achieving training throughput of 4.91 million tokens per second (TPS) and inference throughput of 19.6 million TPS, a significant improvement over previous generations. The technology also extends to general-purpose computing with the TaiShan 950 supernode, which integrates Kunpeng CPUs and GaussDB to replace traditional mainframes.
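A quick back-of-envelope calculation puts the quoted supernode figures on a per-card basis. This assumes the 16.3 PB/s is aggregate bandwidth shared evenly across all 8,192 cards, which is a simplification of any real interconnect topology.

```python
# Back-of-envelope check on the Atlas 950 SuperPoD figures quoted above.
# Assumes bandwidth and throughput are aggregate and evenly distributed
# across cards (a simplification; real topologies are not uniform).

CARDS = 8192
AGG_BW_PBPS = 16.3      # PB/s, aggregate interconnect bandwidth
TRAIN_TPS = 4.91e6      # training tokens/s across the whole supernode
INFER_TPS = 19.6e6      # inference tokens/s across the whole supernode

per_card_bw_tbps = AGG_BW_PBPS * 1000 / CARDS   # PB/s -> TB/s per card
per_card_train_tps = TRAIN_TPS / CARDS
per_card_infer_tps = INFER_TPS / CARDS

print(f"~{per_card_bw_tbps:.2f} TB/s interconnect per card")
print(f"~{per_card_train_tps:.0f} training tokens/s per card")
print(f"~{per_card_infer_tps:.0f} inference tokens/s per card")
```

Under these assumptions each card sees roughly 2 TB/s of interconnect bandwidth, on the same order as the 950DT's stated 4 TB/s memory bandwidth, which is the balance a supernode needs to keep thousands of cards usefully busy.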
3. Open Interconnect Protocol (Lingqu) Ecosystem
The Lingqu 2.0 protocol tackles the challenge of maintaining high reliability over long distances, extending interconnect reach beyond 200 m while delivering TB-level bandwidth and low latency, surpassing traditional RoCE solutions. Huawei has opened the specification to third-party hardware developers, fostering an ecosystem that includes deep integration with PyTorch, TensorFlow and the MindStudio toolchain, thereby narrowing the developer-experience gap with CUDA.
4. Software Open‑Source and Co‑Optimization
Huawei’s AI strategy centres on the open‑source CANN software stack, which bridges AI frameworks to Ascend hardware and supports custom operator development via the AscendC language. Collaboration with DeepSeek showcases the use of Mixture‑of‑Experts (MoE) architectures and dynamic expert scheduling on Ascend hardware, markedly improving large‑model inference throughput, reducing decoding latency and lowering memory usage, though stability for dynamic input shapes remains an open challenge.
5. Industry Impact and Future Challenges
The announced roadmap positions Huawei’s domestic AI compute to compete on the global stage, offering advantages in performance, energy efficiency and autonomous control. Nevertheless, the success of this strategy depends on the maturity of the ecosystem, the quality of technical documentation, benchmark availability and the ability to navigate international competition and technology‑iteration risks.
Architects' Tech Alliance
