Inside Huawei’s Ascend 910C AI Chip: Architecture, Performance Gaps & Strategy
This article translates and expands on analyst Lennart Heim’s report posted on the X platform, dissecting Huawei’s newly mass‑produced Ascend 910C AI accelerator: its dual‑chip packaging, performance estimates versus NVIDIA’s H100 and upcoming B200, supply‑chain origins, the potential for domestic production, and the broader strategic impact on China’s AI competitiveness.
The mass‑production announcement of Huawei’s Ascend 910C AI accelerator, dubbed China’s strongest AI chip, has sparked intense interest as a symbol of China’s resilience in high‑tech development.
Technical composition of Ascend 910C: clever dual‑chip combination, intriguing architecture
Not a brand‑new architecture, but a clever reuse?
The “C” in Ascend 910C does not denote a fundamentally new design; rather, it reflects a clever combination of two existing Ascend 910B dies integrated through advanced packaging, effectively a “fusion” of two chips into one accelerator.
This approach leverages mature process technology, avoiding costly breakthroughs in unknown nodes while achieving performance gains through architectural innovation.
Heim suggests that the chips may have been sourced from TSMC before tighter export controls, implying that Huawei stocked the dies in advance.
Packaging trade‑off: balancing performance and cost
Packaging decisions and their impact
Packaging is a critical factor influencing chip performance, power consumption, and cost, especially for AI accelerators where advanced packaging can provide a competitive edge, as demonstrated by NVIDIA.
Ascend 910C opts for a more mature, lower‑complexity solution: two separate 910B dies placed on individual silicon interposers and connected via an organic substrate.
This packaging results in inter‑die bandwidth estimated to be 10‑20 times lower than that of advanced 2.5D solutions such as TSMC’s CoWoS (used by NVIDIA) or Intel’s Foveros, creating a notable performance bottleneck.
Performance and specifications: gap with H100 and chase towards B200
Objective assessment: 80% performance claim
Heim estimates the 910C can deliver roughly 800 TFLOPS of FP16 compute and about 3.2 TB/s of memory bandwidth, approximately 80% of the performance of NVIDIA’s H100, which launched in 2022.
The chip’s logic area is about 60% larger than the H100’s, indicating lower architectural efficiency per unit of silicon.
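Heim’s ratios can be sanity‑checked with simple arithmetic. The sketch below compares his 910C estimates against commonly cited H100 SXM datasheet figures; the H100 numbers (dense FP16 tensor throughput, HBM3 bandwidth) are assumptions introduced here for illustration, not figures from the article.

```python
# Back-of-envelope check of the ~80% claim.
# 910C figures are Heim's estimates; H100 figures are assumed from
# NVIDIA's public datasheet (dense FP16 tensor, SXM variant).
ASCEND_910C = {"fp16_tflops": 800, "mem_bw_tbps": 3.2}   # Heim's estimates
H100_SXM    = {"fp16_tflops": 990, "mem_bw_tbps": 3.35}  # assumed public specs

compute_ratio = ASCEND_910C["fp16_tflops"] / H100_SXM["fp16_tflops"]
bw_ratio      = ASCEND_910C["mem_bw_tbps"] / H100_SXM["mem_bw_tbps"]

print(f"FP16 compute: {compute_ratio:.0%} of H100")    # ~81%
print(f"Memory bandwidth: {bw_ratio:.0%} of H100")     # ~96%
```

Under these assumed H100 specs, the compute ratio lands close to the 80% figure, while the memory‑bandwidth gap is much smaller.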
Generational gap: facing B200, challenge escalates
Compared with NVIDIA’s upcoming B200 series, the 910C lags significantly in key metrics.
Compute performance: roughly one‑third that of the B200.
Memory bandwidth: about 2.5 times lower (roughly 40% of the B200’s), even assuming the 910C uses HBM2E.
Energy efficiency: noticeably behind the B200.
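The same back‑of‑envelope approach reproduces the B200 gaps cited above. The B200 figures here (dense FP16 throughput and HBM3e bandwidth) are assumptions based on NVIDIA’s announced specifications, not values from the article.

```python
# Sketch of the 910C-vs-B200 gaps cited above.
# B200 figures are assumed from NVIDIA's announced specs; 910C figures
# follow Heim's estimates.
b200_fp16_tflops = 2250   # assumed: ~2.25 PFLOPS dense FP16
b200_mem_bw_tbps = 8.0    # assumed: 8 TB/s HBM3e
ascend_910c_fp16 = 800    # Heim's estimate
ascend_910c_bw   = 3.2    # Heim's estimate (HBM2E assumed)

print(f"Compute gap: {b200_fp16_tflops / ascend_910c_fp16:.1f}x")   # ~2.8x
print(f"Bandwidth gap: {b200_mem_bw_tbps / ascend_910c_bw:.1f}x")   # 2.5x
```

These assumed specs yield a compute gap near 3x and a bandwidth gap of 2.5x, consistent with the ratios quoted above.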
By 2025, Western AI chip production is projected to be at least five times the volume of China’s, with overall compute capacity 10‑20 times greater.
Supply chain and production: mysterious origins and domestic potential
“TSMC stash?” – shocking supply‑chain speculation
Heim hypothesizes that Huawei may have stockpiled up to three million 7 nm Ascend dies from TSMC before export controls tightened.
He also suggests Huawei could have secured a large amount of HBM2E memory, potentially enabling the production of around 1.4 million 910C accelerators, equivalent to the AI compute of roughly one million NVIDIA H100‑class chips.
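The arithmetic behind these figures is straightforward to sketch. The packaging‑yield factor below is a hypothetical value introduced here to reconcile three million dies with roughly 1.4 million dual‑die packages; the other numbers follow Heim’s estimates.

```python
# Rough consistency check on Heim's stockpile arithmetic.
dies_stockpiled   = 3_000_000   # hypothesized 7nm dies from TSMC (Heim)
dies_per_910c     = 2           # dual-die package
packaging_yield   = 0.93        # assumed here, to reach ~1.4M units
h100_equiv_factor = 0.8         # ~80% of an H100 per 910C (Heim)

units = dies_stockpiled / dies_per_910c * packaging_yield
print(f"~{units / 1e6:.1f}M 910C units")                            # ~1.4M
print(f"~{units * h100_equiv_factor / 1e6:.1f}M H100-equivalents")  # ~1.1M
```

At an 80% per‑chip equivalence, 1.4 million 910Cs work out to a bit over one million H100‑class units, matching the article’s round figure.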
Domestic 7 nm production?
Heim believes Huawei likely has the capability to produce 910B and 910C dies at the 7 nm node, but large‑scale mass production faces challenges in yield, cost, and stability.
Strategic significance and global AI competition
Performance gap exists, strategic importance not to be underestimated
Despite the performance gap, the 910C’s launch carries symbolic weight, demonstrating China’s determination to close the AI chip gap under export‑control pressure.
China’s ability to concentrate resources could offset raw compute disadvantages, allowing focused advances in AI inference for specific industries such as smart cities, transportation, manufacturing, and security.
Inference first, application breakthrough?
Prioritizing AI inference over massive pre‑training may enable China to achieve commercial leadership in targeted sectors, even if overall compute capacity lags.
However, next‑generation pre‑training will still require massive clusters of tens of thousands of chips, underscoring the continued importance of total compute volume.
Conclusion and outlook
While the Ascend 910C may only reach about 80% of H100 performance and its supply‑chain origins remain opaque, its strategic significance is substantial, marking a milestone in China’s AI‑chip autonomy and hinting at a “differentiated‑competition” path.
Future coverage will track the chip’s real‑world deployments, its impact on China’s AI ecosystem, and evolving strategies to build resilient, domestically‑controlled AI infrastructure.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.