Why Huawei’s Ascend 950 PR and DT Have Different Names – The Technical Rationale

Huawei’s Ascend 950 series builds two variants on a single die: PR (Prefill & Recommendation), optimized for compute‑intensive inference at low cost, and DT (Decode & Training), tuned for memory‑bandwidth‑heavy generation and training. Together they illustrate a scenario‑driven, P/D‑separated architecture that aims to maximize efficiency.

Architects' Tech Alliance

Huawei’s Ascend 950 family adopts a "one‑chip, dual‑architecture" strategy, offering two variants distinguished by the suffixes PR and DT.

What are PR and DT?

Prefill and Recommendation (PR)

PR stands for Prefill & Recommendation. This variant targets the prefill phase, in which the chip processes an entire prompt in parallel, builds a KV cache, and quickly produces the first token. The work is compute‑intensive, favors low latency and high throughput, and also suits workloads such as e‑commerce recommendation.

Decode and Training (DT)

DT stands for Decode & Training. This variant targets the decode phase, which generates tokens sequentially, and also supports large‑scale model training. The work is memory‑bandwidth‑heavy: it needs large capacity and high bandwidth to handle massive parameter reads and writes.

Why two variants?

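Large‑model inference consists of two fundamentally different phases: prefill, which is limited mainly by compute, and decode, which is limited mainly by memory bandwidth. Using a single chip configuration for both would be like making a sprinter run a marathon, because hardware balanced for one phase leaves resources idle in the other. By separating the functions, each variant can be tuned for its dominant resource: compute for PR and bandwidth/capacity for DT.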
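
The compute‑versus‑bandwidth contrast can be made concrete with a rough arithmetic‑intensity estimate (FLOPs per byte moved) for a single weight matrix multiply. This is a minimal sketch, not a model of the Ascend 950: the layer width, prompt length and 2‑byte values are hypothetical, and the function name is illustrative only.

```python
def arithmetic_intensity(tokens_in_flight: int, d_in: int, d_out: int,
                         bytes_per_value: int = 2) -> float:
    """FLOPs per byte for one (tokens x d_in) @ (d_in x d_out) matmul."""
    # Each output element needs d_in multiply-adds, i.e. 2 * d_in FLOPs.
    flops = 2 * tokens_in_flight * d_in * d_out
    bytes_moved = bytes_per_value * (
        tokens_in_flight * d_in       # read input activations
        + d_in * d_out                # read the weight matrix
        + tokens_in_flight * d_out    # write output activations
    )
    return flops / bytes_moved


# Hypothetical shapes: a 4096-wide layer and a 2048-token prompt.
D = 4096
prefill_intensity = arithmetic_intensity(tokens_in_flight=2048, d_in=D, d_out=D)
decode_intensity = arithmetic_intensity(tokens_in_flight=1, d_in=D, d_out=D)

print(f"prefill: {prefill_intensity:.1f} FLOPs/byte")
print(f"decode:  {decode_intensity:.4f} FLOPs/byte")
```

Under these assumptions prefill reuses each weight across the whole prompt and lands around a thousand FLOPs per byte, while single‑token decode stays near one FLOP per byte, so the weight reads dominate. Batching several decode requests raises the decode figure, which this sketch does not model.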

Technical specifications

Both variants share the same 950 core die.

PR uses Huawei‑designed HiBL 1.0 HBM, offering 128 GB memory with 1.6 TB/s bandwidth.

DT uses Huawei‑designed HiZQ 2.0 HBM, offering 144 GB memory with up to 4 TB/s bandwidth.

PR is positioned for fast first‑token response, strong concurrency and cost‑effectiveness.

DT is positioned for stable long‑text generation and high‑speed training without bandwidth bottlenecks.
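
To see why the bandwidth figures matter for decode, a common back‑of‑the‑envelope bound assumes every generated token must read all model weights once, so single‑stream tokens per second cannot exceed bandwidth divided by weight size. The sketch below uses the two bandwidth figures quoted above; the 70 GB weight footprint is a hypothetical example, and the bound ignores KV‑cache traffic, batching and compute limits.

```python
TB = 1e12  # bytes
GB = 1e9   # bytes

# Bandwidth figures quoted in this article.
PR_BANDWIDTH = 1.6 * TB  # bytes per second
DT_BANDWIDTH = 4.0 * TB  # bytes per second

# Hypothetical model: 70 billion parameters stored at 1 byte each.
WEIGHT_BYTES = 70 * GB


def decode_tokens_per_second_bound(bandwidth: float, weight_bytes: float) -> float:
    """Upper bound on single-stream decode rate if each token reads all weights once."""
    return bandwidth / weight_bytes


pr_bound = decode_tokens_per_second_bound(PR_BANDWIDTH, WEIGHT_BYTES)
dt_bound = decode_tokens_per_second_bound(DT_BANDWIDTH, WEIGHT_BYTES)

print(f"PR bound: {pr_bound:.1f} tokens/s")
print(f"DT bound: {dt_bound:.1f} tokens/s")
print(f"DT/PR ratio: {dt_bound / pr_bound:.2f}")
```

Whatever model size is assumed, the ratio between the two bounds equals the bandwidth ratio (4 / 1.6 = 2.5), which is the sense in which DT is the better fit for long generation.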

Naming logic

The suffixes are purposeful: PR = Prefill + Recommendation (compute‑first, cost‑focused) and DT = Decode + Training (bandwidth‑first, performance‑focused). This reflects a P/D (Prefill/Decode) separation that assigns dedicated silicon to each stage, avoiding resource contention and achieving optimal power‑efficiency, latency and cost.
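
The P/D separation described above can be sketched as a toy scheduler that sends the prefill step of a request to a PR pool and every later decode step to a DT pool. This is an illustrative assumption about how a serving layer might use the two variants, not Huawei's software: the `Request` class, the pool labels and the `step`/`run` functions are all hypothetical, and the KV‑cache transfer is only marked by a comment.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    PREFILL = "prefill"
    DECODE = "decode"


# Hypothetical mapping from inference phase to hardware pool.
POOL_FOR_PHASE = {Phase.PREFILL: "PR", Phase.DECODE: "DT"}


@dataclass
class Request:
    request_id: str
    prompt_tokens: int
    max_new_tokens: int
    phase: Phase = Phase.PREFILL
    generated: int = 0
    trace: list = field(default_factory=list)  # pools visited, in order


def step(req: Request) -> bool:
    """Run one scheduling step; return True while the request needs more work."""
    req.trace.append(POOL_FOR_PHASE[req.phase])
    if req.phase is Phase.PREFILL:
        # Prefill handles the whole prompt at once and yields the first token.
        req.generated = 1
        # A real system would hand the KV cache to the decode pool here.
        req.phase = Phase.DECODE
    else:
        # Decode produces exactly one token per step.
        req.generated += 1
    return req.generated < req.max_new_tokens


def run(req: Request) -> list:
    while step(req):
        pass
    return req.trace


trace = run(Request("r1", prompt_tokens=512, max_new_tokens=4))
print(trace)
```

With `max_new_tokens=4` the request visits the PR pool once and the DT pool three times, which mirrors the one‑shot prefill followed by token‑by‑token decode.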

Broader implications

The design demonstrates architectural maturity: rather than merely increasing parameters or raw FLOPS, Huawei partitions the workload by scenario and load. The 950 family can handle both inference and training while keeping costs low. For users it means paying only for the performance they need; for the industry it signals a pragmatic, scenario‑driven AI‑chip roadmap.
