DeepSeek V4 Preview: A Sovereign Shift Beyond Benchmarks

This article sifts through official silence and industry leaks, from internal statements and Ascend 950PR supply-chain hints to sparse-attention innovations, to assess DeepSeek V4's likely technical leaps (million-token context, native Ascend training) and its strategic impact on the open-source AI landscape and CUDA independence.


On April 11, 2026, the DeepSeek model page on HuggingFace still lists DeepSeek-V3.2-Speciale with a release date of December 1, 2025, and the company’s X account shows only an "OCR‑2" tweet. Yet multiple tech media outlets report that DeepSeek V4 will be released in late April and will run on Huawei Ascend 950PR.

Signal Analysis – What’s Real?

Official "Zero Entry" as a Signal

Most frontier‑model companies pre‑announce next‑gen releases to hype the market (e.g., OpenAI’s GPT‑5, Anthropic’s Claude Opus). DeepSeek does the opposite: before V3.2 it posted only a single "coming soon" note, and the V3.2 news page jumped straight to a technical article. This silence itself signals that DeepSeek prefers to let the model speak for itself.

Most Credible Signal: Liang Wenfeng’s Internal Statement

On April 10, 2026, IT Home cited an internal memo in which founder Liang Wenfeng told employees that V4 will be "officially released in late April". The same information was independently republished by TechNode in both Chinese and English, giving the claim higher weight: a founder lying to staff carries a high cost.

Second‑hand but Consistent: Supply‑Chain Clues

TrendForce: An article titled "Decoding DeepSeek V4: How Huawei's Ascend 950PR Is Powering China's Push to Break CUDA Dependence".

Digitimes: Reports that Alibaba, ByteDance, and Tencent have placed bulk orders for the Ascend 950PR.

Reuters / The Information: Cite anonymous sources saying V4 will launch in the coming weeks, with two variants in development.

All three sources infer the training‑completion window from the Ascend 950PR production schedule, a far more reliable signal than speculative blog posts.

Why the March Release Window Dropped

TechNode reported a "V4 this week" rumor on March 2, and Dataconomy pushed the window to April on March 16. Nothing happened in late March, which suggests DeepSeek was re‑engineering core components with Huawei and Cambricon engineers—a strategic rewrite rather than a simple scale‑up.

Unreliable Numbers

Claims such as "1 T MoE / 37 B activation / 1 M context / $0.30 per million tokens" all originate from aggregator blogs (NxCode, Introl, Atlas Cloud) and lack any official paper, repository, or blog post from DeepSeek. Developers should treat them as interesting rumors pending verification.

Edge Signal: Expert / Fast Mode Switch

On April 8, DeepSeek Chat added a toggle between "Expert Mode" and "Fast Mode". This breaks the unified "hybrid thinking" design of V3.1 and hints that V4 may expose separate model paths or a new budgeting mechanism for reasoning.

Technical Trajectory – From MLA to DSA, Where V4 Might Leap

The V3 technical report (arXiv:2412.19437) outlines five key innovations:

MLA (Multi-head Latent Attention): Compresses the KV cache into a latent space, reducing cache size by an order of magnitude.

DeepSeekMoE: 257 experts per layer (1 shared, 256 routed, 8 routed experts active per token), achieving ~5.5% parameter usage at inference.

Bias-only Load Balancing: Removes the auxiliary balancing loss, preserving gradient purity.

MTP (Multi-Token Prediction): Predicts multiple future tokens, boosting training-signal density and enabling speculative decoding.

FP8 Mixed-Precision Training: A detailed GEMM quantisation scheme that is reproducible at scale.
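To make the DeepSeekMoE numbers above concrete, here is a minimal top-k router sketch. The expert count (256 routed) and top-k (8) match the V3 report; the hidden size, sigmoid gating details, and everything else are illustrative stand-ins, not DeepSeek's actual implementation:

```python
import numpy as np

def route_tokens(hidden, gate_w, top_k=8):
    """Toy top-k MoE router: per token, score 256 routed experts and keep
    the top_k; the single shared expert (not shown) always runs as well."""
    scores = hidden @ gate_w                  # (tokens, n_experts) affinity logits
    affin = 1.0 / (1.0 + np.exp(-scores))     # sigmoid affinities
    topk_idx = np.argsort(affin, axis=-1)[:, -top_k:]     # chosen expert ids
    gates = np.take_along_axis(affin, topk_idx, axis=-1)  # their affinities
    gates = gates / gates.sum(axis=-1, keepdims=True)     # normalized gate weights
    return topk_idx, gates

rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 64))    # 4 tokens, toy hidden size 64
gate_w = rng.standard_normal((64, 256))  # 256 routed experts
idx, gates = route_tokens(hidden, gate_w)
print(idx.shape, gates.shape)            # (4, 8) (4, 8)
# The ~5.5% figure is parameter activation: 37B active of 671B total
print(round(37 / 671 * 100, 1))          # 5.5
```

Note that only 8 of 256 routed experts (~3%) fire per token; the ~5.5% figure counts activated parameters, which also include the shared expert, attention, and embeddings.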

Training cost for V3 was 2.788 M H800 GPU‑hours (~$5.57 M) on 14.8 T tokens, a figure that challenged the notion that only multi‑hundred‑million‑dollar budgets can produce frontier models.
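The headline cost is simple arithmetic. The V3 report prices H800 rental at roughly $2 per GPU-hour (an assumed rental rate, not a measured invoice), which recovers both the ~$5.57 M figure and the ~$0.30-per-million-token rumor's plausibility as a cost floor:

```python
gpu_hours = 2.788e6   # H800 GPU-hours reported for V3 training
rate = 2.0            # USD per GPU-hour, the rental rate assumed in the V3 report
tokens = 14.8e12      # training tokens

total_cost = gpu_hours * rate
cost_per_m_tokens = total_cost / (tokens / 1e6)

print(f"${total_cost / 1e6:.2f}M total")                 # $5.58M total
print(f"${cost_per_m_tokens:.2f} per million tokens")    # $0.38 per million tokens
```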

R1’s Contribution: GRPO

R1 (arXiv:2501.12948) introduced Group Relative Policy Optimization (GRPO), which removes PPO's learned critic and instead estimates the baseline from the rewards of a group of responses sampled for the same prompt, saving compute and memory. The paper was later accepted by Nature.
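The core of GRPO is easy to sketch: sample several responses per prompt, then normalize each response's reward by the group's own mean and standard deviation. A minimal version of just the advantage computation (the surrounding PPO-style clipped policy update is omitted):

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """GRPO advantage: normalize each reward by its group's mean and std,
    so no value network (critic) is needed at all."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)  # eps guards a zero-variance group

# 4 sampled responses to one prompt, scored by a reward model or verifier
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)  # above-mean responses get positive advantage, below-mean negative
```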

V3.1 – Hybrid Thinking Integration

V3.1 merged V3’s fast chat path with R1’s reasoning path, exposing deepseek-chat (non‑thinking) and deepseek-reasoner (thinking) APIs. This proved that DeepSeek can toggle model depth without extra inference cost, a prerequisite for the upcoming DSA.
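Because both paths sit behind an OpenAI-compatible API, switching between them is just a model-name change. A minimal payload sketch (the two model IDs come from DeepSeek's API; the rest is the standard chat-completions schema, and no network call is made here):

```python
import json

def build_request(prompt, thinking=False):
    """Select DeepSeek's thinking or non-thinking path by model name."""
    return {
        "model": "deepseek-reasoner" if thinking else "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

req = build_request("Prove that sqrt(2) is irrational.", thinking=True)
print(json.dumps(req, indent=2))
```

Whether V4's Expert/Fast toggle maps onto the same two model IDs, or onto a new thinking-budget parameter, is exactly the open question raised above.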

V3.2‑Exp / V3.2 – DSA Arrival

DSA (DeepSeek Sparse Attention) debuted in V3.2‑Exp (Sept 29, 2025) and stabilized in the official V3.2 release (Dec 1, 2025, arXiv:2512.02556). It adds two components on top of MLA:

Lightning Indexer: Quickly scores historical tokens to select the most relevant ones, far cheaper than full attention.

Fine-grained Top-k Selection: Performs full attention on only the top-2048 KV tokens per query.

This reduces attention complexity from O(L²) to O(L·k) with k=2048, explaining the >50% price cut for V3.2‑Exp APIs.
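A toy rendition of that two-stage pattern follows. The dot-product indexer scores are a stand-in for the real Lightning Indexer (a small learned scorer), and k is shrunk from 2048 to 4 to keep the example readable:

```python
import numpy as np

def sparse_attention(q, keys, values, index_scores, k=4):
    """Two-stage sparse attention sketch:
    1) a cheap indexer scores all past tokens (index_scores),
    2) full softmax attention runs only over the top-k survivors.
    Per-query cost drops from O(L) to O(k), i.e. O(L*k) total vs O(L^2)."""
    k = min(k, keys.shape[0])
    top = np.argsort(index_scores)[-k:]            # stage 1: pick top-k token ids
    logits = keys[top] @ q / np.sqrt(q.shape[0])   # stage 2: attend over k tokens
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ values[top], top

rng = np.random.default_rng(1)
L, d = 32, 8                        # toy sequence length and head dimension
q = rng.standard_normal(d)
keys = rng.standard_normal((L, d))
values = rng.standard_normal((L, d))
scores = keys @ q                   # stand-in for the learned indexer's scores
out, picked = sparse_attention(q, keys, values, scores, k=4)
print(out.shape, sorted(picked))    # attends to only 4 of the 32 tokens
```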

V3.2‑Exp continued training on the V3.1‑Terminus checkpoint for 15 k steps at a learning rate of 7.3×10⁻⁶, consuming 943.7 B tokens—essentially a small‑scale re‑training of the base model.

V3.2‑Speciale, a high‑compute variant, achieved 89.6% on LiveCodeBench, placing it in the top‑3 of open‑source math/code reasoning models.

Potential V4 Crossings

Million-token context: Extending DSA's top-k from 2048 to support 1 M tokens may require redesigning the Lightning Indexer or making k scale with √L.

Joint MLA + DSA + MTP optimisation: These modules are currently largely independent; a unified latent-attention + sparse-routing framework could deliver a larger leap than incremental benchmark gains.

Native Ascend training loop: If V4 is fully trained on Ascend 950PR, it would replace the CUDA backend, requiring new FP8 kernels, MoE all-to-all primitives, and flash-attention tiling, each a publishable systems paper in its own right.

Native multimodal support: Rumors of text/image/video generation raise engineering questions about token-length disparity across modalities and how the top-k budget is allocated between them.

Scaled post-training compute: V3.2 already pushed post-training scaling; V4 could disclose RL-cluster scheduling, rollout data pipelines, and reward parallelisation.
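To put the √L idea from the first item above in numbers: a k that grows with √L keeps the selected-token count, and thus attention cost, manageable even at million-token context. The scaling constant and the floor at V3.2's k=2048 are purely hypothetical choices for illustration:

```python
import math

def k_for_context(L, c=16):
    """Hypothetical rule: k = c * sqrt(L), floored at V3.2's fixed k = 2048."""
    return max(2048, int(c * math.sqrt(L)))

for L in (128_000, 1_000_000):
    k = k_for_context(L)
    # fraction of dense O(L^2) attention work that O(L*k) would perform
    print(f"L={L:>9,}  k={k:>6}  work vs dense: {k / L:.2%}")
```

Under this rule a 1 M-token query attends to 16 k tokens, about 1.6% of the dense cost, while short contexts keep the proven fixed-k behaviour.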

Strategic Significance – The "No‑CUDA" Sovereign Moment

Comparing V4 with the 2025 R1 release shows a reversal: R1 proved efficiency on H800 GPUs (still CUDA‑based), while V4 may demonstrate a full training‑to‑inference stack on Huawei Ascend, breaking NVIDIA’s CUDA‑centric AI supply chain.

This matters because U.S. export controls treat CUDA as a strategic lever; a model trained entirely on Ascend would undermine that lever, even if China still lacks large‑scale FP8 production.

Regulatory backlash has already emerged: Texas banned DeepSeek on government devices in Jan 2025, followed by other states and a federal "No DeepSeek on Government Devices Act". Open‑weight releases become a defensive shield against such bans.

Implications for Developers

Should you wait? If V4 launches in late April, a two‑week wait is acceptable; in the meantime V3.2 and V3.2‑Speciale remain stable.

API considerations: Watch for (1) latency of million-token context (Lightning Indexer overhead), (2) compatibility of new features (tool calls, streaming, thinking budget) with existing V3.2 schemas, and (3) any price-structure changes, especially cache-hit vs. cache-miss differentials.

Self-deployment opportunities: If weights are open and Ascend-native, Chinese enterprises with data-sensitivity concerns could run a flagship model without NVIDIA hardware. However, Ascend 950PR supply will be limited initially, and the ecosystem (vLLM, SGLang, MindIE) may need 2–3 months to mature.

Conclusion

While many V4 specs remain unconfirmed, the most critical unknowns are the training stack, multimodal capabilities, and exact pricing. If the Ascend-native claim holds, V4 would represent the first structural shift in the AI geopolitical landscape, not just a benchmark bump.

Tags: DeepSeek, open-source AI, V4, Sparse Attention, industry insight, Huawei Ascend, AI model analysis
Written by ArcThink

ArcThink makes complex information clearer and turns scattered ideas into valuable insights and understanding.