DeepSeek V4’s Silent Launch: 1.6 T Parameters, Triple Innovation, and Redefined Accessibility
DeepSeek V4 debuted quietly: a 1.6‑trillion‑parameter MoE model built on three headline innovations — CSA+HCA compressed attention, mHC manifold‑constrained hyperconnections, and the Muon optimizer. It offers a 1M‑token context at roughly a quarter of V3's inference cost, top Codeforces and LiveCodeBench scores, a price about 1/7 of Claude Opus, MIT open‑source licensing, and dual‑stack Ascend NPU/NVIDIA GPU support.
Release and core question
On 24 April DeepSeek released V4, a 1.6‑trillion‑parameter mixture‑of‑experts model with a 1‑million‑token context window, under an MIT license.
Metrics beyond parameter count
Performance is evaluated on three axes: capability ceiling (hardest tasks), inference efficiency (compute per token), and accessibility (affordability for users and enterprises). V4 aims to raise the capability ceiling while keeping inference cost growth sub‑linear in context length.
Architectural innovations
Compressed Sequence Attention (CSA) and Heavily Compressed Attention (HCA)
Standard attention scales as O(L²) in sequence length L. CSA compresses the token sequence to strip redundancy before attention; HCA applies a further, heavier compression aimed at long‑range dependencies. At a 1M‑token context, V4‑Pro's per‑token FLOPs are 27 % of V3.2's and its KV cache occupies 10 % of the memory, cutting inference cost to roughly one quarter of V3's on the same hardware.
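The CSA/HCA compression schemes themselves are unpublished, but the cost mechanism — attend over a compressed sequence so both FLOPs and KV cache shrink — can be illustrated with a toy sketch. Here keys/values are mean‑pooled in blocks before attention; the block size and pooling choice are assumptions, not DeepSeek's method:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compressed_attention(q, kv_tokens, block=8):
    """Toy 'compressed attention': mean-pool every `block` tokens so
    the KV cache and attention FLOPs shrink by ~`block`x. Illustrative
    only -- not the actual CSA/HCA algorithm."""
    L, d = kv_tokens.shape
    pooled = kv_tokens[: L - L % block].reshape(-1, block, d).mean(axis=1)
    scores = softmax(q @ pooled.T / np.sqrt(d))  # length L/block, not L
    return scores @ pooled, pooled.shape[0]

rng = np.random.default_rng(0)
seq = rng.standard_normal((1024, 64))  # 1024 cached tokens, head dim 64
q = rng.standard_normal(64)            # one query vector
out, kv_len = compressed_attention(q, seq, block=8)
# kv_len is 128: an 8x smaller KV cache for this query
```

The same idea applied more aggressively to distant context is the intuition behind the "heavily compressed" variant.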
Manifold‑constrained Hyperconnection (mHC)
Scaling MoE from 671 B to 1.6 T parameters caused gradient explosion and routing collapse. mHC imposes a geometric constraint that forces signals to travel on a structured manifold, suppressing gradient dispersion and enabling stable training of >1 T‑parameter MoE models.
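The mHC manifold is not publicly specified. As one toy instance of "constrain the weights to a manifold so signals cannot blow up across layers", the sketch below row‑normalizes a hyperconnection‑style mixing matrix onto the unit sphere; the stream count n=4 and the choice of sphere are assumptions for illustration:

```python
import numpy as np

def project_rows_to_sphere(w, eps=1e-8):
    """Project each row of a mixing matrix onto the unit sphere.
    A stand-in for mHC's (unpublished) structured manifold."""
    return w / np.maximum(np.linalg.norm(w, axis=1, keepdims=True), eps)

def mix_streams(streams, w):
    """Hyperconnection-style mixing of n parallel residual streams
    under the manifold constraint."""
    return project_rows_to_sphere(w) @ streams

rng = np.random.default_rng(0)
streams = rng.standard_normal((4, 64))   # n=4 residual streams (assumed)
w = 100.0 * rng.standard_normal((4, 4))  # deliberately badly scaled weights
out = mix_streams(streams, w)
# Each mixed stream's norm stays bounded by the spectral norm of
# `streams`, no matter how large `w` grows -- the constraint caps gain.
```

The point of the sketch: with unconstrained mixing weights, repeated layer‑to‑layer mixing can amplify activations geometrically; a manifold constraint caps the per‑layer gain, which is the stability property the article attributes to mHC.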
Muon optimizer
Muon replaces AdamW with an update rule that orthogonalizes the momentum matrix before each step. Under equal compute it converges faster and reaches a lower final loss, demonstrated on more than 32 T tokens of pre‑training data.
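The publicly documented Muon optimizer (Keller Jordan's reference code) orthogonalizes the momentum matrix via a quintic Newton–Schulz iteration; whether V4's variant matches it exactly is an assumption. A minimal numpy sketch:

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5):
    """Approximately orthogonalize a matrix with the quintic
    Newton-Schulz iteration used by the public Muon optimizer
    (coefficients from the reference implementation)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + 1e-7)  # Frobenius normalization
    tall = x.shape[0] > x.shape[1]
    if tall:
        x = x.T                          # iterate on the smaller Gram matrix
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if tall else x

def muon_step(w, grad, momentum, lr=0.02, beta=0.95):
    """One Muon update: accumulate momentum, then step in the
    direction of its (approximately) orthogonalized value."""
    momentum = beta * momentum + grad
    return w - lr * newton_schulz_orthogonalize(momentum), momentum
```

After five iterations the singular values of the update cluster near 1, so every direction in the weight matrix receives a similarly sized step — the property usually credited for Muon's faster convergence at equal compute.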
Benchmark results
Public evaluations show:
- Codeforces rating: 3206 (highest among compared models).
- LiveCodeBench: 93.5 %.
- SWE-bench Verified: 80.6 % (self‑reported; not directly comparable to Claude Opus 4.6's 87.6 %).
- Terminal Bench: 67.9 %.
On the FundaAI 38‑task suite:
- Weighted average: Claude Opus 4.6 (think) 8.72 > V4‑Pro 8.27 > V4‑Flash 8.01.
- Financial research: V4‑Pro ties with Opus 4.7 (7:7).
- Game theory (NVDA task): V4‑Pro scores 10/10.
- Cost per task: V4‑Flash $0.007, far cheaper than Claude Opus.
In knowledge and reasoning benchmarks V4‑Pro trails Opus 4.6 and Gemini 3.1 Pro on MMLU‑Pro (87.5 % vs 89.1 %/91.0 %) and GPQA Diamond (90.1 % vs 91.3 %/94.3 %). On IMOAnswerBench (89.8 vs 75.3/91.4) and SimpleQA‑Verified (57.9 vs 46.2/75.6) it clearly beats Opus 4.6 but still trails Gemini 3.1 Pro.
Hardware validation
V4 is the first frontier model verified on both Huawei Ascend NPU and NVIDIA GPU. Reported numbers:
- Ascend 950: 20 ms latency for V4‑Pro, 10 ms for V4‑Flash.
- Ascend A3 super‑node: reference training implementation provided.
- Cambricon: Day‑0 vLLM support completed.
Pricing and licensing
MIT‑licensed on HuggingFace. Output cost per million tokens:
- V4‑Flash: $0.28 (baseline, 1×).
- V4‑Pro: $3.48 (≈ 12× baseline; ≈ 1/4 of GPT‑5.4's $15 and ≈ 1/7 of Claude Opus 4.6's $25).
- Night‑time (23:00–07:00 CST) cache‑hit price: ¥0.2 per million tokens.
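The quoted multipliers follow directly from the listed prices; a quick arithmetic check:

```python
# Output prices per million tokens as quoted in the article
prices = {
    "V4-Flash": 0.28,
    "V4-Pro": 3.48,
    "GPT-5.4": 15.00,
    "Claude Opus 4.6": 25.00,
}

pro = prices["V4-Pro"]
pro_vs_flash = pro / prices["V4-Flash"]        # ~12.4x the Flash baseline
gpt_vs_pro = prices["GPT-5.4"] / pro           # GPT-5.4 costs ~4.3x more
opus_vs_pro = prices["Claude Opus 4.6"] / pro  # Opus 4.6 costs ~7.2x more
```

Rounded, these reproduce the article's "≈ 12×", "≈ 1/4", and "≈ 1/7" figures.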
Limitations
Thought‑mode performance lags behind Claude Opus 4.6, with occasional time‑outs on complex reasoning.
Inference throughput for V4‑Pro currently depends on NVIDIA GPUs; Ascend production capacity is a short‑term bottleneck.
Long‑term sustainability of ultra‑low pricing depends on scaling and ecosystem value.