DeepSeek‑V4 Open‑Sources Its Million‑Token Architecture and Calls Out Claude Opus 4.6
DeepSeek‑V4’s open‑source report reveals a hybrid CSA/HCA attention design, manifold‑constrained residuals, and the Muon optimizer, which together cut per‑token FLOPs to 27 % and KV‑cache cost to 10 % of V3.2’s at a 1 M‑token context. Benchmark results show it outperforming Claude Opus 4.6 on most tasks, while still lagging on complex instruction following and multi‑turn dialogue.
1. Million‑token context efficiency breakthrough
DeepSeek‑V4 (Pro/Flash) targets the efficiency bottleneck of ultra‑long context. By combining Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA), and applying manifold‑constrained hyper‑connections (mHC) and the Muon optimizer, V4‑Pro (1.6 T parameters, 49 B activated) and V4‑Flash (284 B parameters, 13 B activated) achieve only 27 % of the per‑token FLOPs and 10 % of the KV‑cache cost of V3.2 at a 1 M‑token context.
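The KV‑cache side of these savings can be sanity‑checked with back‑of‑envelope arithmetic. The compression ratios below (`m = 8`, `m' = 64`) are illustrative assumptions, not figures from the report, which only states the end result of roughly 10 % of V3.2's cost:

```python
# Back-of-envelope KV-cache size at a 1M-token context.
# Block sizes m=8 and m'=64 are assumed for illustration only.

def kv_cache_entries(seq_len: int, m: int) -> int:
    """KV entries kept when every m tokens are compressed into one."""
    return -(-seq_len // m)  # ceiling division

seq_len = 1_000_000
baseline = kv_cache_entries(seq_len, 1)    # one entry per token
csa = kv_cache_entries(seq_len, 8)         # assumed CSA block size m
hca = kv_cache_entries(seq_len, 64)        # assumed HCA block size m'

print(f"baseline entries: {baseline:,}")
print(f"CSA entries:      {csa:,}  ({csa / baseline:.1%} of baseline)")
print(f"HCA entries:      {hca:,}  ({hca / baseline:.2%} of baseline)")
```

Any such scheme trades cache size for retrieval fidelity, which is why the design pairs compression with a selection mechanism (Section 6.1) rather than compressing uniformly.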
2. Direct jab at Claude
In the “White‑Collar Task” evaluation, DeepSeek‑V4‑Pro‑Max was pitted against Claude Opus 4.6‑Max. The report quotes a human‑evaluation comment: “It also excels in long‑form generation, delivering in‑depth, coherent narratives rather than relying on the overly simplistic bullet points frequently produced by Opus‑4.6‑Max.”
Figure 11 shows win rates (DeepSeek vs Claude): analysis 55.0 % vs 37.0 %, generation 52.0 % vs 38.0 %, editing 47.0 % vs 35.0 %, overall 53.0 % vs 37.0 %.
Figure 12 provides detailed dimension scores, where DeepSeek leads in Task Completion, Content Quality and Formatting Aesthetics, but Claude slightly edges it in Instruction Following.
3. Benchmark standing beyond the jab
On public benchmarks, V4‑Pro‑Max ranks among the top open‑source models: SimpleQA‑Verified 57.9 (highest open‑source), Codeforces Rating 3206 (top 23 humans), Apex Shortlist 90.2 (surpassing GPT‑5.4 78.1 and Gemini‑3.1‑Pro 89.1), HMMT 2026 Feb 95.2, IMOAnswerBench 89.8.
The report notes that on knowledge‑heavy benchmarks (MMLU‑Pro, GPQA, HLE) V4‑Pro‑Max still trails Gemini‑3.1‑Pro, and on agent tasks it remains behind Claude Opus 4.6 and GPT‑5.4, but it is the first open‑source model to match frontier closed‑source performance in reasoning and code‑competition tasks.
4. Chinese‑language dominance
In Chinese writing evaluations, DeepSeek‑V4‑Pro outperforms Gemini‑3.1‑Pro with an overall win‑rate of 62.7 % vs 34.1 %, excelling in technical text (75.86 %), email (73.29 %) and personal reflections (75.56 %).
Agentic Search also shows a qualitative leap over traditional RAG, as illustrated by the accompanying figures.
5. Acknowledged gaps
Table 14 shows that on complex instruction following and multi‑turn writing, Claude Opus 4.5 still leads (DeepSeek vs Claude: 46.9 % vs 53.1 % and 45.6 % vs 51.7 % respectively), giving Claude an overall advantage of 52.0 % to 45.9 %.
6. Technical foundations behind the confidence
6.1 Hybrid attention: CSA + HCA
CSA compresses every m tokens into one KV entry and performs sparse Top‑k selection via a Lightning Indexer. HCA applies a much higher compression ratio m′ while retaining dense attention. Together they reduce KV‑Cache to roughly 2 % of traditional GQA at 1 M tokens.
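The CSA half of the mechanism can be sketched in a few lines. Mean pooling for block compression and a dot‑product scorer standing in for the Lightning Indexer are assumptions for illustration; the report's indexer is a learned module:

```python
import numpy as np

# Sketch of block-compressed sparse attention in the spirit of CSA:
# every m tokens are pooled into one KV entry, a lightweight indexer
# scores the compressed entries, and only the top-k blocks receive
# dense attention. Mean pooling and dot-product scoring are assumed.

rng = np.random.default_rng(0)
d, m, k = 16, 4, 2                 # head dim, block size, top-k blocks
keys = rng.normal(size=(32, d))    # 32 cached key vectors
query = rng.normal(size=(d,))

# 1) Compress every m keys into one entry (mean pooling, assumed).
blocks = keys.reshape(-1, m, d).mean(axis=1)           # (8, d)

# 2) Indexer: score each compressed entry against the query.
scores = blocks @ query                                # (8,)

# 3) Keep only the top-k blocks; attend densely inside them.
top = np.argsort(scores)[-k:]
selected = keys.reshape(-1, m, d)[top].reshape(-1, d)  # (k*m, d)

attn_logits = selected @ query / np.sqrt(d)
weights = np.exp(attn_logits - attn_logits.max())
weights /= weights.sum()
print("selected blocks:", sorted(top.tolist()))
```

The cache holds one entry per block for scoring plus the raw keys of only the selected blocks for dense attention, which is where the quoted ~2 % figure relative to GQA comes from.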
6.2 Muon optimizer
The Muon optimizer is introduced for trillion‑parameter MoE training. It orthogonalizes gradient updates via a hybrid Newton‑Schulz iteration (8 fast‑converging steps followed by 2 fine‑tuning steps) and uses FP4 quantization‑aware training to reduce memory consumption.
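The core of such an optimizer is the Newton‑Schulz step itself. The sketch below uses the quintic coefficients popularized by the open‑source Muon implementation and a single fixed coefficient set, which is an assumption; the report's "8 fast + 2 fine‑tuning" split implies two coefficient regimes that are not reproduced here:

```python
import numpy as np

# Newton-Schulz orthogonalization as used in Muon-style optimizers:
# a polynomial iteration pushes a matrix's singular values toward 1,
# approximating the nearest orthogonal matrix to the gradient.
# Coefficients are from the public Muon implementation (assumed).

def newton_schulz(G: np.ndarray, steps: int = 10) -> np.ndarray:
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)   # scale so singular values <= 1
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X

rng = np.random.default_rng(0)
G = rng.normal(size=(6, 6))
O = newton_schulz(G)
# After iteration, O's singular values cluster in a band around 1.
print("singular values:", np.round(np.linalg.svd(O, compute_uv=False), 2))
```

The iteration uses only matrix multiplications, which is what makes it attractive at trillion‑parameter scale: no SVD, and every step maps cleanly onto accelerator GEMMs.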
6.3 Post‑training: OPD unified expert
Two‑stage post‑training first trains specialist experts for math, code, agent and instruction domains (SFT + GRPO), then merges them into a single model via On‑Policy Distillation (OPD) using full‑vocabulary reverse KL distillation.
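The distillation objective named here, full‑vocabulary reverse KL, can be illustrated on toy logits. The distributions below are made up for illustration, and the sketch shows only the per‑position loss, not the on‑policy sampling loop:

```python
import numpy as np

# Full-vocabulary reverse KL, the objective the report attributes to
# OPD: the student is trained to minimize KL(student || teacher),
# summed over the entire vocabulary at each position. Logits are toy.

def reverse_kl(student_logits: np.ndarray, teacher_logits: np.ndarray) -> float:
    """KL(p_student || p_teacher) over the full vocabulary."""
    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()
    p, q = softmax(student_logits), softmax(teacher_logits)
    return float(np.sum(p * (np.log(p) - np.log(q))))

student = np.array([2.0, 1.0, 0.1, -1.0])
teacher = np.array([2.2, 0.8, 0.0, -1.2])
print("reverse KL:", round(reverse_kl(student, teacher), 4))
```

Reverse KL is mode‑seeking: it heavily penalizes the student for placing mass where the teacher has little, which suits merging specialist experts into one model without averaging away their sharp, domain‑specific behavior.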
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
