DeepSeek V4 Unveiled: 1.6 T Parameters, Million‑Token Context, Fully Open‑Source
DeepSeek V4 introduces two open‑source MoE models—Pro and Flash—with up to 1.6 T parameters, 1 M token context, a new DSA sparse‑attention mechanism, extensive benchmark results, and a tiered pricing scheme, while remaining compatible with OpenAI and Anthropic APIs.
DeepSeek announced a preview of its V4 series, releasing the models, API, and weights together; the weights are published on Hugging Face and ModelScope, and the official app and website have been updated to the new version.
The V4 lineup includes two variants:
deepseek‑v4‑pro: 1.6 T total parameters, 49 B activation parameters, trained on 33 T tokens, 1 M context length, open‑source, API available, operates in "expert mode".
deepseek‑v4‑flash: 284 B total parameters, 13 B activation parameters, trained on 32 T tokens, 1 M context length, open‑source, API available, operates in "fast mode".
Both models use a Mixture‑of‑Experts (MoE) architecture; switching between them in the API requires only changing the model parameter to deepseek‑v4‑pro or deepseek‑v4‑flash, with no other code changes.
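The swap described above can be sketched as a request payload where only the model field differs; the model names come from the announcement, while the exact request shape is an assumption based on the stated OpenAI compatibility:

```python
def build_payload(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style chat payload (shape assumed, not official)."""
    assert model in ("deepseek-v4-pro", "deepseek-v4-flash")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Switching variants touches only the "model" field; everything else is identical.
pro = build_payload("deepseek-v4-pro", "Summarize this document.")
flash = build_payload("deepseek-v4-flash", "Summarize this document.")
```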
The key technical advance is DeepSeek Sparse Attention (DSA), which reduces attention computation in long contexts. In the official technical report, the Pro model achieved MRCR‑1M = 83.5 and CorpusQA‑1M = 62.0, scores that rank among the top open‑source results but still trail Claude Opus 4.6 (92.9/71.7).
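The announcement does not describe DSA's internals, but the general idea behind top‑k sparse attention can be illustrated with a toy sketch: score all keys cheaply, keep only the highest‑scoring positions, and run softmax attention over that subset. This is a generic illustration of sparse attention, not DeepSeek's actual algorithm:

```python
import math

def sparse_attention(q, keys, values, k=2):
    """Toy top-k sparse attention for a single query vector (pure Python).

    Rather than attending over every key, keep only the k highest-scoring
    positions and normalize attention weights over that subset -- the broad
    idea behind sparse-attention schemes for long contexts.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # Relevance score for every key position.
    scores = [dot(q, key) for key in keys]
    # Indices of the k best-scoring keys.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected subset.
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exps.values())
    weights = {i: e / z for i, e in exps.items()}
    # Weighted sum of only the selected value vectors.
    dim = len(values[0])
    return [sum(weights[i] * values[i][d] for i in top) for d in range(dim)]
```

In a real implementation the selection step itself must be cheap (e.g. a lightweight scorer rather than full dot products), otherwise nothing is saved; the sketch only shows the restrict-then-normalize structure.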
Benchmark highlights include:
Knowledge & reasoning: Chinese‑SimpleQA = 84.4 (best among evaluated models), IMOAnswerBench = 89.8, Apex Shortlist = 90.2; however, SimpleQA Verified = 57.9 and HLE = 37.7 lag behind leading closed‑source models.
Programming: LiveCodeBench = 93.5, Codeforces Rating = 3206, slightly surpassing GPT‑5.4 (3168).
Agent abilities: SWE Verified = 80.6, SWE Pro = 55.4, SWE Multilingual = 76.2 (close to Opus 4.6 and GPT‑5.4), but Terminal Bench = 67.9 and Toolathlon = 51.8 fall short of GPT‑5.4.
Reasoning mode is enabled by default; users can set reasoning_effort to high or max (OpenAI format) or adjust output_config.effort (Anthropic format). Reasoning traces are returned in the reasoning_content field, while the final answer appears in content. In multi‑turn tool calls, earlier assistant turns must be sent back with their reasoning_content intact, otherwise the API returns a 400 error.
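The replay requirement can be sketched as a helper that rebuilds the message history for the next turn while preserving reasoning_content on assistant turns; the field names follow the article's description, but the exact message shapes are assumptions for illustration:

```python
def replay_messages(history):
    """Rebuild the message list for the next turn of a multi-turn tool call.

    Per the announcement, assistant turns must be sent back with their
    reasoning_content field intact, or the API rejects the request with a 400.
    """
    msgs = []
    for turn in history:
        msg = {"role": turn["role"], "content": turn["content"]}
        # Keep the reasoning trace on assistant turns instead of dropping it.
        if turn["role"] == "assistant" and "reasoning_content" in turn:
            msg["reasoning_content"] = turn["reasoning_content"]
        msgs.append(msg)
    return msgs
```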
Pricing (per 1 M tokens) is:
deepseek‑v4‑pro: input (cache hit) = ¥1, input (cache miss) = ¥12, output = ¥24.
deepseek‑v4‑flash: input (cache hit) = ¥0.2, input (cache miss) = ¥1, output = ¥2.
The Pro tier is currently limited by high‑end compute, so its price is expected to drop after wider availability of Ascend 950 supernodes later this year; Flash pricing is already competitive with mainstream open‑source APIs.
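The published per‑million‑token prices translate into a straightforward cost estimate once the cache‑hit rate is known; a minimal sketch (prices taken from the table above, the helper itself is illustrative):

```python
def cost_yuan(model, input_tokens, output_tokens, cache_hit_rate=0.0):
    """Estimate request cost in CNY from the published per-1M-token prices."""
    prices = {  # (input cache-hit, input cache-miss, output), yuan per 1M tokens
        "deepseek-v4-pro":   (1.0, 12.0, 24.0),
        "deepseek-v4-flash": (0.2, 1.0, 2.0),
    }
    hit, miss, out = prices[model]
    hit_tok = input_tokens * cache_hit_rate
    miss_tok = input_tokens - hit_tok
    return (hit_tok * hit + miss_tok * miss + output_tokens * out) / 1_000_000
```

For example, a Pro request with 100 k input tokens at a 50% cache‑hit rate plus 10 k output tokens costs about ¥0.89, most of it from cache misses and output.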
Both models retain compatibility with OpenAI ChatCompletions and Anthropic protocols; only the model parameter needs to be changed to the new model name, and the base URL stays the same. For complex agent scenarios, the provider recommends setting the reasoning intensity to max to improve multi‑step task success.
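The dual‑protocol support can be sketched as the same "max reasoning" request expressed in both styles; the effort knobs come from the article, while the surrounding payload shapes are assumptions based on the respective protocol conventions:

```python
# OpenAI-format request: reasoning intensity via the reasoning_effort field.
openai_style = {
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Plan a multi-step task."}],
    "reasoning_effort": "max",
}

# Anthropic-format request: the same intent via output_config.effort.
anthropic_style = {
    "model": "deepseek-v4-pro",
    "max_tokens": 4096,
    "messages": [{"role": "user", "content": "Plan a multi-step task."}],
    "output_config": {"effort": "max"},
}
```

In both cases the base URL is unchanged from earlier DeepSeek versions; only the model name and, for agent workloads, the effort setting need to be updated.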
In summary, DeepSeek V4 delivers open‑source weights, million‑token context, and an MoE architecture with sparse attention that keeps inference costs manageable; it matches or exceeds leading closed‑source models on Chinese QA and programming benchmarks, while still lagging in English world knowledge and stable multi‑step agent execution. The release signals that open‑source LLMs are becoming viable contenders for mainstream use cases in 2026.