DeepSeek V4 Released: Open-Source LLM Challenges Closed-Source Leaders, Adds Huawei Chip Support
DeepSeek V4 launches in two versions, Pro and Flash, offering a 1 M token context, stronger agent capabilities, improved world knowledge and reasoning, a new token-compression attention mechanism paired with DSA sparse attention, Huawei compute support, updated APIs, and a migration plan for legacy models.
Model Variants
DeepSeek‑V4‑Pro: 1.6 T total parameters, 49 B activated, 1 M token context.
DeepSeek‑V4‑Flash: 284 B total parameters, 13 B activated, 1 M token context, optimized for lower inference cost.
Performance Claims
In internal agentic-coding evaluations, V4‑Pro leads current open‑source models, outperforms Sonnet 4.5, and comes close to Opus 4.6 in non‑thinking mode, though it still trails Opus 4.6 in thinking mode.
Agent ability: excels in Agentic coding benchmarks and other agent‑related tests.
World knowledge: far ahead of other open‑source models, slightly behind the top closed‑source Gemini‑Pro‑3.1.
Reasoning performance: surpasses all publicly evaluated open‑source models in mathematics, STEM, and competitive coding tasks, matching top closed‑source results.
Flash Variant Details
Inference speed comparable to Pro on simple tasks; gap appears on high‑difficulty tasks.
World‑knowledge level slightly lower than Pro but still strong; lower API cost due to smaller parameter/activation footprint.
Failure case: in the classic "desperate father" biology scenario, the model did not infer red‑green color blindness, illustrating a remaining limitation.
Million‑Token Context as Standard
From the release date, a 1 M token context is the default for all DeepSeek services. One year earlier only Gemini offered 1 M context; other closed‑source models were limited to 128 K–200 K, and open‑source models rarely reached this scale.
New Attention Mechanism
V4 introduces a novel attention design that compresses the token dimension and combines it with Dynamic Sparse Attention (DSA), dramatically reducing compute and memory requirements compared with traditional dense attention.
DSA was first introduced in the V3.2‑Exp update six months earlier, where benchmark scores were comparable to V3.1‑Terminus. V4 builds on that foundation.
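The combination of compressing the token dimension and then attending sparsely can be illustrated with a toy sketch. Everything here (mean-pooling as the compression, top-k selection as the sparsity, the shapes) is an assumption for illustration, not DeepSeek's actual implementation, which is described in the technical report:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def compress_tokens(X, block=4):
    # Toy "token compression": mean-pool consecutive blocks of tokens,
    # shrinking the sequence axis by a factor of `block`.
    seq, d = X.shape
    return X.reshape(seq // block, block, d).mean(axis=1)

def topk_sparse_attention(q, K, V, k=8):
    # Toy sparse attention: score all (compressed) keys, keep only the
    # top-k matches, and normalize the softmax over that subset alone.
    scores = K @ q                    # (seq,) similarity of q to every key
    idx = np.argsort(scores)[-k:]     # indices of the k best-matching keys
    w = softmax(scores[idx])
    return w @ V[idx]                 # weighted sum over the sparse subset

rng = np.random.default_rng(0)
seq_len, d_model = 64, 16
q = rng.standard_normal(d_model)
K = rng.standard_normal((seq_len, d_model))
V = rng.standard_normal((seq_len, d_model))

K_c, V_c = compress_tokens(K), compress_tokens(V)   # 64 -> 16 tokens
out = topk_sparse_attention(q, K_c, V_c, k=8)       # attends to 8 of 16
```

Each query here touches 8 compressed keys instead of 64 raw ones, which is the general shape of the cost reduction such designs target.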
Agent Capability Optimizations
V4 has been adapted and optimized for major agent frameworks such as Claude Code, OpenClaw, OpenCode, and CodeBuddy, yielding improvements in code generation and document creation tasks.
The release includes a sample PPT slide generated by V4‑Pro within an agent framework.
API Usage
Both V4‑Pro and V4‑Flash are available through the OpenAI‑compatible Chat Completions interface and the Anthropic‑compatible interface.
Invoke the models by setting the model parameter to deepseek-v4-pro or deepseek-v4-flash; the base URL is unchanged.
Both versions support a 1 M token context and offer non‑thinking and thinking modes.
In thinking mode, the reasoning_effort parameter can be set to high or max for stronger reasoning.
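A minimal sketch of the request body, following the OpenAI-style Chat Completions schema the article describes. The `model` and `reasoning_effort` values come from the article; the exact placement of `reasoning_effort` in the body is an assumption, so check DeepSeek's API documentation before relying on it:

```python
import json

def build_request(prompt, model="deepseek-v4-pro",
                  thinking=False, reasoning_effort=None):
    # Assemble a Chat Completions-style request body for the V4 models.
    body = {
        "model": model,   # "deepseek-v4-pro" or "deepseek-v4-flash"
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking and reasoning_effort:
        # In thinking mode, "high" or "max" requests stronger reasoning.
        body["reasoning_effort"] = reasoning_effort
    return json.dumps(body)

payload = build_request("Summarize this diff.",
                        model="deepseek-v4-flash",
                        thinking=True, reasoning_effort="high")
```

The resulting JSON can be POSTed to the existing chat completions endpoint with any HTTP client.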
Model Deprecation Timeline
The legacy model names deepseek-chat and deepseek-reasoner will be retired on 2026‑07‑24. Until then they map to V4‑Flash’s non‑thinking and thinking modes respectively. Individual developers need only change the model parameter; production users must migrate before the deadline.
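For most callers the migration is a one-line model-string change. A small helper capturing the mapping the article describes (the thinking-mode flag returned alongside the name is illustrative):

```python
# Map legacy model names to their documented V4 replacements:
# deepseek-chat -> V4-Flash non-thinking; deepseek-reasoner -> V4-Flash thinking.
LEGACY_MAP = {
    "deepseek-chat": ("deepseek-v4-flash", False),
    "deepseek-reasoner": ("deepseek-v4-flash", True),
}

def migrate_model(name):
    """Return (new_model, thinking_mode) for a legacy name; pass others through."""
    return LEGACY_MAP.get(name, (name, None))

print(migrate_model("deepseek-chat"))  # ('deepseek-v4-flash', False)
```

Until the 2026-07-24 cutoff the legacy names keep working, so this mapping can be rolled out gradually.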
Resources
Model repository (Hugging Face): https://huggingface.co/collections/deepseek-ai/deepseek-v4
ModelScope collection: https://modelscope.cn/collections/deepseek-ai/DeepSeek-V4
Technical report (PDF): https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
IT Services Circle