DeepSeek V4 Released: Open-Source LLM Challenges Closed-Source Leaders, Adds Huawei Chip Support
DeepSeek V4 launches in two versions, Pro and Flash, offering a 1 M token context, stronger agent capabilities, improved world knowledge and reasoning, a new token-compression attention mechanism paired with DSA sparse attention, Huawei compute support, updated APIs, and a migration plan for legacy models.
Model Variants
DeepSeek‑V4‑Pro: 1.6 T total parameters, 49 B activated, 1 M token context.
DeepSeek‑V4‑Flash: 284 B total parameters, 13 B activated, 1 M token context, optimized for lower inference cost.
Performance Claims
In internal agentic-coding evaluations, V4‑Pro leads current open‑source models, outperforms Sonnet 4.5, and comes close to Opus 4.6 in non‑thinking mode, though it still trails Opus 4.6 in thinking mode.
Agent ability: excels in Agentic coding benchmarks and other agent‑related tests.
World knowledge: far ahead of other open‑source models, slightly behind the top closed‑source Gemini‑Pro‑3.1.
Reasoning performance: surpasses all publicly evaluated open‑source models in mathematics, STEM, and competitive coding tasks, matching top closed‑source results.
Flash Variant Details
Inference speed comparable to Pro on simple tasks; gap appears on high‑difficulty tasks.
World‑knowledge level slightly lower than Pro but still strong; lower API cost due to smaller parameter/activation footprint.
Failure case: in the classic "desperate father" biology scenario, the model did not infer red‑green color blindness, illustrating a remaining limitation.
Million‑Token Context as Standard
From the release date, a 1 M token context is the default for all DeepSeek services. One year earlier only Gemini offered 1 M context; other closed‑source models were limited to 128 K–200 K, and open‑source models rarely reached this scale.
New Attention Mechanism
V4 introduces a novel attention design that compresses the token dimension and combines it with Dynamic Sparse Attention (DSA), dramatically reducing compute and memory requirements compared with traditional dense attention.
DSA was first introduced in the V3.2‑Exp update six months earlier, where benchmark scores were comparable to V3.1‑Terminus. V4 builds on that foundation.
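The combination of compressing the token dimension and then attending sparsely can be illustrated with a toy sketch. Everything here (mean-pooling as the compression, top-k selection as the sparsity, the shapes) is an assumption for illustration, not DeepSeek's actual implementation, which is described in the technical report:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def compress_tokens(X, block=4):
    # Toy "token compression": mean-pool consecutive blocks of tokens,
    # shrinking the sequence axis by a factor of `block`.
    seq, d = X.shape
    return X.reshape(seq // block, block, d).mean(axis=1)

def topk_sparse_attention(q, K, V, k=8):
    # Toy sparse attention: score all (compressed) keys, keep only the
    # top-k matches, and normalize the softmax over that subset alone.
    scores = K @ q                    # (seq,) similarity of q to every key
    idx = np.argsort(scores)[-k:]     # indices of the k best-matching keys
    w = softmax(scores[idx])
    return w @ V[idx]                 # weighted sum over the sparse subset

rng = np.random.default_rng(0)
seq_len, d_model = 64, 16
q = rng.standard_normal(d_model)
K = rng.standard_normal((seq_len, d_model))
V = rng.standard_normal((seq_len, d_model))

K_c, V_c = compress_tokens(K), compress_tokens(V)   # 64 -> 16 tokens
out = topk_sparse_attention(q, K_c, V_c, k=8)       # attends to 8 of 16
```

Each query here touches 8 compressed keys instead of 64 raw ones, which is the general shape of the cost reduction such designs target.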
Agent Capability Optimizations
V4 has been adapted and optimized for major agent frameworks such as Claude Code, OpenClaw, OpenCode, and CodeBuddy, yielding improvements in code generation and document creation tasks.
The release includes a sample PPT slide generated by V4‑Pro within an agent framework.
API Usage
Both V4‑Pro and V4‑Flash are available through the OpenAI‑compatible Chat Completions interface and the Anthropic‑compatible interface.
Invoke the models by setting the model parameter to deepseek-v4-pro or deepseek-v4-flash; the base URL is unchanged.
Both versions support a 1 M token context and offer non‑thinking and thinking modes.
In thinking mode, the reasoning_effort parameter can be set to high or max for stronger reasoning.
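A minimal sketch of the request body, following the OpenAI-style Chat Completions schema the article describes. The `model` and `reasoning_effort` values come from the article; the exact placement of `reasoning_effort` in the body is an assumption, so check DeepSeek's API documentation before relying on it:

```python
import json

def build_request(prompt, model="deepseek-v4-pro",
                  thinking=False, reasoning_effort=None):
    # Assemble a Chat Completions-style request body for the V4 models.
    body = {
        "model": model,   # "deepseek-v4-pro" or "deepseek-v4-flash"
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking and reasoning_effort:
        # In thinking mode, "high" or "max" requests stronger reasoning.
        body["reasoning_effort"] = reasoning_effort
    return json.dumps(body)

payload = build_request("Summarize this diff.",
                        model="deepseek-v4-flash",
                        thinking=True, reasoning_effort="high")
```

The resulting JSON can be POSTed to the existing chat completions endpoint with any HTTP client.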
Model Deprecation Timeline
The legacy model names deepseek-chat and deepseek-reasoner will be retired on 2026‑07‑24. Until then they map to V4‑Flash’s non‑thinking and thinking modes respectively. Individual developers need only change the model parameter; production users must migrate before the deadline.
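For most callers the migration is a one-line model-string change. A small helper capturing the mapping the article describes (the thinking-mode flag returned alongside the name is illustrative):

```python
# Map legacy model names to their documented V4 replacements:
# deepseek-chat -> V4-Flash non-thinking; deepseek-reasoner -> V4-Flash thinking.
LEGACY_MAP = {
    "deepseek-chat": ("deepseek-v4-flash", False),
    "deepseek-reasoner": ("deepseek-v4-flash", True),
}

def migrate_model(name):
    """Return (new_model, thinking_mode) for a legacy name; pass others through."""
    return LEGACY_MAP.get(name, (name, None))

print(migrate_model("deepseek-chat"))  # ('deepseek-v4-flash', False)
```

Until the 2026-07-24 cutoff the legacy names keep working, so this mapping can be rolled out gradually.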
Resources
Model repository (Hugging Face): https://huggingface.co/collections/deepseek-ai/deepseek-v4
ModelScope collection: https://modelscope.cn/collections/deepseek-ai/DeepSeek-V4
Technical report (PDF): https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
IT Services Circle