DeepSeek V4 Unveiled: How Its Million-Token Context Redefines Open-Source LLMs

DeepSeek released a preview of its V4 series, introducing V4‑Pro (1.6 T total parameters, 49 B activated parameters, trained on 33 T tokens) and V4‑Flash (284 B total parameters, 13 B activated parameters, trained on 32 T tokens). Both support a 1 M‑token context window, use the novel DeepSeek Sparse Attention (DSA) to reduce compute and memory, rival top closed‑source models on agentic‑coding, world‑knowledge, and reasoning benchmarks, and expose an API compatible with OpenAI and Anthropic interfaces.


DeepSeek has officially launched a preview of its V4 series, offering two variants: V4‑Pro, with 1.6 trillion total parameters, 49 billion activated parameters, and 33 trillion tokens of pre‑training data; and V4‑Flash, with 284 billion total parameters, 13 billion activated parameters, and 32 trillion tokens. Both models support a 1 million‑token context window and are available now via the official website and app.

Performance: Matching Top Closed‑Source Models

In agentic coding tests, V4‑Pro achieves the best results among open‑source models and is reported by internal users to outperform Anthropic’s Sonnet 4.5, approaching the quality of Opus 4.6 in non‑reasoning mode. In world‑knowledge benchmarks, V4‑Pro leads other open‑source models and trails only Gemini‑Pro‑3.1. On mathematics, STEM, and competition‑style coding tasks, V4‑Pro surpasses all publicly evaluated open models, matching the performance of leading proprietary systems.

[Figure: model performance comparison chart]

Technical Breakthrough: Innovative Attention Mechanism

DeepSeek‑V4 introduces a new attention mechanism that compresses representations along the token dimension and incorporates DeepSeek Sparse Attention (DSA). This design delivers world‑leading long‑context capability while substantially lowering compute and memory requirements compared with conventional dense attention.

[Figure: compute-efficiency comparison chart]
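To make the idea of sparse attention concrete, here is a toy NumPy sketch in which each query attends only to its top-k highest-scoring keys. This is an illustration of the general sparse-attention principle only; DSA's actual selection mechanism and its token-dimension compression are described in DeepSeek's technical report, and nothing below should be read as their implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(q, k, v, keep=4):
    """Toy sparse attention: each query attends to only its `keep`
    highest-scoring keys, so per-query cost scales with `keep`
    rather than sequence length. Illustrative only -- not DSA."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n_q, n_k)
    # Keep the top-`keep` scores per query; mask the rest with -inf.
    kth = np.partition(scores, -keep, axis=-1)[:, -keep][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    return softmax(masked) @ v                          # (n_q, d_v)

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))    # 8 queries
k = rng.normal(size=(32, 16))   # 32 keys
v = rng.normal(size=(32, 16))
out = topk_sparse_attention(q, k, v, keep=4)
print(out.shape)  # (8, 16)
```

With long contexts, restricting each query to a small set of keys is what turns the quadratic attention cost into something far cheaper, which is the efficiency claim the figure above illustrates.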

Economic Choice: V4‑Flash

For users seeking a faster, more cost‑effective service, V4‑Flash offers a viable alternative. Although its world knowledge is somewhat weaker than V4‑Pro's, its reasoning ability is comparable, and it performs similarly on simple agent tasks.

[Table: reasoning-capability comparison]

API Service and Open‑Source Release

The DeepSeek API now supports both V4‑Pro and V4‑Flash, compatible with OpenAI ChatCompletions and Anthropic interfaces. The base URL remains unchanged; the model parameter should be set to deepseek-v4-pro or deepseek-v4-flash. Both variants support non‑reasoning and reasoning modes, with the reasoning_effort parameter controlling reasoning intensity (high/max). For complex agent scenarios, the reasoning mode with maximum intensity is recommended.

[Figure: API call example]
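The call described above can be sketched as a ChatCompletions-style request body. The model names, unchanged base URL, and the reasoning_effort values (high/max) come from the article; the exact payload shape and endpoint path are assumptions following the standard OpenAI convention, so verify against the official DeepSeek API documentation before use.

```python
import json

# Assumed base URL -- the article only says it is "unchanged".
BASE_URL = "https://api.deepseek.com"

payload = {
    "model": "deepseek-v4-pro",    # or "deepseek-v4-flash"
    "messages": [
        {"role": "user", "content": "Refactor this function for clarity."}
    ],
    # Per the article: reasoning intensity is "high" or "max";
    # max is recommended for complex agent scenarios.
    "reasoning_effort": "max",
}

# The request itself (not executed here) would be roughly:
#   POST {BASE_URL}/chat/completions  with JSON body `payload`
print(json.dumps(payload, indent=2))
```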

The model weights have been open‑sourced on Hugging Face and ModelScope, and the technical report is publicly available. The legacy API model names deepseek-chat and deepseek-reasoner will be deprecated on 2026‑07‑24, after which they will point to V4‑Flash’s non‑reasoning and reasoning modes respectively.
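The deprecation note implies a simple migration path for existing callers. The sketch below encodes the mapping stated in the article; the helper function and the specific reasoning_effort value chosen for the reasoner replacement are hypothetical, not an official migration tool.

```python
# Per the article: after 2026-07-24,
#   deepseek-chat     -> V4-Flash, non-reasoning mode
#   deepseek-reasoner -> V4-Flash, reasoning mode
# The "high" effort value below is an assumed default for reasoning mode.
LEGACY_MIGRATION = {
    "deepseek-chat": {"model": "deepseek-v4-flash"},
    "deepseek-reasoner": {"model": "deepseek-v4-flash",
                          "reasoning_effort": "high"},
}

def migrate(request: dict) -> dict:
    """Rewrite a legacy request body to the V4 names (illustrative)."""
    new = dict(request)
    new.update(LEGACY_MIGRATION.get(request.get("model"), {}))
    return new

migrated = migrate({"model": "deepseek-reasoner", "messages": []})
print(migrated["model"])  # deepseek-v4-flash
```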

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: DeepSeek · large language model · Sparse Attention · OpenAI API Compatibility · million-token context
Written by

AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
