DeepSeek-V4 Unveiled: 1M Context Length and Ascend Compute Power
DeepSeek has launched the open‑source DeepSeek‑V4 series: a Pro and a Flash model, each with a 1 million token context window and a novel sparse attention mechanism. Performance rivals Opus 4.6 on coding and knowledge benchmarks, pricing is tiered, and further cost reductions are expected once Ascend 950 supernodes become widely available.
1. Context Length
Both DeepSeek‑V4‑Pro and DeepSeek‑V4‑Flash support a 1 million‑token context window and a maximum output of 384K tokens. The models use a new attention mechanism that compresses along the token dimension and incorporates DeepSeek Sparse Attention (DSA), dramatically lowering compute and memory requirements. The 1M context is the default for all official DeepSeek services.
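To make the idea concrete, here is a minimal, illustrative sketch of top‑k sparse attention in the spirit of DSA: a cheap indexer scores past tokens, and full attention is computed only over the best‑scoring k. This is a toy in Python, not DeepSeek's actual implementation; all names, shapes, and the indexer itself are hypothetical.

```python
import numpy as np

def topk_sparse_attention(q, K, V, index_scores, k=4):
    """Attend over only the top-k past tokens chosen by a cheap indexer.

    q:            (d,)   query for the current token
    K, V:         (T, d) cached keys/values for T past tokens
    index_scores: (T,)   cheap per-token relevance scores (the "indexer")
    k:            number of past tokens kept
    """
    k = min(k, len(index_scores))
    keep = np.argpartition(index_scores, -k)[-k:]   # indices of the top-k tokens
    logits = K[keep] @ q / np.sqrt(q.shape[-1])     # attention only over kept tokens
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ V[keep]

# Toy usage: 8 cached tokens, model dimension 4, keep the best 3.
rng = np.random.default_rng(0)
T, d = 8, 4
out = topk_sparse_attention(rng.normal(size=d),
                            rng.normal(size=(T, d)),
                            rng.normal(size=(T, d)),
                            rng.normal(size=T), k=3)
print(out.shape)  # (4,)
```

The point is the complexity shift: the expensive attention step scales with the selected k rather than with the full context length T, which is what makes a 1M‑token window tractable.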
2. Performance
DeepSeek‑V4‑Pro
Achieves the best open‑source score on agentic coding evaluations; results on other agent benchmarks are also strong.
In internal use, it outperforms Sonnet 4.5 and delivers quality close to Opus 4.6 in non‑thinking mode, though a gap remains against Opus 4.6 in thinking mode.
On world‑knowledge evaluations it is far ahead of other open‑source models and slightly behind Gemini‑Pro‑3.1.
On math, STEM, and competitive‑coding tests it surpasses all publicly evaluated open‑source models and is on par with top closed‑source models.
Optimized for major agent products such as Claude Code, OpenClaw, OpenCode, and CodeBuddy.
DeepSeek‑V4‑Flash
Its world‑knowledge reserve is slightly below Pro's, with comparable reasoning capability.
Fewer parameters and smaller activations make its API service faster and more economical.
Performs on par with Pro on simple agent tasks; it still lags on high‑difficulty ones.
3. Pricing
Input (cache hit): ¥0.2 per M tokens for V4‑Flash, ¥1 per M tokens for V4‑Pro.
Input (cache miss): ¥1 per M tokens for V4‑Flash, ¥12 per M tokens for V4‑Pro.
Output: ¥2 per M tokens for V4‑Flash, ¥24 per M tokens for V4‑Pro.
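For concreteness, the rates above translate directly into a per‑request cost estimate. The helper below is a sketch in Python (prices in ¥ per million tokens, as listed; the function name and the example token split are illustrative, not an official billing formula):

```python
# Prices in ¥ per 1M tokens, taken from the list above.
PRICING = {
    "deepseek-v4-flash": {"cache_hit": 0.2, "cache_miss": 1.0,  "output": 2.0},
    "deepseek-v4-pro":   {"cache_hit": 1.0, "cache_miss": 12.0, "output": 24.0},
}

def request_cost(model, hit_tokens, miss_tokens, output_tokens):
    """Estimate the cost of one request in ¥ (illustrative only)."""
    p = PRICING[model]
    return (hit_tokens * p["cache_hit"]
            + miss_tokens * p["cache_miss"]
            + output_tokens * p["output"]) / 1_000_000

# Example: a V4-Pro call with 800K cached input, 200K fresh input, 50K output.
print(f"¥{request_cost('deepseek-v4-pro', 800_000, 200_000, 50_000):.2f}")
# 0.8*1 + 0.2*12 + 0.05*24 = ¥4.40
```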
DeepSeek notes that V4‑Pro’s service throughput is currently limited by the supply of high‑end compute. A substantial price reduction is expected after Ascend 950 supernodes are released in volume in the second half of the year.
4. API Details
Base URLs: https://api.deepseek.com (OpenAI format) and https://api.deepseek.com/anthropic (Anthropic format).
Model identifiers: deepseek-v4-pro or deepseek-v4-flash.
Both models support non‑thinking and thinking modes; thinking mode is selected via the reasoning_effort parameter (high / max).
The legacy model names deepseek-chat and deepseek-reasoner will be retired on 2026‑07‑24; they currently map to V4‑Flash’s non‑thinking and thinking modes, respectively.
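Putting these details together, a minimal call through the OpenAI‑format endpoint might look like the sketch below. The base URL, model identifiers, and reasoning_effort values come from the announcement; the use of the openai Python client and the exact request shape are assumptions:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder credential
    base_url="https://api.deepseek.com",  # OpenAI-format endpoint
)

# Thinking mode is requested via reasoning_effort ("high" or "max");
# omit the parameter for non-thinking mode (an assumption).
response = client.chat.completions.create(
    model="deepseek-v4-pro",              # or "deepseek-v4-flash"
    messages=[{"role": "user", "content": "Summarize DSA in one sentence."}],
    reasoning_effort="high",
)
print(response.choices[0].message.content)
```

For the Anthropic format, the same request would instead go through https://api.deepseek.com/anthropic with an Anthropic‑compatible client.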