DeepSeek-V4 Unveiled: 1M Context Length and Ascend Compute Power
DeepSeek has launched the open‑source DeepSeek‑V4 series: a Pro and a Flash model, each with a 1 million token context window and a novel sparse attention mechanism. Performance rivals Opus 4.6 on coding and knowledge benchmarks, pricing is tiered, and further cost reductions are expected once Ascend 950 supernodes become widely available.
1. Context Length
Both DeepSeek‑V4‑Pro and DeepSeek‑V4‑Flash support a 1 million‑token context window and a maximum output of 384K tokens. The models use a new attention mechanism that compresses along the token dimension and incorporates DeepSeek Sparse Attention (DSA), dramatically lowering compute and memory requirements. The 1M context is the default for all official DeepSeek services.
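To make the idea concrete, here is a minimal, illustrative sketch of top‑k sparse attention in the spirit of DSA: a cheap indexer scores past tokens, and full attention is computed only over the best‑scoring k. This is a toy in Python, not DeepSeek's actual implementation; all names, shapes, and the indexer itself are hypothetical.

```python
import numpy as np

def topk_sparse_attention(q, K, V, index_scores, k=4):
    """Attend over only the top-k past tokens chosen by a cheap indexer.

    q:            (d,)   query for the current token
    K, V:         (T, d) cached keys/values for T past tokens
    index_scores: (T,)   cheap per-token relevance scores (the "indexer")
    k:            number of past tokens kept
    """
    k = min(k, len(index_scores))
    keep = np.argpartition(index_scores, -k)[-k:]   # indices of the top-k tokens
    logits = K[keep] @ q / np.sqrt(q.shape[-1])     # attention only over kept tokens
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ V[keep]

# Toy usage: 8 cached tokens, model dimension 4, keep the best 3.
rng = np.random.default_rng(0)
T, d = 8, 4
out = topk_sparse_attention(rng.normal(size=d),
                            rng.normal(size=(T, d)),
                            rng.normal(size=(T, d)),
                            rng.normal(size=T), k=3)
print(out.shape)  # (4,)
```

The point is the complexity shift: the expensive attention step scales with the selected k rather than with the full context length T, which is what makes a 1M‑token window tractable.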
2. Performance
DeepSeek‑V4‑Pro
Achieves the best open‑source score on agentic coding evaluations; results on other agent benchmarks are also strong.
In internal use, it outperforms Sonnet 4.5 and delivers quality close to Opus 4.6 in non‑thinking mode, though a gap remains against Opus 4.6 in thinking mode.
On world‑knowledge evaluations it is far ahead of other open‑source models and slightly behind Gemini‑Pro‑3.1.
On math, STEM, and competitive‑coding tests it surpasses all publicly evaluated open‑source models and is on par with top closed‑source models.
Optimized for major agent products such as Claude Code, OpenClaw, OpenCode, and CodeBuddy.
DeepSeek‑V4‑Flash
Its world‑knowledge reserve is slightly below Pro's, with comparable reasoning capability.
Fewer parameters and smaller activations make its API service faster and more economical.
Performs on par with Pro on simple agent tasks; it still lags on high‑difficulty ones.
3. Pricing
Input (cache hit): ¥0.2 per M tokens for V4‑Flash, ¥1 per M tokens for V4‑Pro.
Input (cache miss): ¥1 per M tokens for V4‑Flash, ¥12 per M tokens for V4‑Pro.
Output: ¥2 per M tokens for V4‑Flash, ¥24 per M tokens for V4‑Pro.
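For concreteness, the rates above translate directly into a per‑request cost estimate. The helper below is a sketch in Python (prices in ¥ per million tokens, as listed; the function name and the example token split are illustrative, not an official billing formula):

```python
# Prices in ¥ per 1M tokens, taken from the list above.
PRICING = {
    "deepseek-v4-flash": {"cache_hit": 0.2, "cache_miss": 1.0,  "output": 2.0},
    "deepseek-v4-pro":   {"cache_hit": 1.0, "cache_miss": 12.0, "output": 24.0},
}

def request_cost(model, hit_tokens, miss_tokens, output_tokens):
    """Estimate the cost of one request in ¥ (illustrative only)."""
    p = PRICING[model]
    return (hit_tokens * p["cache_hit"]
            + miss_tokens * p["cache_miss"]
            + output_tokens * p["output"]) / 1_000_000

# Example: a V4-Pro call with 800K cached input, 200K fresh input, 50K output.
print(f"¥{request_cost('deepseek-v4-pro', 800_000, 200_000, 50_000):.2f}")
# 0.8*1 + 0.2*12 + 0.05*24 = ¥4.40
```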
DeepSeek notes that V4‑Pro’s service throughput is currently limited by the supply of high‑end compute. A substantial price reduction is expected after Ascend 950 supernodes are released in volume in the second half of the year.
4. API Details
Base URLs: https://api.deepseek.com (OpenAI format) and https://api.deepseek.com/anthropic (Anthropic format).
Model identifiers: deepseek-v4-pro or deepseek-v4-flash.
Both models support non‑thinking and thinking modes; thinking mode is selected via the reasoning_effort parameter (high / max).
The legacy model names deepseek-chat and deepseek-reasoner will be retired on 2026‑07‑24; they currently map to V4‑Flash’s non‑thinking and thinking modes, respectively.
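Putting these details together, a minimal call through the OpenAI‑format endpoint might look like the sketch below. The base URL, model identifiers, and reasoning_effort values come from the announcement; the use of the openai Python client and the exact request shape are assumptions:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder credential
    base_url="https://api.deepseek.com",  # OpenAI-format endpoint
)

# Thinking mode is requested via reasoning_effort ("high" or "max");
# omit the parameter for non-thinking mode (an assumption).
response = client.chat.completions.create(
    model="deepseek-v4-pro",              # or "deepseek-v4-flash"
    messages=[{"role": "user", "content": "Summarize DSA in one sentence."}],
    reasoning_effort="high",
)
print(response.choices[0].message.content)
```

For the Anthropic format, the same request would instead go through https://api.deepseek.com/anthropic with an Anthropic‑compatible client.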