DeepSeek V4 Unleashed: 1M‑Token Context Becomes Commodity, Teams with Ascend to Challenge Compute Dominance

DeepSeek released two V4 models, Pro and Flash, both supporting 1-million-token context as a standard feature, showcasing top-tier agentic coding, world-knowledge, and reasoning performance, while introducing DSA sparse attention and announcing upcoming large-scale deployment on Huawei Ascend hardware.

ITPUB

DeepSeek released two new models, V4‑Pro (1.6 T total parameters, 49 B active) and V4‑Flash (284 B total parameters, 13 B active), both supporting a 1‑million‑token context length, turning what was once a premium feature into a standard offering.
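The "49 B active of 1.6 T total" split reflects mixture-of-experts routing, where a gate activates only a few experts per token. The following toy sketch shows the general idea only; it is not DeepSeek's actual gating code, and every name in it is illustrative:

```python
import math

def moe_forward(x, experts, gate_weights, k=2):
    """Toy mixture-of-experts step for one scalar token input.

    x:            input value (a scalar keeps the toy simple)
    experts:      list of callables, one per expert
    gate_weights: one gating weight per expert
    k:            number of experts activated per token

    Only k of len(experts) experts ever run, which is why a model
    with 1.6 T total parameters can activate just 49 B per token.
    Purely illustrative; not DeepSeek's parallelism scheme.
    """
    # Gate logits: here just a scaled copy of the input per expert.
    logits = [g * x for g in gate_weights]
    # Select the top-k experts by gate logit.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected experts only (numerically stabilized).
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    # Weighted combination of the selected experts' outputs.
    return sum(exps[i] / z * experts[i](x) for i in top)
```

Because only k experts execute per token, compute and memory traffic scale with the active parameter count rather than the total, which is what makes the Pro/Flash split (49 B vs. 13 B active) meaningful for serving cost.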

Performance benchmarks show the models excel in agentic coding (a score of 4.5, approaching Opus 4.6), world knowledge (ahead of other open-source models and just behind Gemini-Pro-3.1), and reasoning on math/STEM tasks, where they match top closed-source systems.

The core innovation is a new attention mechanism that compresses tokens and incorporates DSA (DeepSeek Sparse Attention), dramatically reducing compute and memory requirements. The earlier V3.2‑Exp update laid the groundwork for this sparse attention.
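The article does not detail DSA's internals, but the general idea behind sparse attention, letting each query attend only to its top-k highest-scoring keys instead of the full sequence, can be sketched in plain Python. This is a toy illustration under that assumption, not DeepSeek's actual DSA algorithm:

```python
import math

def sparse_attention(query, keys, values, k=2):
    """Toy top-k sparse attention for a single query vector.

    Scores every key, keeps only the k best, softmaxes over that
    subset, and returns the weighted sum of the matching values.
    Illustrative only -- not DeepSeek's actual DSA mechanism.
    """
    # Dot-product score against each key.
    scores = [sum(q * kv for q, kv in zip(query, key)) for key in keys]
    # Indices of the k highest-scoring keys (the "sparse" part).
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax restricted to the selected keys (numerically stabilized).
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exps.values())
    weights = {i: e / z for i, e in exps.items()}
    # Weighted sum over the selected values only.
    dim = len(values[0])
    return [sum(weights[i] * values[i][d] for i in top) for d in range(dim)]
```

Note that this toy version still scores every key before selecting; production systems such as DSA pair the selection with a lightweight indexer so that even the scoring step avoids touching the full million-token context.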

DeepSeek has validated fine-grained expert parallelism on Huawei Ascend NPUs, enabling inference across hardware vendors. While the current releases still run on the CUDA toolchain, the upcoming batch of Ascend 950 super-nodes (expected in the second half of 2026) is slated to lower V4-Pro pricing and expand throughput.

API changes: both V4-Pro and V4-Flash are available via the OpenAI-compatible ChatCompletions endpoint and the Anthropic-compatible endpoint; the model names are deepseek-v4-pro and deepseek-v4-flash. Both versions support non-thinking and thinking modes, selectable via the reasoning_effort parameter, with "max" recommended for complex agent scenarios.
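Using the model names and parameter above, a minimal OpenAI-compatible request might look like the following stdlib-only sketch. The endpoint URL and the placement of reasoning_effort as a top-level body field are assumptions here, not confirmed by this article; check DeepSeek's API documentation before relying on either:

```python
import json
import urllib.request

# Assumed endpoint; DeepSeek's published base URL may differ.
API_URL = "https://api.deepseek.com/chat/completions"

def build_request(prompt, model="deepseek-v4-pro", reasoning_effort="max"):
    """Build (but do not send) an OpenAI-compatible ChatCompletions request.

    "max" reasoning_effort is the article's recommendation for complex
    agent scenarios; treating it as a top-level body field is an
    assumption in this sketch.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
        },
        method="POST",
    )

# Construct a sample request; sending it would need a real API key.
req = build_request("Summarize this repository's build steps.")
```

Switching to deepseek-v4-flash, or to non-thinking mode, is just a matter of changing the two keyword arguments.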

Legacy models deepseek-chat and deepseek-reasoner will be retired on 2026-07-24; during a three-month migration window, requests to them will be redirected to V4-Flash's non-thinking and thinking modes, respectively.

The author notes that after months of speculation, DeepSeek delivered without a roadmap or livestream, demonstrating a “rate‑the‑path” approach that turns hype into tangible capability.

Tags: DeepSeek, Large Language Model, AI inference, V4, Huawei Ascend, 1M context, DSA sparse attention
Written by ITPUB, the official ITPUB account sharing technical insights, community news, and events.
