DeepSeek V4 Launch: Open‑Source Model Beats Closed‑Source Leaders in Coding & Math, 1.6 T Params, 1 M Context
DeepSeek V4, released today, offers two open‑source models (Pro and Flash) with up to 1.6 T parameters and a 1‑million‑token context, achieving top‑tier programming and mathematics benchmark scores that surpass the three major closed‑source competitors, while cutting API costs to a fraction of the price.
Model variants
DeepSeek‑V4‑Pro – 1.6 T total parameters, 49 B activated, FP4+FP8 precision, 1 M token context window.
DeepSeek‑V4‑Flash – 284 B total parameters, 13 B activated, same precision and context.
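Taking the spec lines above at face value, the sparsity of each mixture-of-experts variant is simple arithmetic. The sketch below only restates the published parameter counts; nothing here is measured:

```python
# Fraction of total parameters activated per token, from the published specs.
def active_fraction(total_b: float, activated_b: float) -> float:
    """Activated-to-total parameter ratio of an MoE model (counts in billions)."""
    return activated_b / total_b

pro_fraction = active_fraction(1600, 49)    # 1.6 T total, 49 B activated ≈ 3.1 %
flash_fraction = active_fraction(284, 13)   # 284 B total, 13 B activated ≈ 4.6 %

print(f"V4-Pro activates {pro_fraction:.1%} of its parameters per token")
print(f"V4-Flash activates {flash_fraction:.1%} of its parameters per token")
```

In other words, Pro is the sparser of the two: despite being roughly 5.6× larger in total, it activates only about 3.8× as many parameters per token.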
Benchmark performance
Programming ability
LiveCodeBench: V4‑Pro 93.5, Opus 4.6 88.8, Gemini‑3.1‑Pro 91.7.
Codeforces competition score: V4‑Pro 3206, GPT‑5.4 3168, Gemini‑3.1‑Pro 3052.
SWE‑bench verified: V4‑Pro 80.6, Opus 4.6 80.8, Gemini‑3.1‑Pro 80.6.
Mathematical reasoning
HMMT 2026: V4‑Pro 95.2 %, Opus 4.6 96.2 %, GPT‑5.4 97.7 %.
IMO QA: V4‑Pro 89.8 %, Opus 4.6 75.3 %, GPT‑5.4 81.0 %, Gemini‑3.1‑Pro 86.0 %.
Apex Shortlist high‑difficulty: V4‑Pro 90.2 %, Opus 4.6 85.9 %, GPT‑5.4 78.1 %, Gemini‑3.1‑Pro 89.1 %.
Knowledge and factuality benchmarks
MMLU‑Pro: V4‑Pro 87.5, Opus 4.6 87.1, GPT‑5.4 87.5, Gemini‑3.1‑Pro 91.0.
Chinese‑SimpleQA: V4‑Pro 84.4, Opus 4.6 76.4, GPT‑5.4 76.8, Gemini‑3.1‑Pro 85.9.
SimpleQA‑Verified: V4‑Pro 57.9, Opus 4.6 46.2, GPT‑5.4 45.3, Gemini‑3.1‑Pro 75.6.
Agent capability
Terminal Bench 2.0: V4‑Pro 67.9, Opus 4.6 75.1, GPT‑5.4 68.5, Gemini‑3.1‑Pro 66.7.
MCPAtlas tool calls: V4‑Pro 73.6, Opus 4.6 73.8, GPT‑5.4 67.2, Gemini‑3.1‑Pro 66.6.
Toolathlon usage: V4‑Pro 51.8, Opus 4.6 47.2, GPT‑5.4 54.6, Gemini‑3.1‑Pro 50.0.
Million‑token context
One million tokens correspond to roughly 15–20 full novels or the entire source code of a medium‑size project. Many models exhibit "context decay" in the latter half of long inputs. V4 addresses this by redesigning the attention mechanism.
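As a sanity check, the per-novel token budget implied by the "15–20 novels" figure can be backed out directly; the constants below just restate the article's numbers:

```python
# Back out the per-novel token budget implied by the "15-20 novels" claim.
CONTEXT_TOKENS = 1_000_000

tokens_per_novel_low = CONTEXT_TOKENS / 20   # 50,000 tokens if 20 novels fit
tokens_per_novel_high = CONTEXT_TOKENS / 15  # ~66,667 tokens if 15 novels fit

print(f"Implied novel length: {tokens_per_novel_low:,.0f}"
      f"-{tokens_per_novel_high:,.0f} tokens")
```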
Mixed attention architecture
V4 Attention Mechanism:
├─ CSA (Compressed Sparse Attention) – compresses on token dimension
└─ HCA (Heavy Compression Attention) – drastically reduces KV‑cache consumption
Efficiency comparison
Single‑token inference FLOPs: baseline V3.2 100 %, V4‑Pro 27 % (73 % reduction).
KV‑cache consumption: baseline V3.2 100 %, V4‑Pro 10 % (90 % reduction).
Processing one million tokens therefore requires about one‑quarter the compute and one‑tenth the VRAM of V3.2.
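Read as multipliers, the two reductions compose like this. This is a sketch using only the percentages above; the article publishes no absolute FLOP or VRAM costs:

```python
# The article's reductions, expressed as cost multipliers relative to V3.2.
V4_FLOPS_RATIO = 0.27  # 73 % fewer per-token inference FLOPs
V4_KV_RATIO = 0.10     # 90 % smaller KV cache

def v4_cost(v32_flops: float, v32_kv: float) -> tuple[float, float]:
    """Scale a V3.2 cost estimate down to the claimed V4-Pro cost."""
    return v32_flops * V4_FLOPS_RATIO, v32_kv * V4_KV_RATIO

# Whatever a long-context pass costs on V3.2, V4-Pro should need roughly a
# quarter of the compute and a tenth of the memory.
flops, kv = v4_cost(100.0, 100.0)
```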
Inference modes
Non‑think: fast responses for everyday tasks; output is direct.
Think High: logical analysis for complex or planning tasks; returns the thought process plus a summary.
Think Max: extreme reasoning to probe the model's limits; requires a special prompt and returns detailed reasoning.
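The article does not say how a mode is selected at the API level. The sketch below assumes a hypothetical `reasoning_mode` field passed alongside the request; both the field name and its placement are invented for illustration, so check the official API reference before relying on them:

```python
# Client-side mode selection sketch. Mode names come from the article; the
# "reasoning_mode" field and its location are assumptions, not documented API.
MODES = {"non-think", "think-high", "think-max"}

def build_request(prompt: str, mode: str = "non-think") -> dict:
    """Build a chat-completion payload with a hypothetical mode switch."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return {
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"reasoning_mode": mode},  # hypothetical field
    }

req = build_request("Plan a database migration", mode="think-high")
```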
API usage example
# Official API
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.deepseek.com"
)

# V4-Pro
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Hello"}]
)

# V4-Flash
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}]
)
Model download links
V4‑Pro – huggingface.co/deepseek-ai/DeepSeek-V4-Pro (ModelScope: modelscope.cn/models/deepseek-ai/DeepSeek-V4-Pro)
V4‑Flash – huggingface.co/deepseek-ai/DeepSeek-V4-Flash (ModelScope: modelscope.cn/models/deepseek-ai/DeepSeek-V4-Flash)
V4‑Pro‑Base – huggingface.co/deepseek-ai/DeepSeek-V4-Pro-Base (ModelScope: modelscope.cn/models/deepseek-ai/DeepSeek-V4-Pro-Base)
V4‑Flash‑Base – huggingface.co/deepseek-ai/DeepSeek-V4-Flash-Base (ModelScope: modelscope.cn/models/deepseek-ai/DeepSeek-V4-Flash-Base)
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ZhiKe AI
We dissect AI-era technologies, tools, and trends with a hardcore perspective. Focused on large models, agents, MCP, function calling, and hands‑on AI development. No fluff, no hype—only actionable insights, source code, and practical ideas. Get a daily dose of intelligence to simplify tech and make efficiency tangible.
