DeepSeek V4 Launch: Open‑Source Model Beats Closed‑Source Leaders in Coding & Math, 1.6 T Params, 1 M Context

DeepSeek V4, released today, offers two open‑source models (Pro and Flash) with up to 1.6 T parameters and a 1‑million‑token context window, achieving top‑tier programming and mathematics benchmark scores that surpass the three major closed‑source competitors, while cutting API costs to a fraction of competitors' prices.

Model variants

DeepSeek‑V4‑Pro – 1.6 T total parameters, 49 B activated, FP4+FP8 precision, 1 M token context window.

DeepSeek‑V4‑Flash – 284 B total parameters, 13 B activated, same precision and context.

Benchmark performance

Programming ability

LiveCodeBench: V4‑Pro 93.5, Opus 4.6 88.8, Gemini‑3.1‑Pro 91.7.

Codeforces competition score: V4‑Pro 3206, GPT‑5.4 3168, Gemini‑3.1‑Pro 3052.

SWE‑bench verified: V4‑Pro 80.6, Opus 4.6 80.8, Gemini‑3.1‑Pro 80.6.

Mathematical reasoning

HMMT 2026: V4‑Pro 95.2 %, Opus 4.6 96.2 %, GPT‑5.4 97.7 %.

IMO QA: V4‑Pro 89.8 %, Opus 4.6 75.3 %, GPT‑5.4 81.0 %, Gemini‑3.1‑Pro 86.0 %.

Apex Shortlist high‑difficulty: V4‑Pro 90.2 %, Opus 4.6 85.9 %, GPT‑5.4 78.1 %, Gemini‑3.1‑Pro 89.1 %.

Knowledge benchmarks

MMLU‑Pro: V4‑Pro 87.5, Opus 4.6 87.1, GPT‑5.4 87.5, Gemini‑3.1‑Pro 91.0.

Chinese‑SimpleQA: V4‑Pro 84.4, Opus 4.6 76.4, GPT‑5.4 76.8, Gemini‑3.1‑Pro 85.9.

SimpleQA‑Verified: V4‑Pro 57.9, Opus 4.6 46.2, GPT‑5.4 45.3, Gemini‑3.1‑Pro 75.6.

Agent capability

Terminal Bench 2.0: V4‑Pro 67.9, Opus 4.6 75.1, GPT‑5.4 68.5, Gemini‑3.1‑Pro 66.7.

MCPAtlas tool calls: V4‑Pro 73.6, Opus 4.6 73.8, GPT‑5.4 67.2, Gemini‑3.1‑Pro 66.6.

Toolathlon usage: V4‑Pro 51.8, Opus 4.6 47.2, GPT‑5.4 54.6, Gemini‑3.1‑Pro 50.0.

Million‑token context

One million tokens correspond to roughly 15–20 full novels or the entire source code of a medium‑size project. Many models exhibit "context decay" in the latter half of long inputs. V4 addresses this by redesigning the attention mechanism.
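
To get a feel for that scale, the sketch below estimates whether a project's source fits in the window using the common ~4‑characters‑per‑token heuristic. The heuristic and the file extensions are illustrative assumptions, not DeepSeek's actual tokenizer behavior.

# Rough fit check for a 1M-token window (chars/4 is an approximation).
import pathlib

CONTEXT_LIMIT = 1_000_000
CHARS_PER_TOKEN = 4  # heuristic only; real tokenizer counts will differ

def estimate_tokens(root: str, exts=(".py", ".js", ".ts", ".go")) -> int:
    total_chars = 0
    for path in pathlib.Path(root).rglob("*"):
        if path.suffix in exts and path.is_file():
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // CHARS_PER_TOKEN

tokens = estimate_tokens(".")
print(f"~{tokens:,} tokens; fits in the 1M window: {tokens < CONTEXT_LIMIT}")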

Mixed attention architecture

V4 Attention Mechanism:
├─ CSA (Compressed Sparse Attention) – compresses keys and values along the token dimension
└─ HCA (Heavy Compression Attention) – drastically reduces KV‑cache consumption
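
The exact CSA/HCA formulation isn't published in this summary, so the snippet below is only a minimal PyTorch sketch of the underlying idea: pool keys and values over small token blocks so that KV‑cache entries and attention‑score FLOPs shrink by the block factor. All shapes, the mean‑pooling choice, and the block size are assumptions for illustration.

# Minimal sketch of token-dimension KV compression (not the real CSA/HCA).
import torch
import torch.nn.functional as F

def compressed_attention(q, k, v, block=4):
    """q, k, v: (batch, heads, seq, dim). Mean-pool K/V over `block`
    tokens, so KV-cache entries and score FLOPs drop to ~1/block."""
    b, h, s, d = k.shape
    pad = (-s) % block  # pad seq length up to a multiple of `block`
    k = F.pad(k, (0, 0, 0, pad)).view(b, h, -1, block, d).mean(dim=3)
    v = F.pad(v, (0, 0, 0, pad)).view(b, h, -1, block, d).mean(dim=3)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # (b, h, seq, seq/block)
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 64, 32)
print(compressed_attention(q, k, v).shape)  # torch.Size([1, 8, 64, 32])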

Efficiency comparison

Single‑token inference FLOPs: baseline V3.2 100 %, V4‑Pro 27 % (73 % reduction).

KV‑cache consumption: baseline V3.2 100 %, V4‑Pro 10 % (90 % reduction).

Processing one million tokens therefore requires about one‑quarter the compute and one‑tenth the VRAM of V3.2.
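
A back‑of‑envelope calculation shows what the 90 % cut means at full context. The layer, head, and head‑dimension numbers below are placeholder assumptions (the article doesn't publish them); only the 10 % ratio comes from the figures above.

# KV-cache size at 1M tokens under assumed (not published) dimensions.
LAYERS, KV_HEADS, HEAD_DIM = 61, 8, 128  # placeholders for illustration
SEQ, BYTES_PER_ELEM = 1_000_000, 1       # FP8 -> 1 byte per element

baseline_gb = 2 * LAYERS * KV_HEADS * HEAD_DIM * SEQ * BYTES_PER_ELEM / 1e9
print(f"V3.2-style baseline KV cache: ~{baseline_gb:.0f} GB")  # ~125 GB
print(f"V4 at 10% of baseline:        ~{baseline_gb * 0.1:.1f} GB")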

Inference modes

Non‑think – fast responses for everyday tasks; output is direct.

Think High – logical analysis for complex or planning tasks; returns the thought process plus a summary.

Think Max – extreme reasoning that pushes the model's limits; requires a special prompt and returns detailed reasoning.
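
The summary doesn't document how a caller selects these modes, so the request below is a hypothetical sketch: the "thinking" field is an assumed parameter name, passed through the OpenAI SDK's extra_body escape hatch for non‑standard options.

# Hypothetical mode selection; "thinking" is an assumed, unconfirmed field.
from openai import OpenAI

client = OpenAI(api_key="your-api-key", base_url="https://api.deepseek.com")
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Plan a zero-downtime DB migration."}],
    extra_body={"thinking": "high"},  # assumption, not a documented parameter
)
print(response.choices[0].message.content)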

API usage example

# Official API (OpenAI-compatible client)
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.deepseek.com"
)

# V4-Pro: the full-size model
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

# V4-Flash: the smaller, faster variant
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Model download links

V4‑Pro – huggingface.co/deepseek-ai/DeepSeek-V4-Pro (ModelScope: modelscope.cn/models/deepseek-ai/DeepSeek-V4-Pro)

V4‑Flash – huggingface.co/deepseek-ai/DeepSeek-V4-Flash (ModelScope: modelscope.cn/models/deepseek-ai/DeepSeek-V4-Flash)

V4‑Pro‑Base – huggingface.co/deepseek-ai/DeepSeek-V4-Pro-Base (ModelScope: modelscope.cn/models/deepseek-ai/DeepSeek-V4-Pro-Base)

V4‑Flash‑Base – huggingface.co/deepseek-ai/DeepSeek-V4-Flash-Base (ModelScope: modelscope.cn/models/deepseek-ai/DeepSeek-V4-Flash-Base)
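
For local use, the repositories above can be fetched with the standard huggingface_hub client; the repo ID below is taken from the links listed here.

# Download weights locally via huggingface_hub (repo ID from the list above).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="deepseek-ai/DeepSeek-V4-Flash")
print(f"Weights downloaded to {local_dir}")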

