DeepSeek‑V4 Launch: 1.6 T Parameters, 1 M‑Token Context, Programming Skills Lead Open‑Source Rankings

DeepSeek released the V4 series—V4‑Pro (1.6 T total, 49 B active) and V4‑Flash (284 B total, 13 B active)—featuring three architectural upgrades, three inference modes, mixed‑precision FP4/FP8 weights, and benchmark results that place its programming ability at the top of open‑source models while supporting a million‑token context window.

DeepSeek released the DeepSeek‑V4 series on Hugging Face, comprising two models: V4‑Pro with 1.6 T total parameters (49 B active) and V4‑Flash with 284 B total parameters (13 B active). Both support an ultra‑long context of 1 million tokens.

Three key architectural upgrades

Hybrid attention mechanism: combines Compressed Sparse Attention (CSA) with Heavy Compression Attention (HCA) to optimise ultra‑long contexts. In a million‑token inference scenario, V4‑Pro uses 27 % less per‑token compute and only 10 % of the KV‑cache compared with DeepSeek‑V3.2 (a generic sketch of cache compression follows this list).

Manifold‑Constrained Hyper‑Connection (mHC): strengthens residual connections to improve signal‑propagation stability in deep networks while preserving expressivity.

Muon optimiser: reported to accelerate convergence and improve training stability.
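DeepSeek has not released V4's CSA/HCA internals, so the exact mechanism is unknown. The sketch below only illustrates the generic trick behind a roughly 10× smaller KV‑cache: store one low‑rank latent per token instead of full keys and values, and reconstruct both at attention time. All dimensions and names here are illustrative, not the actual architecture.

```python
import torch
import torch.nn.functional as F

# Illustrative only: cache a d_latent vector per token (~10x smaller than
# caching full keys AND values at d_model each), then expand on the fly.
d_model, d_latent = 1024, 128
W_down = torch.randn(d_model, d_latent) / d_model ** 0.5   # compress input
W_up_k = torch.randn(d_latent, d_model) / d_latent ** 0.5  # latent -> keys
W_up_v = torch.randn(d_latent, d_model) / d_latent ** 0.5  # latent -> values

def decode_step(x_t, latent_cache):
    """One decode step: append the compressed latent, attend over the cache."""
    c_t = x_t @ W_down                            # (1, d_latent) is all we cache
    latent_cache = torch.cat([latent_cache, c_t], dim=0)
    q = x_t                                       # query projection omitted for brevity
    k = latent_cache @ W_up_k                     # reconstruct keys on the fly
    v = latent_cache @ W_up_v                     # reconstruct values on the fly
    attn = F.softmax(q @ k.T / d_model ** 0.5, dim=-1)
    return attn @ v, latent_cache

x = torch.randn(1, d_model)
cache = torch.zeros(0, d_latent)
out, cache = decode_step(x, cache)
print(out.shape, cache.shape)  # torch.Size([1, 1024]) torch.Size([1, 128])
```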

Pre‑training consumed over 32 T tokens. The training pipeline follows a two‑stage process: first, independent domain experts are cultivated via Supervised Fine‑Tuning (SFT) and GRPO‑based reinforcement learning; second, an on‑policy distillation step merges the specialised abilities into a single model.
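The report does not spell out the distillation objective. A common on‑policy formulation samples trajectories from the student itself and minimises per‑token reverse KL against the domain‑expert teacher; here is a minimal PyTorch sketch under that assumption (Hugging Face‑style `generate`/`.logits` interfaces assumed, not DeepSeek's actual training code):

```python
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student, teacher, prompt_ids, max_new=64):
    """Sketch: student samples, expert teacher scores, student matches teacher."""
    # 1. Sample a continuation from the student itself (on-policy data).
    with torch.no_grad():
        seq = student.generate(prompt_ids, max_new_tokens=max_new, do_sample=True)

    # 2. Score every position of the sampled sequence with both models.
    s_logits = student(seq).logits[:, :-1]        # position t predicts token t+1
    with torch.no_grad():
        t_logits = teacher(seq).logits[:, :-1]

    # 3. Per-token reverse KL(student || teacher), averaged over the sample.
    s_logp = F.log_softmax(s_logits, dim=-1)
    t_logp = F.log_softmax(t_logits, dim=-1)
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)
    return kl.mean()
```

Training on sequences the student actually produces, rather than teacher outputs, is what makes this "on‑policy": the student is corrected on its own mistakes instead of only imitating teacher trajectories.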

Three inference modes

Non‑think: fast, intuitive responses for everyday tasks.

Think High: enables deeper logical analysis for more complex problems.

Think Max: full‑strength reasoning that pushes the model to its limits, requiring a context window of at least 384 K tokens.
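DeepSeek has not documented the request schema for mode selection in the release notes. A hypothetical OpenAI‑compatible call might look like the following; the endpoint path and the `reasoning_mode` parameter name are assumptions, not the confirmed API:

```python
import requests

# Hypothetical request: "reasoning_mode" and the endpoint are assumed names.
resp = requests.post(
    "https://api.deepseek.com/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": "Bearer <API_KEY>"},
    json={
        "model": "deepseek-v4-pro",
        "reasoning_mode": "think-max",  # "non-think" | "think-high" | "think-max"
        "max_context": 393216,          # Think Max needs at least 384K tokens
        "messages": [{"role": "user", "content": "Prove sqrt(2) is irrational."}],
    },
)
print(resp.json())
```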

Benchmark performance: programming leads the pack

In head‑to‑head comparisons DeepSeek‑V4‑Pro‑Max shows balanced overall performance, with programming scores standing out. On LiveCodeBench it achieves 93.5 % and on Codeforces it scores 3206, ranking first among open‑source models and surpassing Gemini‑3.1‑Pro (91.7 % / 3052) and GPT‑5.4 (Codeforces 3168).

For knowledge tasks Chinese‑SimpleQA records 84.4 %, second only to Gemini‑3.1‑Pro (85.9 %) and ahead of Claude Opus‑4.6 and GPT‑5.4. In long‑context evaluations MRCR‑1M reaches 83.5 % and CorpusQA‑1M reaches 62.0 %, making V4 one of the few open models capable of handling million‑token workloads.

The model lags on the Apex comprehensive‑reasoning benchmark (38.3 % versus Gemini‑3.1‑Pro's 60.9 %) and on some agent tasks, indicating room for improvement in complex multi‑step reasoning compared with closed‑source leaders.

Precision format: FP4 + FP8 mixed

Weights use a mixed‑precision strategy: MoE expert parameters are stored in FP4, while the majority of other parameters use FP8. This trade‑off balances performance and memory consumption, enabling deployment of a 1.6 T‑parameter model on consumer‑grade hardware.
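As a rough sanity check on the footprint, assume FP4 stores ≈ 0.5 bytes and FP8 ≈ 1 byte per parameter; the expert share of the 1.6 T parameters is not published, so the 95 % below is a guess typical of large MoE models, and quantisation scale/metadata overhead is ignored:

```python
# Back-of-the-envelope weight footprint for V4-Pro (1.6T total parameters).
TOTAL = 1.6e12
expert_frac = 0.95                    # assumed MoE expert share (not published)
fp4_bytes, fp8_bytes = 0.5, 1.0       # bytes per parameter

experts = TOTAL * expert_frac * fp4_bytes
rest = TOTAL * (1 - expert_frac) * fp8_bytes
print(f"experts ~{experts / 1e12:.2f} TB, rest ~{rest / 1e9:.0f} GB, "
      f"total ~{(experts + rest) / 1e12:.2f} TB")
# -> experts ~0.76 TB, rest ~80 GB, total ~0.84 TB
```

Under these assumptions the full weights still run to hundreds of gigabytes; what keeps per‑token compute tractable is that only the 49 B active parameters (roughly 25 GB at FP4) participate in each forward pass.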

Model download: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro

Tags: Mixture of Experts, DeepSeek, large language model, benchmark, long context, AI Architecture
Written by Architect's Tech Stack
Java backend, microservices, distributed systems, containerized programming, and more.