DeepSeek‑V4 Launch: 1.6 T Parameters, 1 M‑Token Context, Programming Skills Lead Open‑Source Rankings

DeepSeek released the V4 series—V4‑Pro (1.6 T total, 49 B active) and V4‑Flash (284 B total, 13 B active)—featuring three architectural upgrades, three inference modes, mixed‑precision FP4/FP8 weights, and benchmark results that place its programming ability at the top of open‑source models while supporting a million‑token context window.

DeepSeek released the DeepSeek‑V4 series on Hugging Face, comprising two models: V4‑Pro with 1.6 T total parameters (49 B active) and V4‑Flash with 284 B total parameters (13 B active). Both support an ultra‑long context of 1 million tokens.

Three key architectural upgrades

Hybrid attention mechanism: combines Compressed Sparse Attention (CSA) with Heavy Compression Attention (HCA) to optimise ultra‑long contexts. In a million‑token inference scenario, V4‑Pro uses 27 % less per‑token compute and only 10 % of the KV‑cache compared with DeepSeek‑V3.2 (a generic sketch of cache compression follows this list).

Manifold‑Constrained Hyper‑Connection (mHC): strengthens residual connections to improve signal‑propagation stability in deep networks while preserving expressivity.

Muon optimiser: reported to accelerate convergence and improve training stability.
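DeepSeek has not released V4's CSA/HCA internals, so the exact mechanism is unknown. The sketch below only illustrates the generic trick behind a roughly 10× smaller KV‑cache: store one low‑rank latent per token instead of full keys and values, and reconstruct both at attention time. All dimensions and names here are illustrative, not the actual architecture.

```python
import torch
import torch.nn.functional as F

# Illustrative only: cache a d_latent vector per token (~10x smaller than
# caching full keys AND values at d_model each), then expand on the fly.
d_model, d_latent = 1024, 128
W_down = torch.randn(d_model, d_latent) / d_model ** 0.5   # compress input
W_up_k = torch.randn(d_latent, d_model) / d_latent ** 0.5  # latent -> keys
W_up_v = torch.randn(d_latent, d_model) / d_latent ** 0.5  # latent -> values

def decode_step(x_t, latent_cache):
    """One decode step: append the compressed latent, attend over the cache."""
    c_t = x_t @ W_down                            # (1, d_latent) is all we cache
    latent_cache = torch.cat([latent_cache, c_t], dim=0)
    q = x_t                                       # query projection omitted for brevity
    k = latent_cache @ W_up_k                     # reconstruct keys on the fly
    v = latent_cache @ W_up_v                     # reconstruct values on the fly
    attn = F.softmax(q @ k.T / d_model ** 0.5, dim=-1)
    return attn @ v, latent_cache

x = torch.randn(1, d_model)
cache = torch.zeros(0, d_latent)
out, cache = decode_step(x, cache)
print(out.shape, cache.shape)  # torch.Size([1, 1024]) torch.Size([1, 128])
```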

Pre‑training consumed over 32 T tokens. The training pipeline follows a two‑stage process: first, independent domain experts are cultivated via Supervised Fine‑Tuning (SFT) and GRPO‑based reinforcement learning; second, an on‑policy distillation step merges the specialised abilities into a single model.
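The report does not spell out the distillation objective. A common on‑policy formulation samples trajectories from the student itself and minimises per‑token reverse KL against the domain‑expert teacher; here is a minimal PyTorch sketch under that assumption (Hugging Face‑style `generate`/`.logits` interfaces assumed, not DeepSeek's actual training code):

```python
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student, teacher, prompt_ids, max_new=64):
    """Sketch: student samples, expert teacher scores, student matches teacher."""
    # 1. Sample a continuation from the student itself (on-policy data).
    with torch.no_grad():
        seq = student.generate(prompt_ids, max_new_tokens=max_new, do_sample=True)

    # 2. Score every position of the sampled sequence with both models.
    s_logits = student(seq).logits[:, :-1]        # position t predicts token t+1
    with torch.no_grad():
        t_logits = teacher(seq).logits[:, :-1]

    # 3. Per-token reverse KL(student || teacher), averaged over the sample.
    s_logp = F.log_softmax(s_logits, dim=-1)
    t_logp = F.log_softmax(t_logits, dim=-1)
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)
    return kl.mean()
```

Training on sequences the student actually produces, rather than teacher outputs, is what makes this "on‑policy": the student is corrected on its own mistakes instead of only imitating teacher trajectories.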

Three inference modes

Non‑think: fast, intuitive responses for everyday tasks.

Think High: enables deeper logical analysis for more complex problems.

Think Max: full‑strength reasoning that pushes the model to its limits, requiring a context window of at least 384 K tokens.
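DeepSeek has not documented the request schema for mode selection in the release notes. A hypothetical OpenAI‑compatible call might look like the following; the endpoint path and the `reasoning_mode` parameter name are assumptions, not the confirmed API:

```python
import requests

# Hypothetical request: "reasoning_mode" and the endpoint are assumed names.
resp = requests.post(
    "https://api.deepseek.com/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": "Bearer <API_KEY>"},
    json={
        "model": "deepseek-v4-pro",
        "reasoning_mode": "think-max",  # "non-think" | "think-high" | "think-max"
        "max_context": 393216,          # Think Max needs at least 384K tokens
        "messages": [{"role": "user", "content": "Prove sqrt(2) is irrational."}],
    },
)
print(resp.json())
```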

Benchmark performance: programming leads the pack

In head‑to‑head comparisons DeepSeek‑V4‑Pro‑Max shows balanced overall performance, with programming scores standing out. On LiveCodeBench it achieves 93.5 % and on Codeforces it scores 3206, ranking first among open‑source models and surpassing Gemini‑3.1‑Pro (91.7 % / 3052) and GPT‑5.4 (Codeforces 3168).

For knowledge tasks Chinese‑SimpleQA records 84.4 %, second only to Gemini‑3.1‑Pro (85.9 %) and ahead of Claude Opus‑4.6 and GPT‑5.4. In long‑context evaluations MRCR‑1M reaches 83.5 % and CorpusQA‑1M reaches 62.0 %, making V4 one of the few open models capable of handling million‑token workloads.

The model lags on the Apex comprehensive‑reasoning benchmark (38.3 % versus Gemini‑3.1‑Pro's 60.9 %) and on some agent tasks, indicating room for improvement in complex multi‑step reasoning compared with closed‑source leaders.

Precision format: FP4 + FP8 mixed

Weights use a mixed‑precision strategy: MoE expert parameters are stored in FP4, while the majority of other parameters use FP8. This trade‑off balances performance and memory consumption, enabling deployment of a 1.6 T‑parameter model on consumer‑grade hardware.
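As a rough sanity check on the footprint, assume FP4 stores ≈ 0.5 bytes and FP8 ≈ 1 byte per parameter; the expert share of the 1.6 T parameters is not published, so the 95 % below is a guess typical of large MoE models, and quantisation scale/metadata overhead is ignored:

```python
# Back-of-the-envelope weight footprint for V4-Pro (1.6T total parameters).
TOTAL = 1.6e12
expert_frac = 0.95                    # assumed MoE expert share (not published)
fp4_bytes, fp8_bytes = 0.5, 1.0       # bytes per parameter

experts = TOTAL * expert_frac * fp4_bytes
rest = TOTAL * (1 - expert_frac) * fp8_bytes
print(f"experts ~{experts / 1e12:.2f} TB, rest ~{rest / 1e9:.0f} GB, "
      f"total ~{(experts + rest) / 1e12:.2f} TB")
# -> experts ~0.76 TB, rest ~80 GB, total ~0.84 TB
```

Under these assumptions the full weights still run to hundreds of gigabytes; what keeps per‑token compute tractable is that only the 49 B active parameters (roughly 25 GB at FP4) participate in each forward pass.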

Model download: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro

Tags: Mixture of Experts, DeepSeek, large language model, benchmark, long context, AI Architecture
Written by Architect's Tech Stack
Java backend, microservices, distributed systems, containerized programming, and more.