DeepSeek‑V4 Launch: 1.6 T Parameters, 1 M‑Token Context, Programming Skills Lead Open‑Source Rankings
DeepSeek released the V4 series—V4‑Pro (1.6 T total, 49 B active) and V4‑Flash (284 B total, 13 B active)—featuring three architectural upgrades, three inference modes, mixed‑precision FP4/FP8 weights, and a million‑token context window; benchmark results place its programming ability at the top of open‑source models.
DeepSeek released the DeepSeek‑V4 series on Hugging Face, comprising two models: V4‑Pro with 1.6 T total parameters (49 B active) and V4‑Flash with 284 B total parameters (13 B active). Both support an ultra‑long context of 1 million tokens.
Three key architectural upgrades
Hybrid attention mechanism: combines Compressed Sparse Attention (CSA) with Heavy Compression Attention (HCA) to optimise ultra‑long contexts. In a million‑token inference scenario, V4‑Pro uses 27 % less per‑token compute and only 10 % of the KV‑cache compared with DeepSeek‑V3.2.
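DeepSeek has not published implementation details for CSA or HCA. As a rough illustration of the general idea behind a compressed KV‑cache—pooling keys and values into fewer slots before attending, which is an assumption here, not DeepSeek's actual design—consider this minimal sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compressed_attention(q, k, v, block=4):
    # Pool keys/values over fixed-size blocks, shrinking the KV cache
    # by a factor of `block` (here 4x; the article reports ~10x for V4).
    T, d = k.shape
    n = T // block
    k_c = k[: n * block].reshape(n, block, d).mean(axis=1)  # (n, d)
    v_c = v[: n * block].reshape(n, block, d).mean(axis=1)  # (n, d)
    scores = q @ k_c.T / np.sqrt(d)          # (Tq, n) instead of (Tq, T)
    return softmax(scores, axis=-1) @ v_c    # (Tq, d)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out = compressed_attention(q, k, v, block=4)
print(out.shape)  # (16, 8)
```

The cache reduction comes from storing only the pooled `k_c`/`v_c` tensors; real sparse-attention schemes select or compress blocks adaptively rather than with a fixed mean pool.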
Manifold‑Constrained Hyper‑Connection (mHC): strengthens residual connections to improve signal‑propagation stability in deep networks while preserving expressivity.
Muon optimiser: reported to accelerate convergence and improve training stability.
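Muon's core move—published independently of DeepSeek—is to orthogonalise the momentum‑accumulated gradient of each weight matrix via a Newton–Schulz iteration before applying it. The article does not describe DeepSeek's exact variant, so the following is a generic sketch with commonly used coefficients:

```python
import numpy as np

def newton_schulz(G, steps=5):
    """Approximately orthogonalise G (the core of the Muon update),
    using the widely cited quintic-iteration coefficients."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)   # normalise so iteration converges
    transposed = G.shape[0] > G.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, grad, buf, lr=0.02, momentum=0.95):
    # Momentum buffer first, then an orthogonalised update direction.
    buf = momentum * buf + grad
    return param - lr * newton_schulz(buf), buf

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 8))
g = rng.standard_normal((4, 8))
W2, buf = muon_step(W, g, np.zeros_like(W))
```

Orthogonalising the update equalises its singular values, which is credited with the faster, more stable convergence the article mentions.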
Pre‑training consumed over 32 T tokens. The training pipeline follows a two‑stage process: first, independent domain experts are cultivated via Supervised Fine‑Tuning (SFT) and GRPO‑based reinforcement learning; second, an on‑policy distillation step merges the specialised abilities into a single model.
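The article names the two stages but not the distillation objective. One common formulation of on‑policy distillation scores tokens the student itself sampled against the teacher's log‑probabilities (a reverse‑KL‑style penalty); the sketch below assumes that formulation, and all names in it are illustrative:

```python
import numpy as np

def log_softmax(logits):
    logits = logits - logits.max(axis=-1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

def on_policy_distill_loss(student_logits, teacher_logits, sampled_ids):
    """Per-token penalty: how much less likely the teacher finds the
    tokens the student actually sampled on its own rollout.
    (A common formulation; DeepSeek's exact objective is not public.)"""
    s_lp = log_softmax(student_logits)
    t_lp = log_softmax(teacher_logits)
    idx = np.arange(len(sampled_ids))
    return float(np.mean(s_lp[idx, sampled_ids] - t_lp[idx, sampled_ids]))

rng = np.random.default_rng(2)
T, V = 6, 10                      # toy sequence length and vocabulary
student = rng.standard_normal((T, V))
teacher = rng.standard_normal((T, V))
ids = rng.integers(0, V, size=T)  # tokens sampled from the student
loss = on_policy_distill_loss(student, teacher, ids)
```

Because the samples come from the student's own distribution, the loss penalises behaviour the student actually exhibits—the property that distinguishes on‑policy distillation from distilling on a fixed teacher‑generated corpus.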
Three inference modes
Non‑think: fast, intuitive responses for everyday tasks.
Think High: enables logical analysis for more complex problems.
Think Max: full‑strength reasoning that pushes the model to its limits, requiring at least a 384 K token context window.
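DeepSeek has not published a V4 API schema at the time of writing, so the request body below is an assumption—the `reasoning_mode` parameter and its values are invented to mirror the three modes above, sketched in the OpenAI‑compatible chat style DeepSeek's API has used previously:

```python
import json

# Hypothetical request body; "reasoning_mode" and its values are
# assumptions modelled on the three modes described in the article,
# not a documented DeepSeek parameter.
request = {
    "model": "deepseek-v4-pro",
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
    "reasoning_mode": "think_max",  # "non_think" | "think_high" | "think_max"
    "max_tokens": 4096,
}
payload = json.dumps(request)
```

Whatever the final parameter name, the 384 K‑token floor for Think Max implies the serving stack must reserve that much context before the mode can be enabled.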
Benchmark performance: programming leads the pack
In head‑to‑head comparisons, DeepSeek‑V4‑Pro‑Max shows balanced overall performance, with programming scores standing out. On LiveCodeBench it achieves 93.5 % and on Codeforces it scores 3206, ranking first among open‑source models and surpassing Gemini‑3.1‑Pro (91.7 % / 3052) and GPT‑5.4 (Codeforces 3168).
For knowledge tasks Chinese‑SimpleQA records 84.4 %, second only to Gemini‑3.1‑Pro (85.9 %) and ahead of Claude Opus‑4.6 and GPT‑5.4. In long‑context evaluations MRCR‑1M reaches 83.5 % and CorpusQA‑1M reaches 62.0 %, making V4 one of the few open models capable of handling million‑token workloads.
The model lags in Apex comprehensive reasoning (38.3 % versus Gemini’s 60.9 %) and some agent tasks, indicating room for improvement in complex multi‑step reasoning compared with closed‑source leaders.
Precision format: FP4 + FP8 mixed
Weights use a mixed‑precision strategy: MoE expert parameters are stored in FP4, while the majority of other parameters use FP8. This trade‑off balances performance and memory consumption, enabling deployment of a 1.6 T‑parameter model on consumer‑grade hardware.
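To see why the mixed format matters, here is a rough memory estimate. The article does not give the expert/non‑expert parameter split, so the 90 % figure below is purely an illustrative assumption:

```python
TOTAL_PARAMS = 1.6e12        # V4-Pro total parameters (from the article)
EXPERT_FRACTION = 0.9        # ASSUMPTION: share of params in MoE experts
FP4_BYTES, FP8_BYTES = 0.5, 1.0

# Experts in FP4, everything else in FP8 (the article's scheme)
mixed_gb = TOTAL_PARAMS * (EXPERT_FRACTION * FP4_BYTES
                           + (1 - EXPERT_FRACTION) * FP8_BYTES) / 1e9
fp8_gb = TOTAL_PARAMS * FP8_BYTES / 1e9   # all-FP8 baseline
fp16_gb = TOTAL_PARAMS * 2.0 / 1e9        # all-FP16 baseline
print(round(mixed_gb), round(fp8_gb), round(fp16_gb))  # 880 1600 3200
```

Under this assumed split, FP4 experts roughly halve the footprint versus all‑FP8 and quarter it versus FP16—still hundreds of gigabytes, which is why only 49 B parameters being active per token matters as much as the storage format.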
Model download: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro