DeepSeek-V4 Raises the Bar: 1.6T‑Parameter Open‑Source Model Challenges Closed‑Source Giants

DeepSeek-V4 introduces two open-source LLMs, V4-Pro with 1.6 trillion total parameters and V4-Flash with 284 billion, offering a 1 million-token context window, hybrid attention, multi-head compression, and the Muon optimizer, all under an MIT license; DeepSeek positions the series as a rival to top closed-source models.


DeepSeek has released the V4 series preview, comprising two models, DeepSeek-V4-Pro and DeepSeek-V4-Flash, both open-source under the MIT license and free for commercial use.

Core Parameters

DeepSeek-V4-Pro: 1.6T total parameters, 49B active parameters, 1 million-token context window.

DeepSeek-V4-Flash: 284B total parameters, 13B active parameters, 1 million-token context window.
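These counts imply aggressive sparsity: only a few percent of each model's parameters are active per token. A quick sanity check of the activation ratios from the published numbers:

```python
# Activation ratios implied by the published parameter counts (billions).
models = {
    "DeepSeek-V4-Pro":   {"total_b": 1600, "active_b": 49},   # 1.6T total
    "DeepSeek-V4-Flash": {"total_b": 284,  "active_b": 13},
}

for name, p in models.items():
    ratio = p["active_b"] / p["total_b"]
    print(f"{name}: {ratio:.1%} of parameters active per token")
# → DeepSeek-V4-Pro: 3.1% of parameters active per token
# → DeepSeek-V4-Flash: 4.6% of parameters active per token
```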

Architectural Innovations

Hybrid Attention: combines full attention and sliding-window mechanisms to optimize short- and long-range tasks without sacrificing efficiency.
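DeepSeek has not published the exact layer layout, but the general idea can be sketched with attention masks: full layers use a standard causal mask, while sliding-window layers restrict each token to its most recent predecessors. The window size and layer interleaving below are illustrative assumptions, not the model's actual configuration:

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    """Full causal attention: token i may attend to every token j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n: int, w: int) -> np.ndarray:
    """Sliding-window attention: token i attends only to the last w tokens."""
    idx = np.arange(n)
    # Mask out positions more than w-1 steps in the past.
    too_old = idx[None, :] < (idx[:, None] - (w - 1))
    return causal_mask(n) & ~too_old

# Hypothetical interleaving: one full-attention layer per three window layers.
def layer_mask(layer: int, n: int, w: int = 4) -> np.ndarray:
    return causal_mask(n) if layer % 4 == 0 else sliding_window_mask(n, w)
```

In such a hybrid, the window layers keep per-layer KV-cache growth bounded by the window size, while the periodic full layers propagate information across the entire context.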

Multi-head Compression (mHC): compresses multiple attention heads into fewer representations, reducing KV cache size while preserving key information, enabling practical use of the 1 million-token context.
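Implementation details for mHC have not been published; the sketch below shows the general low-rank KV-compression idea it resembles (all dimensions and projection names are illustrative): instead of caching full per-head keys and values, cache one shared latent vector per token and expand it per head at attention time.

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, d_head, d_latent = 16, 64, 256   # illustrative sizes
d_model = n_heads * d_head                # 1024

# Shared down-projection; per-head keys are re-expanded from the latent.
W_down = rng.standard_normal((d_model, d_latent)) * 0.02
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02

x = rng.standard_normal((1, d_model))     # one token's hidden state

# Cache only the compressed latent instead of full keys and values.
latent = x @ W_down                                   # (1, 256) goes in the cache
k_full = (latent @ W_up_k).reshape(n_heads, d_head)   # expanded on the fly

full_cache = 2 * d_model                  # K and V per token, uncompressed
compressed_cache = d_latent
print(f"cache reduction: {full_cache / compressed_cache:.0f}x")  # → 8x
```

With these toy sizes, each cached token shrinks from 2,048 values to 256, which is what makes million-token contexts memory-feasible.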

Muon Optimizer: a second-order optimizer that converges faster and more stably than mainstream AdamW variants in large-scale MoE training.
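Muon itself is openly available: its core step takes the momentum-accumulated gradient of each 2-D weight matrix and approximately orthogonalizes it with a few Newton-Schulz iterations before applying the update. A minimal NumPy sketch of that orthogonalization step, using the quintic coefficients from the public reference implementation:

```python
import numpy as np

def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5) -> np.ndarray:
    """Approximately push g's singular values toward 1 (semi-orthogonalize)."""
    a, b, c = 3.4445, -4.7750, 2.0315   # quintic coefficients from Muon
    x = g / (np.linalg.norm(g) + 1e-7)  # normalize so singular values <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:                      # iterate on the short-fat orientation
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x

# The Muon update for a 2-D weight then looks roughly like:
#   momentum = beta * momentum + grad
#   weight  -= lr * newton_schulz_orthogonalize(momentum)
```

The orthogonalized update equalizes the scale of all gradient directions, which is where the claimed stability in large-scale training comes from.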

Training Pipeline

DeepSeek employs a two‑stage post‑training process:

Stage 1: General capability alignment via supervised fine-tuning (SFT).

Stage 2: Inference capability reinforcement (RLHF plus a dedicated inference mode), offering two modes: Think Max for deep reasoning and Think Fast for rapid response.
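DeepSeek has not documented how the two modes are selected at inference time; one plausible shape is a request-level switch that routes to different decoding budgets. The field names and budget values below are hypothetical, purely to illustrate the two-mode split:

```python
# Hypothetical request payloads for the two inference modes.
def build_request(prompt: str, mode: str) -> dict:
    budgets = {
        "think-max":  {"reasoning_tokens": 32768, "temperature": 0.6},
        "think-fast": {"reasoning_tokens": 1024,  "temperature": 0.6},
    }
    if mode not in budgets:
        raise ValueError(f"unknown mode: {mode}")
    return {"prompt": prompt, "mode": mode, **budgets[mode]}

req = build_request("Prove there are infinitely many primes.", "think-max")
print(req["reasoning_tokens"])  # → 32768
```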

Performance

Benchmark results released by DeepSeek show V4-Pro matching or approaching the performance of leading closed-source models across multiple tasks. The Flash variant approaches the Pro model's quality on simple agent tasks while being faster and cheaper to run.

The community response highlights that V4 provides a locally deployable, high‑performance alternative to models like GPT‑5.

Implications

DeepSeek’s strategy aims to deliver world-class models at roughly one-tenth of competitors’ cost; reported training expenses are under $6 million, far lower than the tens of millions required for comparable closed-source efforts.

All model weights are available on Hugging Face and ModelScope (FP8 / FP4+FP8 mixed precision). Users can run the models locally via Ollama on macOS or PC, and the codebase is open on GitHub (github.com/deepseek-ai).

Deployment & Pricing

Online demo: chat.deepseek.com (iOS/Android apps also released).

Local deployment: weights on Hugging Face/ModelScope, Ollama support, GitHub source.

API pricing: Flash version is extremely low‑cost for large‑scale integration; Pro version pricing is pending.


In summary, DeepSeek‑V4 pushes open‑source AI to a 1 million‑token, 1.6 T‑parameter frontier under a permissive MIT license, redefining what constitutes a top‑tier model.

Tags: Large Language Model, open-source AI, Hybrid attention, Muon optimizer, DeepSeek V4, Multi-head Compression
Written by AI Explorer