DeepSeek-V4 Launches: Million-Token Context Becomes Affordable for All

DeepSeek-V4 introduces a hybrid attention architecture, manifold‑constrained hyper‑connections, and the Muon optimizer, dramatically cutting inference FLOPs and KV‑cache usage. The result: open‑source models can handle million‑token contexts at a fraction of the cost of leading closed‑source services while matching their performance.


DeepSeek-V4 has been released, marking a watershed moment for the open‑source AI community by delivering efficient million‑token context capability that was previously limited to a few tech giants.

Key Architectural and Optimization Breakthroughs

The model series incorporates three major upgrades:

Hybrid attention architecture that combines Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA), cutting single‑token inference FLOPs to 27% of DeepSeek‑V3.2's and KV‑cache usage to 10% in million‑token scenarios.

Manifold‑Constrained Hyper‑Connections (mHC) that enhance traditional residual links, preserving model expressiveness while stabilising cross‑layer signal propagation.

Muon optimizer, which accelerates convergence and improves training stability.

These innovations allow the model to differentiate information importance and perform selective computation, dramatically lowering both compute and KV‑cache demands for processing ultra‑long texts.
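The article does not spell out how the selective computation works, but the core idea of importance‑based sparse attention can be sketched as follows. This is a minimal toy illustration, not DeepSeek's actual CSA/HCA mechanism: a cheap scorer ranks cached tokens, and full attention runs only over the top‑k subset, shrinking both compute and the effective KV cache. All function and variable names here are illustrative.

```python
import numpy as np

def selective_attention(q, K, V, k_top):
    """Toy sketch of importance-based selective attention:
    a cheap scorer picks the k_top most relevant cached tokens,
    and full softmax attention runs only over that subset."""
    # Cheap importance score for every cached token (here: raw dot products).
    scores = K @ q                          # shape (n_ctx,)
    # Keep only the k_top highest-scoring positions.
    idx = np.argsort(scores)[-k_top:]
    K_sel, V_sel = K[idx], V[idx]
    # Standard scaled-dot-product attention over the selected subset only.
    logits = K_sel @ q / np.sqrt(q.shape[0])
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V_sel                        # shape (d,)

rng = np.random.default_rng(0)
d, n_ctx, k_top = 64, 10_000, 256
q = rng.standard_normal(d)
K = rng.standard_normal((n_ctx, d))
V = rng.standard_normal((n_ctx, d))
out = selective_attention(q, K, V, k_top)
print(out.shape)        # attention output for one query token
print(k_top / n_ctx)    # fraction of cached tokens actually attended to
```

With 256 of 10,000 cached tokens selected, the expensive softmax attention touches under 3% of the context, which is the kind of saving that makes million‑token inference tractable.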

Cost and Accessibility Impact

With resource requirements slashed, developers can feed an entire code repository or a massive document collection to the model at low cost. The DeepSeek‑V4‑Flash API is priced at 2 CNY (≈ 0.3 USD) per million tokens, versus $30 per million tokens for OpenAI's GPT‑5.5, effectively breaking the cost barrier for individuals and small businesses.

Model Variants and Parameter Configurations

Two main variants are offered:

DeepSeek‑V4‑Pro with 1.6 T total parameters and 49 B activation parameters.

DeepSeek‑V4‑Flash with 284 B total parameters and 13 B activation parameters.

Both support a context length of one million tokens. The base version runs at FP8 precision, while the instruction‑tuned version uses an FP4 + FP8 mixture (FP4 for MoE expert parameters, FP8 for everything else).
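The trade‑off behind that mixture can be illustrated with simulated quantization. The sketch below uses symmetric integer rounding as a stand‑in for the actual FP4/FP8 floating‑point formats (a simplification: real FP4/FP8 have exponent bits and behave differently), just to show why the bulk MoE expert weights can tolerate fewer bits than the rest of the network:

```python
import numpy as np

def fake_quant(w, bits):
    """Simulate symmetric fixed-point quantization at the given bit width.
    A stand-in for FP4/FP8 floating-point formats, for illustration only."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

rng = np.random.default_rng(1)
expert_w = rng.standard_normal((256, 256))  # MoE expert weights -> 4-bit-like
other_w = rng.standard_normal((256, 256))   # attention/shared weights -> 8-bit-like

err4 = np.abs(fake_quant(expert_w, 4) - expert_w).mean()
err8 = np.abs(fake_quant(other_w, 8) - other_w).mean()
print(f"mean abs quantization error: 4-bit {err4:.4f}, 8-bit {err8:.4f}")
```

The 4‑bit error is an order of magnitude larger per weight, but since only a small fraction of experts is active per token, spending the precision budget on the always‑active attention and shared layers (FP8) while halving storage for the huge expert tensors (FP4) is a reasonable design point.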

Performance and Inference Modes

Both variants support three inference‑intensity modes. Under the same mode, DeepSeek‑V4‑Flash delivers performance comparable to V4‑Pro at more than ten times lower output cost, making it a hidden gem for everyday tasks.

Benchmark Results: Matching Top Closed‑Source Models

The performance gap between DeepSeek‑V4‑Pro‑Max and the world’s leading closed‑source models is now minimal. In Agentic Coding evaluations, V4‑Pro posts the best open‑source score and performs strongly on other agent benchmarks, surpassing Sonnet 4.5 and approaching Opus 4.6 in non‑thinking mode.

In world‑knowledge tests, V4‑Pro leads all other open models and trails only the top closed model Gemini‑Pro‑3.1. It also outperforms all publicly evaluated open models on mathematics, STEM, and competitive coding tasks, delivering results comparable to the best closed‑source systems.

Compatibility and Ecosystem Integration

DeepSeek‑V4 has been adapted and optimised for major agent products such as Claude Code, OpenClaw, OpenCode, and CodeBuddy. It also fully supports Chinese domestic AI chips, reducing reliance on Nvidia's CUDA ecosystem and opening a path toward diversified, self‑controlled compute resources.

Conclusion

With its architectural innovations, dramatically reduced resource requirements, and competitive benchmark performance, DeepSeek‑V4 positions the open‑source community as a strong driver of the upcoming trends of million‑token context and high‑performance agents, lowering the entry barrier without sacrificing capability.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: large language model, benchmark, open-source AI, hybrid attention, DeepSeek V4, million-token context
Written by

SuanNi

A community for AI developers that aggregates large-model development services, models, and compute power.
