DeepSeek V4 Launches with 1.6 T Parameters and 1 Million‑Token Context
DeepSeek V4, released on April 24 2026, offers two SKUs—Pro with 1.6 T total parameters and Flash with 284 B—both supporting a 1‑million‑token context window, ultra‑low inference cost, pricing as low as ¥0.2 per million tokens, Huawei Ascend deployment, and seamless OpenAI/Anthropic API compatibility.
Release Overview
On April 24, 2026, DeepSeek announced the official launch of its next‑generation large language model, DeepSeek V4, made available to developers worldwide in preview mode. The release follows the impact of DeepSeek V3 at the end of 2024.
Model Variants and Core Specifications
DeepSeek V4 follows a dual‑SKU strategy:
DeepSeek V4‑Pro
Total parameters: 1.6 T
Active (trainable) parameters: 490 B
Pre‑training data: 33 T tokens
Context window: 1 million tokens
Maximum output length: 384 K tokens
DeepSeek V4‑Flash
Total parameters: 284 B
Active parameters: 130 B
Pre‑training data: 32 T tokens
Context window: 1 million tokens
Maximum output length: 384 K tokens
The Flash variant achieves strong performance with only 130 B active parameters, resulting in extremely low inference cost and suitability for large‑scale, high‑frequency deployments.
Pricing Strategy
DeepSeek V4 adopts a highly competitive pricing model:
V4‑Flash : ¥0.2 per million input tokens (cache hit), ¥1 per million (cache miss), ¥2 per million output tokens.
V4‑Pro : ¥1 per million input tokens (cache hit), ¥12 per million (cache miss), ¥24 per million output tokens.
At ¥0.2 per million tokens for cached inputs, V4‑Flash is positioned as one of the cheapest options in the current LLM market, especially advantageous for use cases requiring ultra‑long context such as whole‑code‑base analysis or long‑document summarization.
Technical Innovations Enabling Million‑Token Context
Three key engineering advances make the 1‑million‑token window feasible:
Engram‑style memory mechanism : Inspired by human memory, it stores and retrieves long‑range information efficiently, mitigating the degradation of traditional attention over very long sequences.
Manifold‑Constrained Attention : Retains expressive power while substantially lowering computational complexity, allowing hardware‑friendly processing of massive contexts.
DeepSeek Sparse Attention (DSA) : Applies intelligent sparsity to compute full attention only for critical token pairs, dramatically reducing GPU load.
Domestic Chip Deployment
Unlike many models that rely on NVIDIA CUDA, DeepSeek V4 is largely trained and deployed on Huawei Ascend chips. This choice addresses three strategic concerns:
Supply‑chain security : Reduces dependence on overseas GPUs and lowers geopolitical risk.
Data compliance : Meets domestic data‑sovereignty and privacy regulations.
Ecosystem demonstration : Provides a large‑scale training reference for Chinese AI silicon.
Retirement of Older Model Identifiers
DeepSeek plans to migrate legacy model IDs to the new V4 identifiers: deepseek-chat → deepseek-v4-flash (non‑reasoning mode) deepseek-reasoner → deepseek-v4-flash (reasoning mode)
Developers should update their configurations promptly to avoid production disruptions.
Developer Onboarding
The V4 API is fully compatible with OpenAI and Anthropic request formats, minimizing migration effort. Example endpoint configuration:
# OpenAI‑compatible endpoint
base_url = "https://api.deepseek.com"
model = "deepseek-v4-flash" # or "deepseek-v4-pro"
# Anthropic‑compatible endpoint
base_url = "https://api.deepseek.com/anthropic"Users can also interact with the model via the web UI at chat.deepseek.com or the official mobile app.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
AI Agent Super App
AI agent applications, installation, large-model testing, computer fundamentals, IT operations and maintenance exchange, network technology exchange, Linux learning
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
