DeepSeek V4 Launches with 1.6 T Parameters and 1 Million‑Token Context

DeepSeek V4, released on April 24 2026, offers two SKUs—Pro with 1.6 T total parameters and Flash with 284 B—both supporting a 1‑million‑token context window, ultra‑low inference cost, pricing as low as ¥0.2 per million tokens, Huawei Ascend deployment, and seamless OpenAI/Anthropic API compatibility.

AI Agent Super App
AI Agent Super App
AI Agent Super App
DeepSeek V4 Launches with 1.6 T Parameters and 1 Million‑Token Context

Release Overview

On April 24, 2026, DeepSeek announced the official launch of its next‑generation large language model, DeepSeek V4, made available to developers worldwide in preview mode. The release follows the impact of DeepSeek V3 at the end of 2024.

Model Variants and Core Specifications

DeepSeek V4 follows a dual‑SKU strategy:

DeepSeek V4‑Pro

Total parameters: 1.6 T

Active (trainable) parameters: 490 B

Pre‑training data: 33 T tokens

Context window: 1 million tokens

Maximum output length: 384 K tokens

DeepSeek V4‑Flash

Total parameters: 284 B

Active parameters: 130 B

Pre‑training data: 32 T tokens

Context window: 1 million tokens

Maximum output length: 384 K tokens

The Flash variant achieves strong performance with only 130 B active parameters, resulting in extremely low inference cost and suitability for large‑scale, high‑frequency deployments.

Pricing Strategy

DeepSeek V4 adopts a highly competitive pricing model:

V4‑Flash : ¥0.2 per million input tokens (cache hit), ¥1 per million (cache miss), ¥2 per million output tokens.

V4‑Pro : ¥1 per million input tokens (cache hit), ¥12 per million (cache miss), ¥24 per million output tokens.

At ¥0.2 per million tokens for cached inputs, V4‑Flash is positioned as one of the cheapest options in the current LLM market, especially advantageous for use cases requiring ultra‑long context such as whole‑code‑base analysis or long‑document summarization.

Technical Innovations Enabling Million‑Token Context

Three key engineering advances make the 1‑million‑token window feasible:

Engram‑style memory mechanism : Inspired by human memory, it stores and retrieves long‑range information efficiently, mitigating the degradation of traditional attention over very long sequences.

Manifold‑Constrained Attention : Retains expressive power while substantially lowering computational complexity, allowing hardware‑friendly processing of massive contexts.

DeepSeek Sparse Attention (DSA) : Applies intelligent sparsity to compute full attention only for critical token pairs, dramatically reducing GPU load.

Domestic Chip Deployment

Unlike many models that rely on NVIDIA CUDA, DeepSeek V4 is largely trained and deployed on Huawei Ascend chips. This choice addresses three strategic concerns:

Supply‑chain security : Reduces dependence on overseas GPUs and lowers geopolitical risk.

Data compliance : Meets domestic data‑sovereignty and privacy regulations.

Ecosystem demonstration : Provides a large‑scale training reference for Chinese AI silicon.

Retirement of Older Model Identifiers

DeepSeek plans to migrate legacy model IDs to the new V4 identifiers: deepseek-chatdeepseek-v4-flash (non‑reasoning mode) deepseek-reasonerdeepseek-v4-flash (reasoning mode)

Developers should update their configurations promptly to avoid production disruptions.

Developer Onboarding

The V4 API is fully compatible with OpenAI and Anthropic request formats, minimizing migration effort. Example endpoint configuration:

# OpenAI‑compatible endpoint
base_url = "https://api.deepseek.com"
model = "deepseek-v4-flash"  # or "deepseek-v4-pro"

# Anthropic‑compatible endpoint
base_url = "https://api.deepseek.com/anthropic"

Users can also interact with the model via the web UI at chat.deepseek.com or the official mobile app.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Mixture of ExpertsDeepSeeklarge language modelAI pricingHuawei AscendAPI compatibilitymillion-token context
AI Agent Super App
Written by

AI Agent Super App

AI agent applications, installation, large-model testing, computer fundamentals, IT operations and maintenance exchange, network technology exchange, Linux learning

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.