DeepSeek V4 Launches with 1M‑Token Context, Dual Versions and Native Chinese Chip Support
On April 24, 2026, DeepSeek released the V4 preview in two versions: V4‑Pro, built on a 1.6 T‑parameter MoE architecture, and V4‑Flash, a lighter 284 B‑parameter model. Both offer a 1‑million‑token context window, up to 384 K output tokens, new step‑wise reasoning modes, and native compatibility with Huawei Ascend and Cambricon chips, while delivering major efficiency gains and benchmark‑leading performance.
After months of anticipation, DeepSeek unveiled the V4 preview on April 24, 2026, releasing two variants that target different usage scenarios.
V4‑Pro: built on a 1.6 T‑parameter Mixture‑of‑Experts (MoE) architecture that activates roughly 49 B parameters during inference, optimized for top‑tier performance.
V4‑Flash: a lightweight model with 284 B total parameters and 13 B active parameters, balancing speed and cost.
Both versions share a 1‑million‑token context window and a maximum output length of 384 K tokens, enabling entire books, codebases, or large contracts to be processed without fragmentation.
The models introduce a new "thinking mode" that supports step‑wise reasoning and deeper deliberation, with adjustable high/max intensity levels that markedly improve logical rigor and output quality in complex reasoning, code generation, and agent execution tasks. They also natively support JSON output, tool invocation, dialogue prefix continuation, and FIM (fill‑in‑the‑middle) completion, easing integration with mainstream development frameworks.
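To make the feature list concrete, the sketch below assembles a chat request that enables a step‑wise thinking mode and JSON output. The model name (`deepseek-v4-pro`) and the `thinking` and `response_format` fields are assumptions modeled on common OpenAI‑compatible APIs, not confirmed DeepSeek V4 parameters; the payload is only built, not sent.

```python
# Hedged sketch: build a hypothetical chat-completion payload.
# "deepseek-v4-pro", the "thinking" block, and "response_format" are
# assumed names, not documented DeepSeek V4 API fields.
import json


def build_request(prompt: str, intensity: str = "high") -> dict:
    """Assemble an illustrative chat-completion request body."""
    return {
        "model": "deepseek-v4-pro",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        # assumed shape for the step-wise reasoning controls described above
        "thinking": {"mode": "step-wise", "intensity": intensity},
        "response_format": {"type": "json_object"},  # structured JSON output
        "max_tokens": 384_000,  # matches the stated 384 K output cap
    }


payload = build_request("Summarize the attached contract as JSON.", intensity="max")
print(json.dumps(payload, indent=2))
```

In a real integration, the same payload shape would be POSTed to the provider's chat endpoint by any OpenAI‑compatible client; the point here is only how the advertised features map onto request parameters.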
Core technical breakthroughs include an innovative hybrid attention architecture that combines compressed sparse attention with highly compressed attention, coupled with a manifold‑constrained hyper‑connection structure that greatly improves long‑context processing efficiency. The new Muon optimizer accelerates convergence, making training more stable and efficient. Compared with the previous generation, V4‑Pro cuts inference FLOPs by 73 % and shrinks the KV cache by 90 %, achieving stronger performance at lower compute cost.
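A back‑of‑envelope calculation shows why a 90 % KV‑cache reduction matters at a 1‑million‑token context. The layer count, KV‑head count, head dimension, and fp16 storage below are illustrative assumptions, not published V4 specifications; only the 90 % reduction figure comes from the announcement.

```python
# Illustrative KV-cache sizing. All model dimensions here are assumed
# placeholders; the 90 % reduction is the figure claimed for V4-Pro.
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes to store K and V tensors for one sequence (fp16 = 2 bytes)."""
    # Per layer, K and V each hold tokens * kv_heads * head_dim values.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value


baseline = kv_cache_bytes(tokens=1_000_000, layers=60, kv_heads=8, head_dim=128)
compressed = baseline * (1 - 0.90)  # the stated 90 % reduction

print(f"baseline:   {baseline / 2**30:.1f} GiB")
print(f"compressed: {compressed / 2**30:.1f} GiB")
```

Under these assumed dimensions an uncompressed 1M‑token cache would run to hundreds of GiB per sequence, which is why aggressive KV compression is a prerequisite for serving million‑token contexts at reasonable cost.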
Benchmark results show V4‑Pro surpassing open‑source model records in inference speed, world knowledge, and agent programming, rivaling top proprietary models, while V4‑Flash delivers comparable inference performance in a smaller footprint, making it a cost‑effective choice for lightweight scenarios. An aggressive pricing strategy further lowers the barrier to high‑performance AI.
Notably, DeepSeek V4 is designed for native compatibility with Chinese hardware: Huawei's Ascend 950PR is the first supported chip, and Cambricon has completed day‑0 adaptation with open‑source code, marking a shift from mere compatibility to deep co‑evolution between domestic large models and domestic compute platforms.
Overall, DeepSeek V4 represents a technical leap in large‑model capabilities, offering unprecedented context length, advanced reasoning modes, and strong alignment with China’s AI hardware ecosystem, poised to drive efficiency gains across research, legal, medical, and enterprise domains.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
