Is DeepSeek V4 Really Launching Next Week? Inside Its Core Architecture

Analyzing the credibility of Yifan Zhang’s brief “V4, next week” tweet, the article examines five supporting signals, details three newly revealed architecture components—Sparse MQA, Fused MoE Mega Kernel, and Manifold‑Constrained Hyper‑Connections—and summarizes V4’s rumored specifications, pricing, and strategic implications.


Architecture components

Sparse Multi‑Query Attention (Sparse MQA) extends standard Multi‑Query Attention by making the attention pattern sparse: each token attends only to the most relevant positions instead of the full sequence. This cuts the quadratic cost of conventional attention from O(N²) to a near‑linear O(N log N), making million‑token context windows practical. The rumored V4 also pairs this with a Lightning Indexer said to retrieve information from documents spanning hundreds of pages within 20 ms while preserving coherence.
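DeepSeek has not published its selection mechanism, so the sketch below is only a minimal illustration of the idea, assuming a simple per‑query top‑k rule: all query heads share a single key/value projection (the MQA part), and each query keeps only its k highest‑scoring positions (the sparse part). The function name sparse_mqa and every shape in it are illustrative, not taken from any DeepSeek release.

```python
import numpy as np

def sparse_mqa(x, Wq, Wk, Wv, num_heads, k_keep):
    """Toy sparse multi-query attention.

    All query heads share one key/value projection (MQA); each query
    then attends only to its k_keep highest-scoring positions instead
    of the full sequence (sparsity).
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    q = (x @ Wq).reshape(seq_len, num_heads, d_head)  # per-head queries
    k = x @ Wk                                        # shared keys   (seq_len, d_head)
    v = x @ Wv                                        # shared values (seq_len, d_head)

    out = np.zeros((seq_len, num_heads, d_head))
    for h in range(num_heads):
        scores = q[:, h, :] @ k.T / np.sqrt(d_head)   # (seq_len, seq_len)
        # keep only the k_keep largest scores per query, mask out the rest
        thresh = np.sort(scores, axis=-1)[:, -k_keep][:, None]
        scores = np.where(scores >= thresh, scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h, :] = weights @ v
    return out.reshape(seq_len, d_model)

# toy usage
rng = np.random.default_rng(0)
seq_len, d_model, heads = 16, 64, 4
x = rng.standard_normal((seq_len, d_model))
Wq = rng.standard_normal((d_model, d_model)) * 0.1
Wk = rng.standard_normal((d_model, d_model // heads)) * 0.1
Wv = rng.standard_normal((d_model, d_model // heads)) * 0.1
print(sparse_mqa(x, Wq, Wk, Wv, num_heads=heads, k_keep=4).shape)  # (16, 64)
```

A production version would avoid materializing the full score matrix at all (that is where the claimed O(N log N) would come from); the toy only masks it after the fact to keep the code short.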

Fused MoE Mega Kernel merges the routing step and the expert matrix multiplications of Mixture‑of‑Experts (MoE) into a single GPU kernel, eliminating the intermediate kernel launches and round trips through global memory between them. Removing that data movement directly lowers inference latency; the sketch below spells out the steps such a kernel would fuse.
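No kernel code has been released, and plain NumPy obviously cannot fuse GPU kernels, so this is only a reference implementation of the computation being fused: top‑k routing followed by the per‑expert matmuls, which a mega kernel would presumably execute in one launch instead of several. The names moe_forward, W_gate, and experts are hypothetical.

```python
import numpy as np

def moe_forward(x, W_gate, experts, top_k=2):
    """Reference MoE forward pass: routing + expert matmuls.

    Spelled out step by step; a fused kernel would perform the same
    data flow without returning to global memory between steps.
    """
    logits = x @ W_gate                                # step 1: routing scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]      # chosen experts per token
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates) / np.exp(gates).sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for e, W_e in enumerate(experts):                  # step 2: expert matmuls
        for slot in range(top_k):
            mask = top[:, slot] == e                   # tokens routed to expert e
            if mask.any():
                out[mask] += gates[mask, slot, None] * (x[mask] @ W_e)
    return out

# toy usage: 8 tokens, hidden size 16, 4 experts
rng = np.random.default_rng(1)
x = rng.standard_normal((8, 16))
W_gate = rng.standard_normal((16, 4))
experts = [rng.standard_normal((16, 16)) * 0.1 for _ in range(4)]
print(moe_forward(x, W_gate, experts).shape)  # (8, 16)
```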

Manifold‑Constrained Hyper‑Connections (mHC) generalizes the residual addition in Transformers to multiple learnable weighted pathways. Early versions suffered uncontrolled signal amplification (over 3000×), causing training crashes. The mHC solution projects the connection matrix onto a specific mathematical manifold using the Sinkhorn‑Knopp algorithm, limiting signal growth to within 2×. According to the paper (arXiv:2512.24880), this adds only ~6.7 % extra compute while enabling stable training of trillion‑parameter models.
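The paper's exact manifold and projection details are not reproduced here, but the Sinkhorn‑Knopp iteration itself is standard: alternately normalizing rows and columns drives a non‑negative matrix toward the doubly stochastic manifold (all rows and columns summing to 1), and a doubly stochastic mixing matrix has operator norm at most 1, so it cannot blow the signal up. A minimal sketch, assuming this textbook formulation rather than whatever variant the paper actually uses:

```python
import numpy as np

def sinkhorn_knopp(M, n_iters=50, eps=1e-9):
    """Project a non-negative matrix toward the doubly stochastic
    manifold by alternately normalizing its rows and columns."""
    M = np.asarray(M, dtype=float) + eps
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)  # row normalization
        M = M / M.sum(axis=0, keepdims=True)  # column normalization
    return M

# toy usage: a random positive mixing matrix over 4 hyper-connection streams
rng = np.random.default_rng(2)
H = np.abs(rng.standard_normal((4, 4)))
H_ds = sinkhorn_knopp(H)
print(H_ds.sum(axis=0))  # columns sum to ~1
print(H_ds.sum(axis=1))  # rows sum to ~1 after enough iterations
```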

Known specifications

Parameter scale: approximately 1 trillion parameters (MoE), with each token activating roughly 32–37 billion parameters.

Context window: about 1 million tokens.

Training chips: first deep adaptation to Huawei Ascend 950PR and other domestic processors, completing a fully domestic compute stack.

License: Apache 2.0, commercial use permitted.

Core architecture: Sparse MQA + Fused MoE Mega Kernel + mHC.

Key technical observations

Full‑stack domestic compute: Running V4 entirely on Huawei Ascend chips implies a migration from CUDA to Huawei's CANN framework, demonstrating a complete domestic AI stack.

Compute infrastructure upgrade: After a 12‑hour outage in late March, DeepSeek announced a new large data center in Ulanqab and secured external financing valued at roughly $10 billion, underscoring its need for compute resources it owns outright.
