Is DeepSeek V4 Really Launching Next Week? Inside Its Core Architecture
Analyzing the credibility of Yifan Zhang’s brief “V4, next week” tweet, the article examines five supporting signals, details three newly revealed architecture components—Sparse MQA, Fused MoE Mega Kernel, and Manifold‑Constrained Hyper‑Connections—and summarizes V4’s rumored specifications, pricing, and strategic implications.
Architecture components
Sparse Multi‑Query Attention (Sparse MQA) extends standard Multi‑Query Attention by making the attention pattern sparse: each token attends only to the most relevant context instead of the full sequence. This reduces the quadratic O(N²) cost of conventional attention to near‑linear O(N log N), enabling practical million‑token windows. The rumored V4 also includes a Lightning Indexer that can retrieve information from documents spanning hundreds of pages within 20 ms while preserving coherence.
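The core idea — score the context but attend only to a handful of survivors — can be sketched with top‑k selection. This is a generic illustration of sparse attention, not DeepSeek's actual (unpublished) mechanism; note that naively scoring every key is still O(N) per query, which is why a retrieval index like the rumored Lightning Indexer would be needed to get true sub‑quadratic behavior:

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=4):
    """Single-query sparse attention: score all keys, then run
    softmax over only the top-k highest-scoring ones and return
    the weighted sum of their values. (Illustrative sketch only.)"""
    scores = K @ q / np.sqrt(q.shape[-1])       # relevance of each key, shape (N,)
    top = np.argpartition(scores, -k)[-k:]      # indices of the k best keys
    w = np.exp(scores[top] - scores[top].max()) # numerically stable softmax
    w /= w.sum()                                # weights over the k survivors
    return w @ V[top]                           # weighted sum of k value vectors

rng = np.random.default_rng(0)
N, d = 64, 8
out = topk_sparse_attention(rng.normal(size=d),
                            rng.normal(size=(N, d)),
                            rng.normal(size=(N, d)), k=4)
print(out.shape)  # (8,)
```

Every token outside the top‑k contributes exactly zero, so the mixing cost per query drops from N value vectors to k.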
Fused MoE Mega Kernel merges the routing step and expert matrix multiplication of Mixture‑of‑Experts (MoE) into a single GPU kernel, eliminating intermediate kernel launches and memory moves. This directly lowers inference latency by removing unnecessary data movement.
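A real fused kernel is a single GPU kernel, which can't be shown in Python, but the computation it performs — routing and the expert matmul handled together, with no intermediate routing results written back to memory — can be sketched as one pass over the tokens (names and shapes here are hypothetical):

```python
import numpy as np

def moe_fused_pass(x, W_gate, experts):
    """Conceptual 'fused' MoE forward: for each token, pick its
    top-1 expert and immediately apply that expert's weight matrix,
    instead of materializing a routing table in one kernel and
    dispatching expert matmuls in another."""
    logits = x @ W_gate                # (T, E) router scores
    choice = logits.argmax(axis=1)     # top-1 expert index per token
    out = np.empty_like(x)
    for t in range(x.shape[0]):        # route + multiply in the same loop body
        out[t] = x[t] @ experts[choice[t]]
    return out

rng = np.random.default_rng(1)
T, d, E = 16, 32, 4                    # tokens, hidden dim, number of experts
x = rng.normal(size=(T, d))
W_gate = rng.normal(size=(d, E))
experts = rng.normal(size=(E, d, d))   # one weight matrix per expert
y = moe_fused_pass(x, W_gate, experts)
print(y.shape)  # (16, 32)
```

The latency win of an actual fused kernel comes from eliminating the kernel-launch overhead and the round trip through GPU memory between the two stages this loop collapses.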
Manifold‑Constrained Hyper‑Connections (mHC) generalizes the residual addition in Transformers to multiple learnable weighted pathways. Early versions suffered uncontrolled signal amplification (over 3000×), causing training crashes. The mHC solution projects the connection matrix onto a specific mathematical manifold using the Sinkhorn‑Knopp algorithm, limiting signal growth to within 2×. According to the paper (arXiv:2512.24880), this adds only ~6.7% extra compute while enabling stable training of trillion‑parameter models.
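The Sinkhorn‑Knopp algorithm itself is simple: alternately normalize the rows and columns of a positive matrix until it becomes (approximately) doubly stochastic, i.e. every row and column sums to 1. Because each pathway's weights then sum to 1, no linear combination can amplify the signal unboundedly. The sketch below shows the generic algorithm, not the paper's specific projection:

```python
import numpy as np

def sinkhorn_knopp(M, iters=50):
    """Project a matrix toward the doubly stochastic manifold
    (all row sums and column sums equal 1) by alternately
    normalizing rows and columns."""
    M = np.abs(M) + 1e-9                        # iteration needs strictly positive entries
    for _ in range(iters):
        M /= M.sum(axis=1, keepdims=True)       # make every row sum to 1
        M /= M.sum(axis=0, keepdims=True)       # make every column sum to 1
    return M

rng = np.random.default_rng(2)
P = sinkhorn_knopp(rng.normal(size=(4, 4)))
print(P.sum(axis=0))   # each column sums to 1
print(P.sum(axis=1))   # each row sums to ~1
```

The row/column-sum constraint is what caps signal growth: a doubly stochastic mixing matrix is an averaging operation, so repeated application across layers cannot blow activations up the way unconstrained learnable pathways did.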
Known specifications
Parameter scale: approximately 1 trillion parameters (MoE), with each token activating roughly 32–37 billion parameters.
Context window: about 1 million tokens.
Training chips: first deep adaptation to Huawei Ascend 950PR and other domestic processors, achieving a full‑stack domestic compute stack.
License: Apache 2.0, commercial use permitted.
Core architecture: Sparse MQA + Fused MoE Mega Kernel + mHC.
Key technical observations
Full‑stack domestic compute : Running V4 entirely on Huawei Ascend chips implies migration from CUDA to the CANN framework, demonstrating a complete domestic AI stack.
Compute infrastructure upgrade: After a 12‑hour outage in late March, DeepSeek announced a new large data center in Ulanqab and secured external financing valued at roughly $10 billion, underscoring its push to own its compute infrastructure.
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
