What DeepSeek’s Secret “Model1” Reveals About the Upcoming V4 LLM
By analyzing recent commits to DeepSeek's flashmla repository, this article argues that the mysterious Model1 is most likely DeepSeek-V4, detailing architectural shifts to a 512-dimensional head, full support for NVIDIA Blackwell GPUs, token-level sparse MLA, and new mechanisms such as Value Vector Position Awareness and Engram.
On January 20, 2025, DeepSeek released the DeepSeek-R1 model, sparking a wave of interest. A year later, the company's flashmla codebase received several updates, and a new entry named Model1 attracted attention as a possible codename for the upcoming flagship model.
Using Gemini to analyze the recent commits in the flashmla repository, the author extracted technical details that suggest Model1 is the internal development name for DeepSeek‑V4.
1. Core Architecture: Return to 512‑Dimensional Heads
The macro DISPATCH_HEAD_DIM in csrc/api/common.h shows two branches:
V32 (DeepSeek-V3.2) keeps d_qk = 576, reflecting the asymmetric MLA design (a 512-dimensional latent plus a 64-dimensional RoPE component). Model1 switches to a uniform 512-dimensional head, indicating a “normalization” step for DeepSeek-V4, likely to align with the Blackwell (SM100) architecture or to improve latent compression.
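The macro body itself is not quoted in the commits discussed here, so the following is a minimal C++ sketch of how such a compile-time dispatch macro typically looks. Only the 576 and 512 branch values come from csrc/api/common.h; every other name and the overall structure are illustrative.

```cpp
#include <cstdio>
#include <stdexcept>

/* Hypothetical reconstruction of a DISPATCH_HEAD_DIM-style macro. Only the
   576 (V3.2) and 512 (Model1) branches reflect the flashmla commit; the
   structure mirrors the common dispatch-macro idiom in CUDA codebases. */
#define DISPATCH_HEAD_DIM(head_dim_var, CONST_NAME, ...)                      \
    [&] {                                                                     \
        if ((head_dim_var) == 576) {        /* V3.2: 512 latent + 64 RoPE */  \
            constexpr int CONST_NAME = 576;                                   \
            return __VA_ARGS__();                                             \
        } else if ((head_dim_var) == 512) { /* Model1: uniform 512-dim head */\
            constexpr int CONST_NAME = 512;                                   \
            return __VA_ARGS__();                                             \
        }                                                                     \
        throw std::invalid_argument("unsupported head_dim");                  \
    }()

// Stand-in for an attention kernel templated on the head dimension.
template <int HeadDim>
void run_mla_kernel() { std::printf("launching kernel, head_dim=%d\n", HeadDim); }

int main() {
    int head_dim = 512;  // runtime value; 512 selects the Model1 branch
    DISPATCH_HEAD_DIM(head_dim, kHeadDim, [&] { run_mla_kernel<kHeadDim>(); });
}
```

The point of this pattern is that each supported head dimension gets its own fully specialized kernel at compile time, which is why a new 512 branch alongside 576 reads as a new model configuration rather than a tweak to an existing one.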
2. Full Support for NVIDIA Blackwell (SM100) Architecture
The repository contains several Blackwell‑specific optimizations:
In api.cpp, a new function, FMHACutlassSM100FwdRun, targets the SM100 instruction set (a dispatch sketch follows this list).
The README states that running on B200 GPUs requires CUDA 12.9.
Performance figures: on B200, the not yet fully optimized Sparse MLA reaches roughly 350 TFLOPS, while Dense MLA on H800 (SM90a) achieves about 660 TFLOPS.
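Only the name FMHACutlassSM100FwdRun is visible in api.cpp; its parameters, the SM90 counterpart below, and the dispatch wrapper are all assumptions. The sketch shows the usual way such an entry point is gated on compute capability at runtime.

```cpp
#include <cstdio>
#include <stdexcept>
#include <cuda_runtime.h>

// FMHACutlassSM100FwdRun is named in api.cpp; its real signature is not shown
// in the commits, so these parameterless stubs are stand-ins.
void FMHACutlassSM100FwdRun() { std::puts("SM100 (Blackwell) forward path"); }
void FMHACutlassSM90FwdRun()  { std::puts("SM90a (Hopper) forward path"); }  // hypothetical name

// Pick the Blackwell kernel only when the device reports SM 10.x or newer.
void fmha_forward(int device) {
    cudaDeviceProp prop{};
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        throw std::runtime_error("cudaGetDeviceProperties failed");
    if (prop.major >= 10) {          // B200-class GPUs (SM100)
        FMHACutlassSM100FwdRun();
    } else if (prop.major == 9) {    // H800/H100 (SM90a)
        FMHACutlassSM90FwdRun();
    } else {
        throw std::runtime_error("unsupported GPU architecture");
    }
}

int main() { fmha_forward(/*device=*/0); }
```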
3. Introduction of Token‑Level Sparse MLA
Both test_flash_mla_sparse_decoding.py and test_flash_mla_dense_decoding.py appear in the test suite, showing that sparse and dense decoding paths are maintained in parallel.
The sparse operators store the KV cache in FP8 but perform the matrix multiplications in bfloat16, preserving accuracy while reducing memory for very long contexts.
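As a self-contained illustration of that numeric scheme, here is a plain C++ sketch that decodes FP8 E4M3 bytes from a cache row and truncates each operand to bfloat16 precision before multiplying. All function names are illustrative, the per-tile scale factors that real FP8 caches usually carry are omitted, and the actual flashmla kernels perform this upcast inside tensor-core MMAs on the GPU.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Decode one FP8 E4M3 byte (1 sign bit, 4 exponent bits with bias 7,
// 3 mantissa bits) to float. NaN handling is omitted for brevity.
float fp8_e4m3_to_float(uint8_t v) {
    int sign = v >> 7;
    int exp  = (v >> 3) & 0xF;
    int man  = v & 0x7;
    float f = (exp == 0)
        ? std::ldexp(man / 8.0f, -6)               // subnormal
        : std::ldexp(1.0f + man / 8.0f, exp - 7);  // normal
    return sign ? -f : f;
}

// Truncate a float to bfloat16 precision (keep the top 16 bits of the
// IEEE-754 representation) to emulate doing the arithmetic in bf16.
float to_bf16(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits &= 0xFFFF0000u;
    std::memcpy(&x, &bits, sizeof(bits));
    return x;
}

// Dot product of a bf16 query row with an FP8-stored KV row: each cached
// byte is upcast to bf16 before the multiply, mirroring the "store FP8,
// compute bf16" scheme described above.
float qk_dot(const std::vector<float>& q, const std::vector<uint8_t>& kv_fp8) {
    float acc = 0.0f;
    for (size_t i = 0; i < q.size(); ++i)
        acc += to_bf16(q[i]) * to_bf16(fp8_e4m3_to_float(kv_fp8[i]));
    return acc;
}

int main() {
    std::vector<float> q = {0.5f, -1.25f, 2.0f};
    std::vector<uint8_t> kv = {0x38, 0xB0, 0x40};  // 1.0, -0.5, 2.0 in E4M3
    std::printf("score = %f\n", qk_dot(q, kv));    // 0.5 + 0.625 + 4.0
}
```

The trade is one byte per cached element against bf16-precision arithmetic, which is what makes the scheme attractive at very long context lengths.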
4. New Mechanisms: Value Vector Position Awareness (VVPA) and Engram
VVPA aims to mitigate the decay of positional information in long-context MLA.
Engram, referenced in community discussions, appears to be a novel distributed storage or KV‑compression technique designed for the high‑throughput demands of Model1.
Because the code treats Model1 as a branch parallel to V32 rather than as a patch, Gemini concludes that it represents a new architecture version, logically the next step after V3.2: DeepSeek-V4.
Readers are invited to consider whether Model1 truly is the rumored DeepSeek V4.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.