Inside DeepSeek’s FlashMLA Update: What’s New in the MODEL1 Architecture

DeepSeek’s recent FlashMLA update introduces a new model, MODEL1, featuring a tighter KV-cache layout, an “extra” two-stage cache, and a fixed 512 × 512 head dimension. The four code changes are detailed in a public GitHub commit and illustrated by comparative diagrams.

PaperAgent

DeepSeek recently updated the flash_mla_interface.py file in the FlashMLA repository, introducing a new model named MODEL1 through four code modifications.

Analyzing the diff (https://github.com/deepseek-ai/FlashMLA/commit/48c6dc426f045cb7743b18f5c7329f35f1b7ed79) reveals three fundamental differences between MODEL1’s Multi-head Latent Attention (MLA) KV-Cache implementation and the previous V3/V3.2 series:

More compact physical layout: the V3.2-series FP8 block is split into three separate segments: NoPE data (four 128 B tiles), a 16 B scale segment, and a 128 B RoPE segment, for a total of 656 B per token. MODEL1 interleaves NoPE with RoPE and places each scale immediately after its data, reducing the total block size to 576 B.
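Assuming the segment sizes quoted above (a reading of the article’s byte counts, not of the kernel source), the per-token savings can be sketched as:

```python
# Back-of-the-envelope sketch of the per-token FP8 KV-cache block sizes
# described above. The segment composition is inferred from the article's
# byte counts, not read from the FlashMLA kernels, so treat it as illustrative.

NOPE_TILES = 4      # four 128 B FP8 NoPE tiles
TILE_BYTES = 128
SCALE_BYTES = 16    # scale segment
ROPE_BYTES = 128    # RoPE segment

# V3.2-style layout: NoPE data, scales, and RoPE kept as separate segments.
V32_BLOCK = NOPE_TILES * TILE_BYTES + SCALE_BYTES + ROPE_BYTES  # 656 B

# MODEL1-style layout: NoPE interleaved with RoPE, scales placed inline,
# which per the diff shrinks the per-token block to 576 B.
MODEL1_BLOCK = 576

print(V32_BLOCK, MODEL1_BLOCK, V32_BLOCK - MODEL1_BLOCK)  # savings per token
```

At typical serving batch sizes, an 80 B-per-token reduction compounds across every cached token, which is where the layout change pays off.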

Added “extra” two‑stage cache: the function signature no longer hard‑codes a “MODEL1‑specific” label, but it retains the extra_k_cache and extra_indices_in_kvcache parameters that back the second cache stage.
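A minimal sketch of how such a signature might thread the second stage through a decode call. Only the parameter names extra_k_cache and extra_indices_in_kvcache come from the diff; the function name and everything else here is hypothetical, not FlashMLA’s actual API:

```python
# Illustrative-only wrapper for a two-stage KV-cache lookup. Only the
# parameter names `extra_k_cache` and `extra_indices_in_kvcache` appear in
# the diff; the rest of this signature is a hypothetical sketch.
from typing import Optional, Sequence, Tuple


def mla_decode_sketch(
    q,                                                       # query tensor
    kv_cache,                                                # primary (stage-1) cache
    extra_k_cache: Optional[Sequence] = None,                # stage-2 key cache
    extra_indices_in_kvcache: Optional[Sequence] = None,     # stage-2 slot indices
) -> Tuple[str, Optional[int]]:
    """Dispatch: use the two-stage path only when both extras are supplied."""
    if extra_k_cache is not None and extra_indices_in_kvcache is not None:
        # Stage 2: gather the extra keys at the given cache slots
        # (placeholder logic standing in for the real kernel dispatch).
        return ("two-stage", len(extra_indices_in_kvcache))
    return ("single-stage", None)


print(mla_decode_sketch(None, None, extra_k_cache=[1], extra_indices_in_kvcache=[0, 3]))
```

The point of keeping the extras optional is that callers on the old single-stage path are untouched, while MODEL1 callers opt in by passing both arguments.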

Fixed head dimension: the head dimension is locked at 512 × 512, dropping support for variable dimensions such as 128 or 192. The corresponding comments about flexible head dimensions were removed, since the code now enforces the 512 size.
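The effect of such a constraint can be illustrated with a simple validation check (a hypothetical sketch; FlashMLA’s actual enforcement lives in its kernels and assertions):

```python
# Hypothetical validation mirroring the "512 only" constraint described
# above; FlashMLA's real checks live elsewhere, so this is only a sketch.
SUPPORTED_HEAD_DIM = 512  # MODEL1: fixed; 128 and 192 are no longer accepted


def check_head_dim(head_dim: int) -> None:
    """Reject any head dimension other than the fixed 512."""
    if head_dim != SUPPORTED_HEAD_DIM:
        raise ValueError(
            f"head_dim={head_dim} unsupported: MODEL1 path requires {SUPPORTED_HEAD_DIM}"
        )


check_head_dim(512)   # accepted
try:
    check_head_dim(128)  # previously valid, now rejected
except ValueError as exc:
    print(exc)
```

Hard-coding the dimension lets the kernels drop runtime branching over head sizes, a common trade of flexibility for speed.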

The commit link above provides the full code changes.

Additional recommended readings (titles only) include:

2026, Doing Agentic AI – Two Must‑Read Year‑Opening Surveys

Twitter’s hidden algorithm open‑sourced by Elon Musk

Designing AI Agents: orchestration, memory, plugins, workflow, collaboration

Despite the hype, open‑source small models outperform on OCR tasks

2026 New Trends: World Models × Embodied Intelligence – Latest Survey

Tags: DeepSeek · AI Architecture · FlashMLA · KV cache · Multi-head Latent Attention · MODEL1
Written by

PaperAgent

Daily updates, analyzing cutting-edge AI research papers
