Inside DeepSeek’s FlashMLA Update: What’s New in the MODEL1 Architecture
DeepSeek’s recent FlashMLA update introduces a new model, MODEL1, featuring a more compact KV-Cache layout, an extra two-stage cache, and a fixed 512×512 head dimension; the four code changes are detailed in a public GitHub commit and illustrated with comparative diagrams.
DeepSeek recently updated the flash_mla_interface.py file in the FlashMLA repository, introducing support for a new model named MODEL1 through four code modifications.
Analyzing the diff (https://github.com/deepseek-ai/FlashMLA/commit/48c6dc426f045cb7743b18f5c7329f35f1b7ed79) reveals three fundamental differences between MODEL1’s Multi-head Latent Attention (MLA) KV-Cache implementation and the previous V3/V3.2 series:
1. More compact physical layout: the V3/V3.2 FP8 block is laid out in three segments, 512 B of NoPE data + 16 B of scale + 128 B of RoPE, i.e. 656 B per token. MODEL1 interleaves NoPE with RoPE and places the scale immediately after, shrinking the block to 576 B (the byte accounting is sketched after this list).
2. Added “extra” two‑stage cache: the function signature no longer hard‑codes a “MODEL1‑specific” label, but it retains the extra_k_cache and extra_indices_in_kvcache parameters (their placement is sketched in the interface example below).
3. Fixed head dimension: the head dimension is locked at 512 × 512, dropping support for variable dimensions such as 128 or 192; the comments about flexible head dimensions were removed accordingly, since the code now enforces the 512 size.
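For concreteness, the byte accounting behind the first difference can be written out as a small check. This is a minimal sketch in plain Python, not FlashMLA code: the V3/V3.2 breakdown (512 B NoPE + 16 B scale + 128 B RoPE) and both totals come from the diff, while the assumption that MODEL1 stores the interleaved NoPE+RoPE payload entirely in FP8 is just one plausible reading of the 576 B figure.

```python
# Per-token KV-cache block sizes implied by the commit. Only the totals
# (656 B vs 576 B) and the V3/V3.2 segment breakdown come from the diff;
# the MODEL1 sub-split below is an assumption, not a confirmed layout.

NOPE_DIM = 512  # latent (NoPE) channels, FP8 -> 1 byte each
ROPE_DIM = 64   # RoPE channels, BF16 in V3/V3.2 -> 2 bytes each

# V3/V3.2 layout: three contiguous segments per token.
v32_bytes = (
    NOPE_DIM * 1    # 512 B of FP8 NoPE payload
    + 16            # 16 B of dequantization scales
    + ROPE_DIM * 2  # 128 B of BF16 RoPE payload
)
assert v32_bytes == 656

# MODEL1 layout: NoPE interleaved with RoPE, scale placed right after.
# Plausible reading: RoPE is also quantized to FP8, so the 576 channels
# fit in 576 B (with the scales packed elsewhere in the block structure).
model1_bytes = (NOPE_DIM + ROPE_DIM) * 1
assert model1_bytes == 576

saving = v32_bytes - model1_bytes
print(f"per-token saving: {saving} B ({saving / v32_bytes:.1%})")
```

At scale this matters: 80 B saved per cached token is roughly a 12% reduction in KV-Cache footprint and memory traffic during long-context decoding.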
The commit link above provides the full code changes.
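To make the second and third differences concrete, here is a hedged sketch of where the retained parameters and the fixed head size would sit in a Python interface. Only the parameter names extra_k_cache and extra_indices_in_kvcache and the enforced 512 size come from the commit; the function name, the other arguments, and the shapes are illustrative assumptions, not the actual flash_mla_interface.py signature.

```python
# Illustrative sketch only; see the linked commit for the real code.
from typing import Optional
import torch

HEAD_DIM = 512  # now hard-coded; flexible sizes like 128/192 were dropped

def mla_with_kvcache_sketch(
    q: torch.Tensor,                # query tensor, last dim = head size
    kv_cache: torch.Tensor,         # compact interleaved NoPE+RoPE blocks
    extra_k_cache: Optional[torch.Tensor] = None,       # second-stage cache
    extra_indices_in_kvcache: Optional[torch.Tensor] = None,  # its indices
) -> torch.Tensor:
    """Hypothetical wrapper showing where the retained 'extra' two-stage
    cache parameters sit and how a fixed head dimension is enforced."""
    # Fixed head dimension: variable sizes are rejected up front.
    assert q.size(-1) == HEAD_DIM, "MODEL1 kernels enforce the 512 head size"
    # The two 'extra' parameters travel together: the index tensor locates
    # the second-stage cache entries within the main KV cache.
    if extra_k_cache is not None:
        assert extra_indices_in_kvcache is not None, "indices required"
    # ... the real kernel dispatch lives in flash_mla_interface.py ...
    raise NotImplementedError("sketch only; see the commit for the real code")
```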
