How DeepSeek‑V4 Achieves Million‑Token Context via Aggressive KV‑Cache Compression
DeepSeek‑V4 reaches a million‑token context window by aggressively compressing its KV‑cache. It uses a hybrid attention scheme: Compressed Sparse Attention (CSA) performs selective top‑k retrieval, while Heavily Compressed Attention (HCA) runs full attention over heavily merged entries. Mixed‑precision storage and other engineering optimizations complement these two mechanisms.
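
To make the two attention paths concrete, here is a minimal toy sketch in plain Python. It is not DeepSeek‑V4's implementation: the mean-pooling merge, the block sizes (4 and 16), the top-k value, and every function name are assumptions chosen only for illustration. It shows one lightly merged cache queried through top-k selection (the CSA idea) and one heavily merged cache attended in full (the HCA idea).

```python
import math

def merge_blocks(vectors, block):
    """Mean-pool each run of `block` consecutive vectors into one entry.
    Assumption: mean pooling stands in for whatever merge the real model uses."""
    merged = []
    for start in range(0, len(vectors), block):
        chunk = vectors[start:start + block]
        dim = len(chunk[0])
        merged.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return merged

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Standard scaled dot-product attention over every entry given."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def sparse_topk_attend(query, keys, values, k):
    """Attend only to the k entries with the highest query-key score."""
    ranked = sorted(range(len(keys)), key=lambda i: dot(query, keys[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore positional order of the selected entries
    return attend(query, [keys[i] for i in chosen], [values[i] for i in chosen])

# Toy cache: 64 tokens, 4-dimensional keys and values (hypothetical sizes).
seq_len, dim = 64, 4
keys = [[math.sin(0.1 * t + d) for d in range(dim)] for t in range(seq_len)]
values = [[math.cos(0.05 * t + d) for d in range(dim)] for t in range(seq_len)]
query = [0.5, -0.25, 0.75, 0.1]

# Lightly merged cache for the sparse path, heavily merged cache for the dense path.
light_k, light_v = merge_blocks(keys, 4), merge_blocks(values, 4)
heavy_k, heavy_v = merge_blocks(keys, 16), merge_blocks(values, 16)

sparse_out = sparse_topk_attend(query, light_k, light_v, k=4)  # CSA-style path
dense_out = attend(query, heavy_k, heavy_v)                    # HCA-style path

print(len(keys), "raw entries ->", len(light_k), "light /", len(heavy_k), "heavy")
```

In this toy setup the sparse path scores 16 merged entries but attends to only 4 of them, and the dense path attends to all 4 heavily merged entries, so neither path touches the 64 raw entries directly. How the real model sizes, schedules, or combines the two paths is not specified here.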
