How DeepSeek‑V4 Achieves Million‑Token Context via Aggressive KV‑Cache Compression

DeepSeek‑V4 reaches a million‑token context window by aggressively compressing its KV‑cache. It pairs this with a hybrid attention scheme: Compressed Sparse Attention (CSA), which performs selective top‑k retrieval, and Heavily Compressed Attention (HCA), which applies full attention over heavily merged entries. Mixed‑precision storage and other engineering optimizations complete the design.

Compressed Sparse Attention · DeepSeek V4 · Heavily Compressed Attention

7 min read

Machine Heart

Apr 30, 2026 · Artificial Intelligence
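The hybrid scheme summarized in the first article can be illustrated with a toy sketch. It is not DeepSeek‑V4's implementation: the compressor here is plain mean pooling (an assumption), the function names `compress_kv`, `csa_attention`, and `hca_attention` are hypothetical, and the compression ratios and `top_k` value are arbitrary illustrative choices. CSA-style attention scores the compressed entries and attends only to the top‑k of them; HCA-style attention attends densely to every entry of a much more heavily merged cache.

```python
import math


def compress_kv(keys, values, ratio):
    """Merge every `ratio` consecutive (key, value) pairs into one entry.

    ASSUMPTION: mean pooling stands in for whatever learned compressor
    the real model uses. A final partial block is averaged over its own length.
    """
    ck, cv = [], []
    for start in range(0, len(keys), ratio):
        kb = keys[start:start + ratio]
        vb = values[start:start + ratio]
        ck.append([sum(col) / len(kb) for col in zip(*kb)])
        cv.append([sum(col) / len(vb) for col in zip(*vb)])
    return ck, cv


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [_dot(query, k) * scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def csa_attention(query, ck, cv, top_k):
    """Hypothetical CSA-style step: attend only to the top_k highest-scoring compressed entries."""
    ranked = sorted(range(len(ck)), key=lambda i: _dot(query, ck[i]), reverse=True)
    chosen = sorted(ranked[:top_k])  # keep the selected entries in positional order
    return _attend(query, [ck[i] for i in chosen], [cv[i] for i in chosen])


def hca_attention(query, ck, cv):
    """Hypothetical HCA-style step: dense attention over every heavily compressed entry."""
    return _attend(query, ck, cv)


# Usage on a toy 16-token cache with 2-dimensional keys and values.
keys = [[float(i), 1.0] for i in range(16)]
values = [[float(i), 0.0] for i in range(16)]
ck4, cv4 = compress_kv(keys, values, ratio=4)     # 16 -> 4 entries (lighter merge, sparse reads)
ck16, cv16 = compress_kv(keys, values, ratio=16)  # 16 -> 1 entry (heavy merge, dense reads)
q = [1.0, 0.0]
print(len(ck4), len(ck16))  # prints "4 1"
out_csa = csa_attention(q, ck4, cv4, top_k=2)
out_hca = hca_attention(q, ck16, cv16)
```

In this sketch, merging every `ratio` entries shrinks the stored cache by roughly that factor. The two paths trade off differently: the CSA-style path keeps finer-grained entries but reads only a few of them per query, while the HCA-style path reads everything, but from a far smaller cache.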

How DeepSeek’s Visual‑Primitive Paradigm Redefines Multimodal Reasoning

DeepSeek has released a multimodal model built on a visual‑primitive reasoning paradigm that treats coordinates and bounding boxes as units of reasoning. The model dramatically compresses visual tokens and achieves state‑of‑the‑art performance on counting, spatial, and topological tasks, while also exposing the current limits of multimodal inference.

AI reasoning · Compressed Sparse Attention · DeepSeek

12 min read

AI2ML AI to Machine Learning

Apr 25, 2026 · Artificial Intelligence

How DeepSeek V4 Advances Structured Optimization in the Large‑Model Era

The article analyzes DeepSeek V4’s architectural innovations, including Compressed Sparse Attention, Heavily Compressed Attention, a cross‑layer MoE design, and an Agent‑RL framework with Generative Reward Models and multi‑teacher distillation. It also compares V4’s long‑context capabilities and efficiency with those of rival LLMs such as GLM, Kimi, Claude, GPT, and Gemini.

Agent Reinforcement Learning · Compressed Sparse Attention · DeepSeek V4

7 min read
