Lao Guo's Learning Space
Apr 30, 2026 · Artificial Intelligence
How DeepSeek V4’s CSA + HCA Break the Million‑Token Barrier
Traditional full attention cannot handle million‑token contexts because its compute and memory grow quadratically with sequence length. DeepSeek V4's Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) compress tokens, index them sparsely, and compute attention precisely over the selected subset, cutting the KV cache to 10% and FLOPs to 27% of full attention while enabling a 1M‑token window on a single GPU.
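To see why a 10% KV cache makes a 1M‑token window plausible on one GPU, here is a back‑of‑envelope sketch. The model dimensions below (layer count, KV heads, head dimension) are illustrative assumptions for the sake of the arithmetic, not published DeepSeek V4 specs; only the 10% ratio comes from the post.

```python
# Rough KV-cache sizing under the post's claimed 10% compression ratio.
# All model dimensions are assumed for illustration, not DeepSeek V4 specs.

SEQ_LEN = 1_000_000   # 1M-token context window
N_LAYERS = 60         # assumed transformer depth
N_KV_HEADS = 8        # assumed number of KV heads (e.g. with grouped-query attention)
HEAD_DIM = 128        # assumed per-head dimension
BYTES = 2             # fp16/bf16 storage per element

def kv_cache_bytes(seq_len: int) -> int:
    # Both K and V are cached for every layer:
    # 2 * layers * kv_heads * head_dim * seq_len * bytes_per_element
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * seq_len * BYTES

full = kv_cache_bytes(SEQ_LEN)
compressed = int(full * 0.10)  # the post's claimed 10% KV cache

print(f"full KV cache:       {full / 2**30:.1f} GiB")    # ~228.9 GiB
print(f"compressed KV cache: {compressed / 2**30:.1f} GiB")  # ~22.9 GiB
```

Under these assumed dimensions, the uncompressed cache (~229 GiB) overflows any single accelerator, while the 10% version (~23 GiB) fits comfortably in an 80 GB GPU's memory, which is the gist of the single‑GPU claim.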
Attention Mechanism · CSA · HCA
