Machine Heart
Apr 29, 2026 · Artificial Intelligence
LCA Boosts Long-Context Inference: 2.5× Speedup and 90% KV Cache Reduction
The Latent-Condensed Attention (LCA) method cuts KV-cache memory by 90%, speeds up prefill by 2.5×, and lowers decode latency by 1.8× at 128K-token contexts, all while adding no extra parameters and preserving model performance across diverse LLMs.
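To put the headline numbers in perspective, here is a back-of-the-envelope calculation of what a 90% KV-cache reduction means at a 128K-token context. The model configuration below (32 layers, 32 KV heads, head dimension 128, fp16) is an assumption resembling a Llama-2-7B-class model, not a detail from the LCA paper; the memory formula itself is the standard KV-cache sizing identity.

```python
# Back-of-the-envelope KV-cache sizing. The model dimensions below are
# assumptions (a Llama-2-7B-like config), not figures from the LCA paper.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Memory for keys + values across all layers (fp16 by default).

    The leading factor of 2 accounts for storing both K and V.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

full = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                      seq_len=128 * 1024)
condensed = full * (1 - 0.90)  # the reported 90% reduction

GiB = 1024 ** 3
print(f"full KV cache:       {full / GiB:.1f} GiB")       # ~64.0 GiB
print(f"after 90% reduction: {condensed / GiB:.1f} GiB")  # ~6.4 GiB
```

Under these assumptions, a single 128K-token sequence would need roughly 64 GiB of KV cache at full size, versus about 6.4 GiB after a 90% reduction, which is the difference between spilling past a single accelerator's memory and fitting comfortably within it.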
Efficient Attention · Inference acceleration · KV cache reduction
10 min read
