Do Transformers Need Three Projections? Sharing K‑V Cuts KV Cache by 50%
A systematic ICML 2026 study shows that sharing the K and V projection matrices in Transformers halves the KV cache at a cost of less than 5% perplexity degradation, offering a simple, retrain‑once option for long‑context and edge inference.
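
To make the 50% figure concrete, here is a minimal back-of-the-envelope sketch of the cache arithmetic. It assumes a standard decoder-only layout in which each layer caches one tensor for K and one for V, and that a shared projection lets a single cached tensor serve both roles. The function name `kv_cache_bytes`, its parameters, and the model shape (32 layers, 32 KV heads, head dimension 128, 4096 tokens, 2-byte elements) are illustrative assumptions, not values taken from the study.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2, shared_kv=False):
    """Approximate KV cache size in bytes (hypothetical helper, not from the study)."""
    # One cached tensor per layer holds batch * heads * seq_len * head_dim elements.
    per_tensor = batch_size * n_kv_heads * seq_len * head_dim * bytes_per_elem
    # Separate K and V need two tensors per layer; a shared projection
    # is assumed to need only one, since the same tensor is reused.
    tensors_per_layer = 1 if shared_kv else 2
    return n_layers * tensors_per_layer * per_tensor


# Illustrative model shape (an assumption, not reported in the source).
baseline = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
shared = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096,
                        shared_kv=True)

print(f"separate K and V: {baseline / 2**30:.2f} GiB")  # 2.00 GiB
print(f"shared K-V:       {shared / 2**30:.2f} GiB")    # 1.00 GiB
```

Under these assumptions the saving is exactly one of the two cached tensors per layer, so the ratio stays at one half regardless of sequence length, batch size, or element width.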
