Tagged articles
1 article
Machine Heart
Jun 2, 2026 · Artificial Intelligence

Training Transformers to Be Compression‑Friendly: A New Memory‑Discard Paradigm

The article analyzes the KV-cache memory bottleneck in long-context Transformers and introduces KV-CAT (KV-Compression Aware Training), an approach that simulates cache compression during pre-training. The reported experiments show that base abilities remain unchanged, while performance in post-training compression, retrieval, and long-text QA improves dramatically.
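
To give a sense of scale for the memory bottleneck mentioned above, here is a minimal sketch of the standard KV-cache size arithmetic: one key tensor and one value tensor per layer, per token. The model configuration (32 layers, 8 KV heads, head dimension 128, fp16, 128K-token context) and the 25% keep ratio are illustrative assumptions, not figures from the article, and the function names are hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every layer and token."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def compressed_kv_cache_bytes(keep_ratio, seq_len, **config):
    """Cache size if only a fraction `keep_ratio` of tokens is retained."""
    kept_tokens = int(seq_len * keep_ratio)
    return kv_cache_bytes(seq_len=kept_tokens, **config)


# Hypothetical configuration, chosen only for illustration.
config = dict(num_layers=32, num_kv_heads=8, head_dim=128)
full = kv_cache_bytes(seq_len=131072, **config)
kept = compressed_kv_cache_bytes(0.25, seq_len=131072, **config)

print(f"full cache:      {full / 2**30:.1f} GiB")
print(f"25% of tokens:   {kept / 2**30:.1f} GiB")
```

Under these assumptions the full cache for a single sequence is 16 GiB and keeping a quarter of the tokens brings it to 4 GiB. The size grows linearly with context length, which is why discarding cache entries is attractive for long-context inference.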

KV cache · KV-CAT · Transformer
10 min read