Machine Learning Algorithms & Natural Language Processing
Jun 2, 2026 · Artificial Intelligence
OSCAR Beats TurboQuant: 2‑Bit KV‑Cache for Fast, Stable Long‑Context Inference
OSCAR is an attention‑aware rotation scheme that compresses KV caches to a true 2‑bit representation, cutting memory usage by up to 8× and boosting decode throughput by up to 7×. It keeps inference quality within a few points of BF16 across multiple models and long‑context benchmarks, and it outperforms TurboQuant.
2-bit quantization · KV cache · OSCAR
13 min read
