OSCAR Beats TurboQuant: 2‑Bit KV‑Cache for Fast, Stable Long‑Context Inference
OSCAR is an attention‑aware rotation scheme that compresses KV caches to true 2‑bit precision, cutting memory usage by up to 8× and boosting decode throughput by up to 7×. It preserves inference quality within a few points of BF16 across multiple models and long‑context benchmarks, and it outperforms TurboQuant.
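
The 8× memory figure matches the ratio between BF16 storage (16 bits per value) and 2-bit storage. The sketch below works through that arithmetic for a KV cache. It is not taken from OSCAR: the function name `kv_cache_bytes` and the model shape are hypothetical, chosen only for illustration, and the calculation ignores any per-group scale or zero-point metadata a real quantizer stores, which is one reason the saving is stated as "up to" 8×.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bits_per_value, batch_size=1):
    """Bytes needed to hold keys and values, ignoring quantization metadata."""
    # The factor of 2 covers both the key tensor and the value tensor.
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value // 8


# Hypothetical model shape (assumption, not from the source).
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=131072)

bf16_bytes = kv_cache_bytes(bits_per_value=16, **shape)
two_bit_bytes = kv_cache_bytes(bits_per_value=2, **shape)

print(f"BF16 : {bf16_bytes / 2**30:.1f} GiB")
print(f"2-bit: {two_bit_bytes / 2**30:.1f} GiB")
print(f"ratio: {bf16_bytes / two_bit_bytes:.1f}x")
```

The ratio depends only on the bit widths, so it stays the same for any model shape or context length; what changes with longer contexts is the absolute number of bytes saved.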
