Did Google’s TurboQuant Steal RaBitQ? Unpacking the AI Compression Controversy
The article examines Google’s TurboQuant compression breakthrough, its claimed 6× KV-cache reduction and 8× speedup, and the allegations that it mirrors the earlier RaBitQ method, detailing the technical similarities, disputed experiments, market fallout, and ongoing academic debate.
TurboQuant: Google’s Claimed Breakthrough
In March 2026 Google’s research team announced TurboQuant, a technique that purportedly reduces KV-cache memory usage to one-sixth of its original size while delivering an eight-fold inference speedup with no loss in accuracy. The paper, titled TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate, was accepted to ICLR 2026 and quickly attracted massive media coverage, causing a sharp sell-off in the memory-chip market that erased over $900 billion in market value.
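To put the claimed 6× figure in concrete terms, here is a back-of-envelope sizing calculation in Python. The model shape below (layer count, KV heads, head dimension, sequence length) is an illustrative assumption, not a configuration from the paper:

```python
# Back-of-envelope KV-cache sizing with an illustrative model shape
# (hypothetical numbers, not from the TurboQuant paper).
layers, kv_heads, head_dim = 32, 8, 128
seq_len, batch = 4096, 1
bytes_fp16 = 2

# Factor of 2 covers both keys and values.
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_fp16
print(f"fp16 KV cache:        {kv_bytes / 2**20:.0f} MiB")
print(f"at 6x compression:    {kv_bytes / 6 / 2**20:.0f} MiB")
```

For this hypothetical configuration the fp16 cache is 512 MiB per sequence, so a 6× reduction would leave roughly 85 MiB, which is the kind of headroom that lets longer contexts or larger batches fit on the same hardware.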
The method is described as a two-step process: first, a random rotation of the data vectors simplifies their geometric structure before they are compressed with the PolarQuant method; second, a 1-bit correction stage applies the QJL algorithm to reduce the residual error.
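A minimal sketch of that rotate-compress-correct pipeline is shown below. To keep it self-contained, plain uniform scalar quantization stands in for PolarQuant and a sign-of-residual code with a shared magnitude stands in for QJL’s 1-bit correction; both stand-ins are simplifying assumptions, not the paper’s actual algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    # Random orthogonal matrix via QR of a Gaussian matrix
    # (a stand-in for the structured rotations used in practice).
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def two_stage_quantize(x, R, bits=2):
    """Rotate, coarse-quantize, then keep a 1-bit residual correction."""
    z = R @ x                                        # step 1: random rotation
    levels = 2 ** bits
    lo, hi = z.min(), z.max()
    step = (hi - lo) / (levels - 1)
    codes = np.round((z - lo) / step).astype(int)    # coarse scalar codes
    coarse = lo + codes * step
    resid = z - coarse
    signs = np.sign(resid)                           # step 2: 1-bit correction
    scale = np.mean(np.abs(resid))                   # shared magnitude estimate
    return codes, signs, scale, lo, step

def dequantize(codes, signs, scale, lo, step, R):
    z_hat = lo + codes * step + signs * scale
    return R.T @ z_hat                               # undo the rotation

d = 128
x = rng.standard_normal(d)
R = random_rotation(d)
params = two_stage_quantize(x, R)
x_hat = dequantize(*params, R)
print("relative error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```

Even this toy version shows the division of labor: the rotation spreads energy evenly across coordinates, the coarse codes capture most of each coordinate’s magnitude, and the 1-bit stage nudges every coordinate back toward its true value.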
RaBitQ: The Earlier Work
RaBitQ, introduced by a post-doctoral researcher at ETH Zurich in May 2024, is a high-dimensional vector quantization technique that also employs a random rotation (a Johnson-Lindenstrauss transform) before quantization. Its error bounds were proven asymptotically optimal, matching a lower bound established at FOCS 2017, and the method was published at SIGMOD 2024 with a follow-up at SIGMOD 2025. The code is fully open-source on GitHub.
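The structural overlap is easiest to see in a toy version of RaBitQ’s rotate-then-binarize idea. The sketch below follows the published recipe only at a high level (shared random rotation, one sign bit per dimension, a per-vector correction term for inner-product estimates) and omits the real implementation’s packed bit layout and query-time optimizations:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 64, 1000

# Shared random rotation (a Johnson-Lindenstrauss-style transform).
R, _ = np.linalg.qr(rng.standard_normal((d, d)))

data = rng.standard_normal((n, d))
data /= np.linalg.norm(data, axis=1, keepdims=True)   # unit data vectors
rotated = data @ R.T                                  # rotate every vector

# 1 bit per dimension: keep only the signs, as unit-norm binary vectors.
quant = np.where(rotated > 0, 1.0, -1.0) / np.sqrt(d)

query = rng.standard_normal(d)
query /= np.linalg.norm(query)
rq = R @ query                                        # rotate the query too

# RaBitQ-style estimate of <x, q>: <quant, Rq> rescaled by the
# per-vector correction <quant, Rx>, which is computed once at index time.
corr = np.sum(quant * rotated, axis=1)
est = (quant @ rq) / corr

true = data @ query
print("mean |error| of inner-product estimates:", np.abs(est - true).mean())
```

The family resemblance to the TurboQuant sketch above, rotate first, then spend very few bits per dimension, then correct, is exactly what the dispute turns on.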
Allegations of Misconduct
A Chinese post‑doctoral researcher publicly accused TurboQuant of closely copying RaBitQ, citing three major issues: deliberate omission of related work, misleading experimental setups, and inaccurate theoretical claims.
TurboQuant allegedly downplays the similarity by relegating the RaBitQ discussion to an appendix and characterizing it merely as a grid-based product quantization scheme.
During peer review, a reviewer noted the overlap and requested a thorough comparison, which the authors did not provide.
The experimental comparison ran a Python implementation of RaBitQ on a single CPU core against TurboQuant results obtained on an NVIDIA A100 GPU, an apples-to-oranges setup that exaggerated the performance gap.
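For context on why this matters, the throughput of NumPy-backed CPU code can swing several-fold on thread settings alone. The hypothetical harness below (thread count and workload shape are illustrative assumptions, not the disputed experiment) shows how a baseline’s conditions can at least be made explicit:

```python
import os
# Make the CPU baseline's parallelism explicit *before* importing numpy,
# instead of inheriting whatever the environment happens to set.
os.environ["OMP_NUM_THREADS"] = "16"      # hypothetical core count

import time
import numpy as np

def bench(fn, *args, repeats=10):
    # Best-of-N wall-clock timing; crude, but the conditions are stated.
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

# Hypothetical workload shape, chosen only for illustration.
X = np.random.standard_normal((10_000, 128)).astype(np.float32)
q = np.random.standard_normal(128).astype(np.float32)

ms = bench(X.dot, q) * 1e3
print(f"scan on {os.environ['OMP_NUM_THREADS']} threads: {ms:.3f} ms")
```

Reporting the thread count, implementation language, and hardware alongside every number is the minimum needed for a CPU baseline and a GPU result to be read side by side.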
Timeline of the Dispute
May‑Sep 2024: RaBitQ papers and code released.
Jan 2025: TurboQuant’s second author contacts the RaBitQ team for assistance reproducing the method.
Apr 2025: TurboQuant paper posted on arXiv.
May 2025: Email exchanges clarify technical misunderstandings; the TurboQuant authors acknowledge the single-core limitation but do not disclose it in the paper.
Nov 2025: TurboQuant submitted to ICLR 2026 without correcting the identified issues.
Jan 2026: TurboQuant accepted; Google begins large‑scale promotion.
Community and Market Reaction
The controversy sparked extensive discussion on academic forums and social media. Reviewers praised TurboQuant’s results but also highlighted the methodological overlap. Critics argued that the paper’s narrative, amplified by Google’s promotion, could mislead both researchers and investors, potentially reshaping the AI‑hardware market.
Google’s limited response promised post‑conference corrections but refused to discuss the technical similarity, leaving the academic community without a definitive resolution.
Broader Implications
The case underscores the importance of proper citation, transparent experimental conditions, and timely rebuttal in high‑impact AI research. When a paper receives massive exposure from a top‑tier conference and a major tech company, its narrative can influence capital markets and industry directions, making rigorous scholarly standards even more critical.