How RaBitQ Achieves 32× Vector Compression Without Sacrificing Accuracy
This article explains the challenges of high‑dimensional vector retrieval and introduces quantization techniques, in particular the binary RaBitQ method and its MRQ extension, detailing their compression ratios, speed gains, compatibility with search indexes, and real‑world performance in the VSAG system.
1. Introduction
Vector retrieval is a core technology in modern AI applications such as RAG, search engines, and recommendation systems. With the rise of deep learning, high‑dimensional vectors have become essential for capturing semantic information: OpenAI’s latest embedding models, for example, map texts into 1536‑ or even 3072‑dimensional vectors, which creates significant challenges for retrieval.
Challenges
Distance computation accounts for over 90% of the cost in high‑dimensional retrieval, and memory consumption is also prohibitive: storing 1 million 1536‑dimensional 32‑bit vectors already requires about 6.1 GB (1536 × 4 bytes × 10⁶).
2. Quantization
Quantization maps high‑precision vectors to low‑precision representations, offering two main benefits:
Speed: compressing 32‑bit floats to 4‑ or 8‑bit integers reduces distance‑computation overhead, often doubling speed.
Memory: a 4‑bit scalar quantization uses only 1/8 of the original storage.
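As a minimal sketch of the idea (not VSAG's implementation), 8‑bit scalar quantization maps each dimension linearly onto an integer range, cutting float32 storage to one quarter:

```python
import numpy as np

def sq8_quantize(vectors: np.ndarray):
    """8-bit scalar quantization: map each dimension linearly
    from [min, max] onto [0, 255], stored as one byte."""
    lo = vectors.min(axis=0)
    hi = vectors.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)
    codes = np.round((vectors - lo) / scale * 255).astype(np.uint8)
    return codes, lo, scale

def sq8_dequantize(codes, lo, scale):
    """Approximate reconstruction of the original floats."""
    return codes.astype(np.float32) / 255 * scale + lo

vecs = np.random.rand(1000, 1536).astype(np.float32)
codes, lo, scale = sq8_quantize(vecs)
print(codes.nbytes / vecs.nbytes)  # 0.25, i.e. 4x compression
```

A 4‑bit variant would additionally pack two codes per byte, yielding the 1/8 storage figure above.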
2.1 RaBitQ – Binary Quantization
RaBitQ, presented at SIGMOD 2024, is a binary quantization method that compresses each dimension from 32‑bit float to 1‑bit, achieving a 32× reduction in memory. Compared with PQ and SQ, RaBitQ offers:
Extreme compression ratio (32×).
High accuracy comparable to lower‑compression methods.
Compatibility with graph‑based and inverted‑index search.
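The 32× figure follows directly from keeping one bit per dimension instead of 32. A toy sign‑based binarization (a deliberate simplification of RaBitQ's rotated, codebook‑based scheme) illustrates the storage arithmetic:

```python
import numpy as np

def binary_quantize(vectors: np.ndarray) -> np.ndarray:
    """1 bit per dimension: keep the sign of each centered
    coordinate, packed 8 bits per byte."""
    centered = vectors - vectors.mean(axis=0)
    bits = (centered > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

vecs = np.random.randn(1000, 1536).astype(np.float32)
codes = binary_quantize(vecs)   # 1536 bits = 192 bytes per vector
print(vecs.nbytes // codes.nbytes)  # 32
```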
2.2 Core Techniques
(a) Pre‑processing: random rotation, residual quantization, and normalization to reduce variance.
(b) Unbiased estimator: restores distances from binary codes using a learned codebook and unbiased estimation.
(c) Bitwise distance computation: leverages AVX‑512 to compute distances with bitwise operations, achieving up to 8× higher throughput.
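A portable sketch of the bitwise-distance idea on packed binary codes: XOR marks the differing bits, and a popcount sums them. Here NumPy's `unpackbits` stands in for the hardware popcount that AVX‑512 (e.g. VPOPCNTDQ) performs on 512‑bit registers:

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed binary codes:
    XOR the bytes, then count the set bits."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

a = np.packbits(np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8))
b = np.packbits(np.array([1, 1, 1, 0, 0, 0, 1, 1], dtype=np.uint8))
print(hamming_distance(a, b))  # 3
```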
3. MRQ – Minimized Residual Quantization
MRQ extends RaBitQ by projecting vectors onto a lower‑dimensional subspace (via PCA) before binary quantization, allowing flexible compression ratios (e.g., 1/3 or 1/7 of the bits) while preserving accuracy.
Experimental results on GIST and DEEP datasets show MRQ‑IVF achieves up to 3× efficiency gain with comparable recall.
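A rough sketch of the MRQ idea, assuming a plain PCA via SVD (MRQ's residual‑minimization details are not reproduced here): project onto the top principal components first, then binarize, so the compression ratio is tunable beyond 32×.

```python
import numpy as np

def pca_project(vectors: np.ndarray, out_dim: int) -> np.ndarray:
    """Project centered vectors onto their top `out_dim`
    principal components (directions from the SVD's Vt)."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:out_dim].T

vecs = np.random.randn(1000, 960).astype(np.float32)   # GIST-like dims
reduced = pca_project(vecs, 480)           # keep half the dimensions
codes = np.packbits(reduced > 0, axis=1)   # then 1 bit per kept dim
print(vecs.nbytes // codes.nbytes)         # 64, vs. 32 for plain 1-bit
```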
4. Integration in VSAG
The original article provides an example configuration for enabling RaBitQ in VSAG, along with performance tables comparing fp32, sq8, RaBitQ, and MRQ variants on the GIST‑1M dataset: RaBitQ reduces memory to 0.11 GB (1/32 of fp32) and increases QPS from 528 to 779.
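As a rough illustration only, VSAG indexes are typically configured through a JSON parameter string, with the quantizer chosen by name. The keys below ("hgraph", "base_quantization_type", "rabitq") are assumptions based on VSAG's general parameter style, not a verified schema; consult the VSAG documentation for the exact fields.

```json
{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 960,
    "hgraph": {
        "base_quantization_type": "rabitq",
        "max_degree": 32,
        "ef_construction": 200
    }
}
```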
5. Conclusion
RaBitQ provides a high‑compression, high‑performance solution for vector search, and when combined with PCA‑based MRQ it can further reduce memory by up to 64× while maintaining recall and accelerating queries.
References and dataset links are listed at the end.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
AntData
Ant Data leverages Ant Group's leading technological innovation in big data, databases, and multimedia, with years of industry practice. Through long-term technology planning and continuous innovation, we strive to build world-class data technology and products.
