How RaBitQ Achieves 32× Vector Compression Without Sacrificing Accuracy

This article explains the challenges of high-dimensional vector retrieval and introduces quantization techniques, especially the binary RaBitQ method and its MRQ extension, detailing their compression ratios, speed gains, compatibility with search indexes, and real-world performance results in the VSAG system.

AntData

1. Introduction

Vector retrieval is a core technology in modern AI applications such as RAG, search engines, and recommendation systems. With the rise of deep learning, high-dimensional vectors have become essential for capturing semantic information; OpenAI's latest embedding models, for example, embed texts into 1536- or even 3072-dimensional vectors, creating huge challenges for retrieval.

Challenges

Distance computation accounts for over 90% of the cost in high-dimensional retrieval, and memory consumption is also prohibitive: storing 1 million 1536-dimensional 32-bit float vectors already requires roughly 6 GB for the raw data alone, before any index overhead.

2. Quantization

Quantization maps high-precision vectors to low-precision representations, offering two main benefits:

Speed: compressing 32‑bit floats to 4‑ or 8‑bit integers reduces distance‑computation overhead, often doubling speed.

Memory: a 4‑bit scalar quantization uses only 1/8 of the original storage.
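To make the memory side concrete, here is a minimal 8-bit uniform scalar quantizer in NumPy. This is a generic sketch, not VSAG's implementation; a 4-bit variant would pack two codes per byte to reach the 1/8 ratio mentioned above.

```python
import numpy as np

def sq_encode(x, n_bits=8):
    """Uniform scalar quantization: map each float32 value to an integer code."""
    lo, hi = float(x.min()), float(x.max())
    levels = (1 << n_bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def sq_decode(codes, lo, scale):
    """Approximate reconstruction; per-value error is at most scale / 2."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
x = rng.standard_normal(1536).astype(np.float32)
codes, lo, scale = sq_encode(x)
x_hat = sq_decode(codes, lo, scale)
print(codes.nbytes, x.nbytes)  # 1536 vs 6144 bytes: 4x smaller at 8 bits
```

Besides the smaller footprint, integer codes also allow distance computation with cheaper integer arithmetic, which is where the speedup comes from.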

2.1 RaBitQ – Binary Quantization

RaBitQ, presented at SIGMOD 2024, is a binary quantization method that compresses each dimension from 32‑bit float to 1‑bit, achieving a 32× reduction in memory. Compared with PQ and SQ, RaBitQ offers:

Extreme compression ratio (32×).

High accuracy comparable to lower‑compression methods.

Compatibility with graph‑based and inverted‑index search.
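The 1-bit encoding idea can be sketched as a random rotation followed by taking sign bits. This is a minimal illustration only: real RaBitQ additionally stores per-vector correction factors used by its unbiased distance estimator, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 1536

# Random orthogonal rotation, e.g. via QR of a Gaussian matrix
# (corresponding to RaBitQ's preprocessing step).
rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

x = rng.standard_normal(dim).astype(np.float32)
bits = rotation @ x > 0            # keep only the sign of each rotated coordinate
packed = np.packbits(bits)         # 1536 bits -> 192 bytes
print(x.nbytes // packed.nbytes)   # 32x compression
```

Each dimension shrinks from 32 bits to 1 bit, which is exactly where the 32× ratio comes from.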

2.2 Core Techniques

(a) Pre-processing: random rotation, residual quantization, and normalization to reduce variance.

(b) Unbiased estimator: restores distances from binary codes using a learned codebook and unbiased estimation.

(c) Bitwise distance computation: leverages AVX-512 to compute distances with bitwise operations, achieving up to 8× throughput.
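The bitwise step boils down to XOR plus population count over the packed codes. The NumPy sketch below uses a plain Hamming distance for illustration; RaBitQ's estimator combines such bit counts with the stored correction factors, and hardware popcount instructions (e.g. AVX-512 VPOPCNTDQ) perform the counting in bulk.

```python
import numpy as np

def hamming(a_packed: np.ndarray, b_packed: np.ndarray) -> int:
    """Distance between packed binary codes: XOR, then count differing bits."""
    return int(np.unpackbits(np.bitwise_xor(a_packed, b_packed)).sum())

rng = np.random.default_rng(1)
a = np.packbits(rng.integers(0, 2, 1536).astype(np.uint8))
b = np.packbits(rng.integers(0, 2, 1536).astype(np.uint8))
print(hamming(a, b))   # number of differing bits, between 0 and 1536
```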

3. MRQ – Minimized Residual Quantization

MRQ extends RaBitQ by projecting vectors onto a lower‑dimensional subspace (via PCA) before binary quantization, allowing flexible compression ratios (e.g., 1/3 or 1/7 of the bits) while preserving accuracy.
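The project-then-quantize idea can be sketched as PCA followed by the same 1-bit step. This is an illustrative reconstruction under stated assumptions; the paper's actual training and estimation details differ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 960)).astype(np.float32)  # GIST-like 960-dim data

# PCA via SVD of the centered data; keep the top-k principal directions.
k = 320                                   # keep 1/3 of the dimensions
mu = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mu, full_matrices=False)
proj = vt[:k].T                           # 960 x 320 projection matrix

z = (X[0] - mu) @ proj                    # project one vector onto the subspace
packed = np.packbits(z > 0)               # then 1-bit quantize, as in RaBitQ
print(packed.nbytes)                      # 40 bytes vs 3840 for the raw float32 vector
```

Keeping 1/3 of the dimensions multiplies binary quantization's 32× ratio by another 3×, which is how the higher compression ratios are reached.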

Experimental results on GIST and DEEP datasets show MRQ‑IVF achieves up to 3× efficiency gain with comparable recall.

4. Integration in VSAG

The original article includes an example configuration for enabling RaBitQ in VSAG, along with performance tables comparing fp32, sq8, RaBitQ, and MRQ variants on the GIST-1M dataset; RaBitQ reduces memory to 0.11 GB (1/32 of fp32) and increases QPS from 528 to 779.

5. Conclusion

RaBitQ provides a high‑compression, high‑performance solution for vector search, and when combined with PCA‑based MRQ it can further reduce memory by up to 64× while maintaining recall and accelerating queries.

References and dataset links are listed at the end.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Memory Optimization, AI embeddings, binary quantization, high-dimensional retrieval, MRQ, RaBitQ, vector quantization
Written by

AntData

Ant Data leverages Ant Group's leading technological innovation in big data, databases, and multimedia, with years of industry practice. Through long-term technology planning and continuous innovation, we strive to build world-class data technology and products.
