How Alibaba Cloud Milvus Achieves 20× Faster Billion‑Scale Vector Search with DiskANN and RaBitQ

Alibaba Cloud Milvus combines DiskANN graph indexing with the RaBitQ quantization algorithm. On a 100‑million‑vector, 768‑dimensional benchmark it delivers over 20× higher QPS, P99 latency below one‑tenth of the baseline, 29% lower memory usage, and more than 98% recall, while cutting index build time from about 20 hours to about 6 hours.

Alibaba Cloud Big Data AI Platform

May 29, 2026

Why Disk Vector Index?

Large‑scale AI applications often involve billions of vectors. Traditional in‑memory indexes such as HNSW or IVF_FLAT keep every vector in RAM, so memory cost grows linearly with the dataset and a single node quickly becomes infeasible. DiskANN stores the graph structure and raw vectors on SSD and keeps only compressed vectors and hot caches in memory, reducing memory cost by an order of magnitude.

Performance Bottlenecks of Open‑Source DiskANN

The native DiskANN in open‑source Milvus uses Product Quantization (PQ) for distance estimation, leading to three main issues:

Computation efficiency: PQ relies on table look‑ups and accumulations, resulting in low CPU efficiency.

I/O scheduling: System‑call overhead limits concurrent throughput.

Search strategy: Many candidate nodes are evaluated without contributing to the final result, increasing CPU pressure.

Alibaba Cloud Milvus addresses these bottlenecks with full‑stack optimizations from algorithm to I/O scheduling.
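To make the first bottleneck concrete, here is a minimal NumPy sketch of PQ distance estimation by table look‑up and accumulation. It is an illustration only, not the Milvus or DiskANN implementation: the function names are hypothetical, and the codebooks are sampled at random rather than trained with k‑means.

```python
import numpy as np

def train_codebooks(data, m, k, rng):
    """Toy codebooks: k sampled sub-vectors per subspace (real PQ uses k-means)."""
    n, d = data.shape
    sub = d // m
    idx = rng.choice(n, size=k, replace=False)
    # Shape (m, k, sub): one small codebook per subspace.
    return np.stack([data[idx, j * sub:(j + 1) * sub] for j in range(m)])

def encode(data, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    m, k, sub = codebooks.shape
    codes = np.empty((data.shape[0], m), dtype=np.uint8)
    for j in range(m):
        chunk = data[:, j * sub:(j + 1) * sub]
        d2 = ((chunk[:, None, :] - codebooks[j][None, :, :]) ** 2).sum(-1)
        codes[:, j] = d2.argmin(axis=1)
    return codes

def adc_distances(query, codes, codebooks):
    """Estimate squared distances with one table look-up per subspace."""
    m, k, sub = codebooks.shape
    # table[j, c] = squared distance from query sub-vector j to centroid c.
    table = ((query.reshape(m, 1, sub) - codebooks) ** 2).sum(-1)
    # Gather m table entries per vector, then accumulate them.
    return table[np.arange(m), codes].sum(axis=1)

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 16)).astype(np.float32)
codebooks = train_codebooks(data, m=4, k=16, rng=rng)
codes = encode(data, codebooks)
query = rng.standard_normal(16).astype(np.float32)
approx = adc_distances(query, codes, codebooks)
```

Each estimated distance costs `m` dependent memory look‑ups plus `m` additions, which is the access pattern the article describes as CPU‑inefficient.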

Core Techniques: DiskANN + RaBitQ Deep Fusion

Vamana Graph – Memory Re‑layout

DiskANN’s core is the Vamana graph, a single‑layer sparse graph unlike HNSW’s multi‑layer design. Alibaba Cloud Milvus applies a two‑round pruning strategy that preserves graph connectivity while adding longer edges, reducing the number of hops needed for convergence. It also reorganizes the Vamana graph in memory so that during search the system performs a “Zero‑IO” traversal, fetching raw vectors from disk only in the final re‑rank stage.
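The traversal pattern described above can be sketched as follows: a best‑first search that touches only in‑memory data, followed by a re‑rank that reads full vectors for the short list. This is a toy model under stated assumptions, not Alibaba Cloud's code: `read_full_vector` is a hypothetical stand‑in for an SSD read, the rounded `compressed` array stands in for quantized codes, and the graph is a simple nearest‑neighbor graph with a few random long edges rather than a pruned Vamana graph.

```python
import heapq
import numpy as np

def beam_search(graph, approx_dist, entry, beam_width):
    """Best-first graph search that uses only in-memory approximate distances."""
    visited = {entry}
    start = approx_dist(entry)
    candidates = [(start, entry)]      # min-heap of nodes still to expand
    best = [(-start, entry)]           # max-heap (negated) of the current beam
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(best) >= beam_width and dist > -best[0][0]:
            break                      # nothing left can improve the beam
        for nbr in graph[node]:
            if nbr in visited:
                continue
            visited.add(nbr)
            d = approx_dist(nbr)
            if len(best) < beam_width or d < -best[0][0]:
                heapq.heappush(candidates, (d, nbr))
                heapq.heappush(best, (-d, nbr))
                if len(best) > beam_width:
                    heapq.heappop(best)
    return [node for _, node in best]

def search(query, graph, compressed, read_full_vector, entry, beam_width, k):
    """Traverse on compressed data, then re-rank the short list on full vectors."""
    approx = lambda i: float(((compressed[i] - query) ** 2).sum())
    shortlist = beam_search(graph, approx, entry, beam_width)
    exact = [(float(((read_full_vector(i) - query) ** 2).sum()), i) for i in shortlist]
    return [i for _, i in sorted(exact)[:k]]

rng = np.random.default_rng(0)
n, dim = 500, 8
full_vectors = rng.standard_normal((n, dim)).astype(np.float32)  # "on disk"
compressed = np.round(full_vectors * 4) / 4                      # crude in-memory codes

pairwise = ((full_vectors[:, None, :] - full_vectors[None, :, :]) ** 2).sum(-1)
near = np.argsort(pairwise, axis=1)[:, 1:9]                      # 8 short edges per node
far = rng.integers(0, n, size=(n, 2))                            # 2 random long edges
graph = [[int(j) for j in np.concatenate([near[i], far[i]])] for i in range(n)]

disk_reads = []
def read_full_vector(i):
    disk_reads.append(i)               # count simulated SSD reads
    return full_vectors[i]

query = rng.standard_normal(dim).astype(np.float32)
top = search(query, graph, compressed, read_full_vector, entry=0, beam_width=32, k=10)
```

In this model the number of simulated disk reads is bounded by `beam_width`, regardless of how many nodes the traversal visits.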

RaBitQ Quantization – 1‑bit to 4‑bit Precise Compression

RaBitQ (Random Bit Quantization) maps normalized high‑dimensional vectors to vertices of a hyper‑cube, using just 1 bit per dimension. In high dimensions the “concentration of measure” effect makes the quantization error shrink as O(1/√d), so 1‑bit quantization is already highly accurate for 768‑dimensional data. Alibaba Cloud Milvus extends the basic 1‑bit scheme with a 4‑bit residual encoding, achieving a good trade‑off between compression ratio and accuracy.
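A simplified NumPy sketch of the 1‑bit idea follows. It assumes unit‑norm vectors and a random orthogonal rotation built by QR decomposition, and it uses a ratio estimator in the general form described in the RaBitQ paper. The function names are hypothetical, the 4‑bit extension is not shown, and none of this should be read as the Milvus implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 768

# Random orthogonal rotation (assumption: QR of a Gaussian matrix).
rotation, _ = np.linalg.qr(rng.standard_normal((d, d)))

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def quantize_1bit(unit_vectors):
    """Keep one sign bit per dimension plus one correction scalar per vector."""
    rotated = unit_vectors @ rotation
    bits = rotated > 0
    # The code corresponds to a hypercube vertex scaled onto the unit sphere.
    vertex = np.where(bits, 1.0, -1.0) / np.sqrt(d)
    # Inner product between each vector and its own quantized version.
    alignment = (vertex * rotated).sum(axis=1)
    return bits, alignment

def estimate_inner_products(unit_query, bits, alignment):
    """Estimate <data, query> from the 1-bit codes and the correction scalar."""
    rotated_q = unit_query @ rotation
    vertex = np.where(bits, 1.0, -1.0) / np.sqrt(d)
    return (vertex @ rotated_q) / alignment

data = normalize(rng.standard_normal((1000, d)))
# A query close to data[0], so one inner product is clearly the largest.
query = normalize(data[0] + 0.5 * normalize(rng.standard_normal(d)))

bits, alignment = quantize_1bit(data)
estimated = estimate_inner_products(query, bits, alignment)
true = data @ query
max_abs_error = float(np.abs(estimated - true).max())
```

Comparing `estimated` with `true` for different values of `d` is a quick way to observe the error shrinking as the dimension grows.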

The following quantization options are compared:

Float32: 1× compression, 3,072 bytes per vector, exact accuracy.

PQ (M=384): 8× compression, 384 bytes per vector, medium accuracy, slower table‑lookup computation.

RaBitQ 1‑bit: 32× compression, 96 bytes per vector, high accuracy, extremely fast popcount‑based computation.

RaBitQ 4‑bit: 8× compression, 384 bytes per vector, high accuracy, ultra‑fast AVX‑512 VNNI execution.
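The byte counts above follow directly from 768 dimensions times the bits stored per dimension. The short sketch below reproduces that arithmetic and shows, as an illustration only, how a ±1 dot product reduces to a popcount over XOR‑ed bit strings. The helper names are hypothetical, and NumPy's `unpackbits(...).sum()` stands in for a hardware popcount instruction.

```python
import numpy as np

DIM = 768

def bytes_per_vector(dim, bits_per_dim):
    """Storage for one vector when every dimension uses bits_per_dim bits."""
    return dim * bits_per_dim // 8

float32_bytes = bytes_per_vector(DIM, 32)
one_bit_bytes = bytes_per_vector(DIM, 1)
four_bit_bytes = bytes_per_vector(DIM, 4)
pq_bytes = 384 * 1                     # M=384 sub-quantizers, one byte each

# Raw code storage for the benchmark's 100 million vectors, in gigabytes.
n_vectors = 100_000_000
float32_total_gb = n_vectors * float32_bytes / 1e9
one_bit_total_gb = n_vectors * one_bit_bytes / 1e9

# Pack 768 sign bits into bytes.
rng = np.random.default_rng(0)
a_bits = rng.integers(0, 2, DIM, dtype=np.uint8)
b_bits = rng.integers(0, 2, DIM, dtype=np.uint8)
a_packed = np.packbits(a_bits)
b_packed = np.packbits(b_bits)

def sign_dot_popcount(x_packed, y_packed, dim):
    """Dot product of two +/-1 vectors: dim - 2 * popcount(x XOR y)."""
    differing = int(np.unpackbits(np.bitwise_xor(x_packed, y_packed)).sum())
    return dim - 2 * differing

dot = sign_dot_popcount(a_packed, b_packed, DIM)
```

These totals cover the codes only; any per‑vector correction scalars, graph edges, and caches add to the real memory footprint.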

Benchmark Setup

Tests were run with Zilliz VectorDBBench on the Performance768D100M dataset (100 M vectors, 768 dimensions). The QueryNode configuration was 16 CU × 2 nodes. Two groups were compared:

Alibaba Cloud DiskANN + RaBitQ

Open‑source DiskANN + PQ

Results

Across all test scenarios, Alibaba Cloud Milvus achieved more than 20× higher QPS than the open‑source baseline, while P99 and P95 latencies dropped to less than one‑tenth of the baseline. Recall remained above 98%, a drop of less than 1% relative to the baseline. Index construction time shrank from roughly 20 hours to about 6 hours, a speedup of roughly 3.3×, so the gains cover both index build and query serving.
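For readers unfamiliar with the metrics, recall and tail latency are commonly computed along the following lines. This is a generic sketch with hypothetical helper names, not the VectorDBBench implementation.

```python
import numpy as np

def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbors that appear in the returned top-k."""
    hits = sum(len(set(r[:k]) & set(g[:k])) for r, g in zip(retrieved, ground_truth))
    return hits / (k * len(ground_truth))

def latency_percentile(latencies_ms, pct):
    """Latency below which pct percent of requests complete."""
    return float(np.percentile(latencies_ms, pct))

# Two toy queries: the first misses one of its three true neighbors.
retrieved = [[1, 2, 3], [4, 5, 6]]
ground_truth = [[1, 2, 9], [4, 5, 6]]
recall = recall_at_k(retrieved, ground_truth, k=3)

latencies_ms = list(range(1, 101))     # 1 ms .. 100 ms
p99 = latency_percentile(latencies_ms, 99)

build_speedup = 20 / 6                 # reported build time: 20 h -> about 6 h
```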

Figures below illustrate the QPS, latency, and memory usage curves.

Conclusion

Across the tested scenarios, the Alibaba Cloud Milvus solution consistently delivers over 20× QPS gains, tail latency below one‑tenth of the baseline, 29% lower memory usage, and less than 1% recall loss, while reducing index build time from about 20 hours to roughly 6 hours. Together these results set a new performance bar for billion‑scale vector retrieval.

References

Subramanya, S.J., et al. "DiskANN: Fast Accurate Billion‑point Nearest Neighbor Search on a Single Node." NeurIPS 2019.

Gao, J., Long, C. "RaBitQ: Quantizing High‑Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search." SIGMOD 2024.

Aguerrebere, C., et al. "Locally‑adaptive Quantization for Streaming Vector Search." arXiv 2024.

Gao, J. "Quantization in The Counterintuitive High‑Dimensional Space." dev.to, 2024.
