How DiskANN + RaBitQ Supercharges Milvus: 5× Faster, 90% Cheaper Vector Search
This article explains how integrating the disk‑based DiskANN index with the ultra‑compact RaBitQ quantization dramatically boosts Milvus's vector search performance and cuts costs, delivering over five times higher QPS and more than 90% cost reduction for billion‑scale AI workloads.
Overview
Vector retrieval is entering an era that demands high recall, low latency, and billion-scale capacity at controllable cost. In Volcano Cloud Search we introduced DiskANN, a disk-based vector index that keeps raw vectors and the graph on disk while holding only compressed vectors in memory, cutting vector search cost by over 90%.
DiskANN also offers a pure-memory performance mode; combined with the SOTA RaBitQ quantization algorithm, it reduces memory usage by 85%, delivering extreme cost-performance for high-throughput vector search, and is now deployed at scale.
Integration of DiskANN + RaBitQ into Volcano Milvus
We integrated DiskANN + RaBitQ into the Volcano Engine vector database Milvus edition ("Volcano Milvus"). It retains Milvus’s rich open‑source ecosystem while delivering higher QPS and lower cost for billion‑scale data; disk‑based index QPS is more than 5× that of the community edition.
Volcano Milvus vs Open‑source Milvus
Conclusion: For both capacity‑oriented and performance‑oriented indexes, Volcano Milvus outperforms the open‑source version in performance and cost.
Capacity‑oriented Index
Performance is >5× the open‑source version
With identical node specifications and monthly price, the per-QPS price is 80% lower.
Performance‑oriented Index
QPS is 30% higher
Query nodes need only 48 GB of memory versus 192 GB, resulting in a lower monthly price.
Per‑QPS price is 30% lower
What is DiskANN?
DiskANN, introduced in the paper “DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node”, moves the heavy graph index and raw vectors to SSD, keeping only compressed vectors and hot nodes in memory. It uses multi-way disk reads and batched neighbor fetching to drastically reduce random reads.
Key designs:
Vamana graph: two-round pruning yields a smaller graph diameter and more long-range edges, reducing hop count and latency.
Implicit re-ranking: neighbors and the full-precision vector are stored in the same disk sector, so a single disk read fetches both, enabling precise top-K re-ranking.
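The search loop behind these two designs can be sketched in plain Python. This is an illustrative simplification, not Milvus's implementation: a tiny complete graph stands in for the Vamana graph, a float16 copy stands in for the compressed in-memory vectors, and `disk_read` simulates a single sector read that returns both the adjacency list and the full-precision vector.

```python
import heapq
import numpy as np

# Toy setup standing in for DiskANN's layout (illustrative only):
# full-precision vectors live "on disk", a low-precision copy lives
# in memory, and a complete graph stands in for the Vamana graph.
rng = np.random.default_rng(42)
dim, n = 16, 20
vectors = rng.standard_normal((n, dim)).astype(np.float32)   # "on disk"
compressed = vectors.astype(np.float16)                      # "in memory"
graph = {i: [j for j in range(n) if j != i] for i in range(n)}

def disk_read(node):
    """One simulated disk read: the adjacency list and the full-precision
    vector share a sector, so both arrive in a single I/O."""
    return vectors[node], graph[node]

def search(query, k=5, budget=n):
    entry = 0
    visited = {entry}
    frontier = [(float(np.linalg.norm(query - compressed[entry])), entry)]
    topk = []                                # max-heap of (-exact_dist, node)
    expansions = 0
    while frontier and expansions < budget:
        _, node = heapq.heappop(frontier)    # cheapest by compressed distance
        vec, neighbors = disk_read(node)     # real DiskANN batches these reads
        expansions += 1
        d = float(np.linalg.norm(query - vec))   # implicit re-ranking: exact
        heapq.heappush(topk, (-d, node))
        if len(topk) > k:
            heapq.heappop(topk)              # drop the current worst
        for nb in neighbors:
            if nb not in visited:
                visited.add(nb)
                approx = float(np.linalg.norm(query - compressed[nb]))
                heapq.heappush(frontier, (approx, nb))
    return sorted((-negd, i) for negd, i in topk)  # (exact_dist, node) pairs

query = rng.standard_normal(dim).astype(np.float32)
print([i for _, i in search(query)])
```

Because the compressed distances only steer navigation while every visited node is re-scored with its exact vector, the returned top-K is ranked by full-precision distances even though most of the work used the cheap copies.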
What is RaBitQ?
RaBitQ (Random Bit Quantization) is a state-of-the-art vector quantization method that normalizes vectors to the unit sphere and maps each dimension to a single bit, compressing full-precision (float32) vectors by up to 32×.
In high dimensions (e.g., 1024-D) the quantization error becomes negligible, and distances can be computed with bit-wise operations: with AVX-512, a 1024-D vector distance reduces to two VPOPCNT instructions.
Memory usage reduced: 32× compression.
Faster computation: bit-wise distance calculation.
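A minimal sketch of the 1-bit mechanics (an assumption-laden simplification: real RaBitQ also applies a random orthogonal rotation and keeps correction factors for an unbiased distance estimate, both omitted here). Each dimension becomes a sign bit, and the bit-wise distance is XOR plus population count, which is exactly what VPOPCNT accelerates in hardware:

```python
import numpy as np

def quantize_1bit(vec: np.ndarray) -> np.ndarray:
    """Map each dimension to a single sign bit, packed 8 dims per byte.
    (Simplified: no random rotation or correction factors as in RaBitQ.)"""
    bits = (vec > 0).astype(np.uint8)    # 1 bit per dimension
    return np.packbits(bits, axis=-1)    # 1024-D -> 128 bytes

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Bit-wise distance proxy: XOR then popcount."""
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float32)
y = rng.standard_normal(1024).astype(np.float32)

qx, qy = quantize_1bit(x), quantize_1bit(y)
print(qx.nbytes)                 # 128 bytes vs 4096 for float32: 32x
print(hamming_distance(qx, qy))  # proxy for angular distance
```

The 32× figure falls straight out of the packing: 1024 float32 values occupy 4096 bytes, while 1024 sign bits occupy 128.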
DiskANN + RaBitQ in Practice
Disk mode: RaBitQ-quantized vectors reside in memory; raw vectors and neighbor lists stay on disk. A search first prunes with the quantized vectors, then refines with exact vectors read from disk.
Performance mode: for memory-rich, latency-critical scenarios, both neighbor lists and full-precision vectors are loaded into memory, eliminating repeated I/O.
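Disk mode's prune-then-refine flow can be sketched as a flat two-stage search (a simplification of what happens inside the graph traversal, assuming sign-bit codes in memory): Hamming distances over the codes cheaply select a candidate set, and exact distances are computed only for those candidates, the only vectors that would actually be read from disk.

```python
import numpy as np

rng = np.random.default_rng(7)
n, dim = 10_000, 128
data = rng.standard_normal((n, dim)).astype(np.float32)  # "on disk" in disk mode
codes = np.packbits(data > 0, axis=1)                    # quantized, in memory

def search(query: np.ndarray, k: int = 10, rerank: int = 200) -> np.ndarray:
    qcode = np.packbits(query > 0)
    # Stage 1: cheap pruning with bit-wise (Hamming) distances on the codes.
    hd = np.unpackbits(codes ^ qcode, axis=1).sum(axis=1)
    candidates = np.argpartition(hd, rerank)[:rerank]
    # Stage 2: exact distances for the candidates only -- in disk mode these
    # would be the only full-precision vectors fetched from disk.
    exact = np.linalg.norm(data[candidates] - query, axis=1)
    order = np.argsort(exact)[:k]
    return candidates[order]

query = rng.standard_normal(dim).astype(np.float32)
print(search(query)[:3])
```

The `rerank` parameter is the knob this pattern exposes: a larger candidate set costs more disk reads but raises recall, which is the same trade-off the disk and performance modes navigate.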
How Volcano Milvus Uses DiskANN
Build: Index Node loads raw data from object storage, DiskANN builds the index and writes files to local storage, then uploads them back.
Load: Query Node downloads index files and loads selected data into memory based on mode.
Search: Query Node executes the DiskANN search algorithm.
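From the client side, this build/load/search lifecycle is driven by ordinary index and search parameters. The fragment below is a sketch using names from the open-source Milvus docs (`DISKANN`, `search_list`); the Volcano edition's parameters may differ.

```python
# Index and search parameters for a DiskANN index, per community Milvus docs.
# The Volcano Milvus edition may expose additional or different knobs.
index_params = {
    "index_type": "DISKANN",
    "metric_type": "L2",
    "params": {},
}
search_params = {
    "metric_type": "L2",
    "params": {"search_list": 100},  # candidate-list size; larger = higher recall
}

# Typical pymilvus calls (require a running Milvus service, shown for shape only):
# collection.create_index("embedding", index_params)   # Build on the Index Node
# collection.load()                                    # Load on the Query Node
# collection.search(data=[query_vector], anns_field="embedding",
#                   param=search_params, limit=10)     # Search
```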
Volcano Milvus Public Beta Announcement
Volcano Engine's Milvus edition enters public beta on 10.22, offering cloud-native, managed vector search for AI applications such as RAG, image retrieval, and recommendation, with multi-region coverage, built-in ops support, and dedicated technical onboarding.
Apply via the Volcano Engine website’s pre‑sales channel.
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.