Understanding Milvus Vector Indexes: Structures, Quantization, and Future Trends
This article explains the core concepts of vector database indexing, details the composition of Milvus indexes—including data structures, quantization methods, and specific algorithms like IVF, HNSW, DISKANN, PQ, RABITQ, PRQ, SCANN, AISAQ, and MINHASH_LSH—and offers speculation on future developments.
Concept
Indexes are auxiliary data structures that trade storage space for query speed. In vector databases such as Milvus they enable fast approximate nearest‑neighbor (ANN) search by limiting the amount of data examined during a query.
Milvus Index Composition
Each Milvus index consists of three parts:
Data Structure – how vectors are organized (e.g., IVF, HNSW, DISKANN).
Quantization – how vectors are compressed (e.g., FLAT, SQ8, PQ, RABITQ, PRQ, SCANN).
Refiner – optional post‑processing that improves recall.
Data‑Structure Indexes
IVF (Inverted File)
Vectors are clustered with K‑means into nlist regions. Each region stores a centroid; at query time the nprobe nearest centroids are selected and only vectors inside those regions are examined.
Key parameters: nlist – number of clusters (default 128, range 1‑65536). nprobe – number of clusters probed per query (default 8). Larger nprobe improves recall at the cost of more distance calculations.
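The cluster-then-probe mechanism can be sketched in a few lines of NumPy. This is a toy illustration, not Milvus internals: real IVF trains centroids with k-means, whereas here they are simply sampled from the data.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 8)).astype(np.float32)
nlist, nprobe = 16, 4

# Toy codebook: sample nlist vectors as centroids instead of running k-means.
centroids = data[rng.choice(len(data), nlist, replace=False)]
assign = np.argmin(np.linalg.norm(data[:, None] - centroids[None], axis=2), axis=1)

def ivf_search(query, k=5):
    # Probe only the nprobe clusters whose centroids are closest to the query,
    # then rank the vectors inside those clusters exactly.
    probed = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    cand = np.flatnonzero(np.isin(assign, probed))
    dists = np.linalg.norm(data[cand] - query, axis=1)
    return cand[np.argsort(dists)[:k]]
```

With nprobe = nlist the search degenerates to brute force over all vectors; smaller nprobe trades recall for fewer distance computations.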
HNSW (Hierarchical Navigable Small World)
HNSW builds a multi‑layer proximity graph. Upper layers provide coarse navigation; the bottom layer contains all vectors for fine‑grained search.
Important parameters: M – maximum number of outgoing edges per node (graph density). efConstruction – number of candidate neighbors examined during index construction (higher → better quality, longer build time). ef – number of candidates examined during search (higher → higher recall, slower query).
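In pymilvus these knobs are passed as plain dictionaries. The sketch below assumes a collection with a float-vector field named "embedding"; the field name and commented-out calls are illustrative:

```python
# Build-time parameters: denser graph (M) and wider construction
# candidate list (efConstruction) give better recall but slower builds.
index_params = {
    "index_type": "HNSW",
    "metric_type": "L2",
    "params": {"M": 16, "efConstruction": 200},
}
# Query-time parameter: ef must be at least the requested top-k.
search_params = {"metric_type": "L2", "params": {"ef": 64}}
# collection.create_index("embedding", index_params)
# collection.search(vectors, "embedding", search_params, limit=10)
```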
DISKANN (Vamana Graph)
A disk‑resident graph where each node initially has ~500 random edges. The graph is pruned and long‑range shortcuts are added to accelerate ANN search.
Key parameters: max_degree – maximum edges per node (graph density). search_list_size – breadth of the refinement step during construction. beam_width_ratio – controls the breadth of the search beam:
beam_width = num_cpu_cores * beam_width_ratio
search_cache_budget_gb_ratio – fraction of RAM reserved for caching frequently accessed PQ blocks.
Quantization Techniques
FLAT
No compression; vectors are stored as raw floating‑point numbers. Guarantees 100 % recall but requires the most memory and CPU.
SQ8 (Scalar Quantization, 8‑bit)
Each dimension is normalized to [0,1] and then scaled to an 8‑bit integer.
normalized_value = (value - min) / (max - min)
compressed = round(normalized_value * 255)
This reduces memory by a factor of four and balances the influence of each dimension.
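A minimal NumPy sketch of the encode/decode round trip. The per-dimension min/max are taken from the sample itself here; Milvus learns them at index-build time:

```python
import numpy as np

def sq8_encode(x):
    # Per-dimension range, then scale each value into an 8-bit integer.
    lo, hi = x.min(axis=0), x.max(axis=0)
    codes = np.round((x - lo) / (hi - lo) * 255).astype(np.uint8)
    return codes, lo, hi

def sq8_decode(codes, lo, hi):
    # Map 8-bit codes back into the original per-dimension range.
    return codes.astype(np.float32) / 255 * (hi - lo) + lo

x = np.random.default_rng(1).standard_normal((100, 4)).astype(np.float32)
codes, lo, hi = sq8_encode(x)
# codes occupy one quarter of the float32 footprint
max_err = np.abs(sq8_decode(codes, lo, hi) - x).max()
```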
Product Quantization (PQ)
High‑dimensional vectors are split into m sub‑vectors of equal size. For each sub‑space a K‑means codebook (size 2^{nbits}) is learned. Each sub‑vector is replaced by the index of its nearest centroid, yielding a compact code of m * nbits bits.
Distance computation uses pre‑computed lookup tables:
# Query preprocessing: build one lookup table per sub-space
sub_vectors = split(query, m)
for each sub_vector_i:
    lookup_table_i = [dist(sub_vector_i, centroid) for centroid in codebook_i]
# Approximate distance to a stored vector with codes [code_0, ..., code_m-1]
approx_dist = sum(lookup_table_i[code_i] for i in range(m))
Parameters: m – number of sub‑vectors. nbits – bits per sub‑vector (e.g., 8 bits → 256 centroids).
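The same idea end to end in NumPy. The codebooks here are built by sampling rather than k-means, so the accuracy is only illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m, nbits = 500, 8, 4, 4      # 4 sub-vectors of 2 dims, 2^4 = 16 centroids each
k = 2 ** nbits
data = rng.standard_normal((n, d)).astype(np.float32)
subs = data.reshape(n, m, d // m)

# One toy codebook per sub-space: sampled points instead of trained k-means.
books = np.stack([subs[rng.choice(n, k, replace=False), i] for i in range(m)])
codes = np.stack([np.argmin(np.linalg.norm(subs[:, i, None] - books[i], axis=2), axis=1)
                  for i in range(m)], axis=1)          # (n, m): m * nbits bits per vector

def pq_adc(query):
    # Asymmetric distance computation: one (k,)-sized table per sub-space,
    # then m table lookups plus a sum per database vector.
    q = query.reshape(m, d // m)
    tables = np.linalg.norm(books - q[:, None], axis=2) ** 2   # (m, k)
    return tables[np.arange(m), codes].sum(axis=1)             # (n,)
```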
RABITQ (2024 SIGMOD)
RABITQ normalizes vectors onto the unit hypersphere, builds a tiny binary codebook, applies a random orthogonal matrix P to remove directional bias, and finally stores only the sign of each dimension (binary). The algorithm exploits high‑dimensional concentration: after random rotation most coordinates concentrate near zero, allowing accurate inner‑product estimation from binary signs.
# Index build
# 1. Centering & normalization
o = (data - c) / ||data - c||
# 2. Random orthogonal rotation
o_rot = P @ o
# 3. Binarize
code = sign(o_rot) # +1 → 1, -1 → 0
# Query
q_rot = P @ ((query - c) / ||query - c||)
# Approx inner product
approx_ip = (code.T @ sign(q_rot)) / D
RABITQ achieves near‑FLAT recall with binary storage and is hardware‑aware (optimised for Intel IceLake / AMD Zen 4).
PRQ (Product Residual Quantization)
PRQ extends PQ by quantizing the residual vector after the first PQ step. The residual is encoded with a second codebook, allowing finer approximation without a large storage increase. Parameter nrq controls how many residual quantization rounds are performed.
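A toy two-round residual quantizer (nrq = 2), again with sampled stand-in codebooks; each round encodes what the previous round missed:

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.standard_normal((300, 8)).astype(np.float32)
nrq, k = 2, 32                      # two residual rounds, 32 centroids per round

books, codes, residual = [], [], data.copy()
for _ in range(nrq):
    # Toy codebook for this round: sampled residuals instead of k-means centroids.
    book = residual[rng.choice(len(residual), k, replace=False)]
    code = np.argmin(np.linalg.norm(residual[:, None] - book[None], axis=2), axis=1)
    books.append(book)
    codes.append(code)
    residual = residual - book[code]   # what this round failed to capture

# Reconstruction sums one centroid per round; more rounds give a finer approximation.
recon = sum(b[c] for b, c in zip(books, codes))
```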
SCANN (Anisotropic Vector Quantization)
SCANN, from Google, tailors quantization for Maximum Inner Product Search (MIPS). Instead of minimizing Euclidean distance, it preserves components that contribute most to the inner product, yielding higher MIPS accuracy for the same code size.
Other Index Types
AISAQ (Disk‑Based Extension of DISKANN)
Two deployment modes:
performance – stores PQ data redundantly on disk to keep I/O low (larger index size).
scale – stores PQ data without redundancy, minimizing disk usage at the cost of higher I/O.
Optimisations include PQ re‑ordering and an in‑memory PQ cache (pq_cache_size).
MINHASH + LSH (Binary Vector Index)
Designed for large‑scale deduplication and similarity search (e.g., LLM training data). The pipeline:
Shingling – convert a document into overlapping token n‑grams.
Hash – apply multiple independent hash functions to each shingle.
Min‑selection – keep the minimum hash value per function, forming a MinHash signature.
Banding – split the signature into b bands of r hashes each.
LSH – hash each band; documents that collide in at least one band become candidate pairs.
Jaccard similarity is approximated by the fraction of signature positions at which the two documents agree:
J(A,B) ≈ |{i : sig_A[i] = sig_B[i]}| / k, where k is the number of hash functions.
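The pipeline above fits in a short NumPy sketch. Character n-grams stand in for token shingles, and crc32 composed with random affine maps stands in for a family of independent hash functions:

```python
import numpy as np
from zlib import crc32

NUM_HASHES, PRIME = 64, 2**31 - 1
rng = np.random.default_rng(0)
A, B = rng.integers(1, PRIME, (2, NUM_HASHES))

def shingles(text, n=3):
    # Overlapping character n-grams of the document.
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def minhash(sh):
    # One affine hash per signature slot; keep the minimum value per slot.
    sig = np.full(NUM_HASHES, np.iinfo(np.int64).max)
    for s in sh:
        sig = np.minimum(sig, (A * crc32(s.encode()) + B) % PRIME)
    return sig

def lsh_bands(sig, b=16):
    # Split the signature into b bands; band collisions yield candidate pairs.
    return [tuple(band) for band in np.split(sig, b)]

s1 = minhash(shingles("the quick brown fox"))
s2 = minhash(shingles("the quick brown fax"))
jaccard_est = (s1 == s2).mean()    # fraction of agreeing signature positions
```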
Practical Tuning Guidance
Choose a data‑structure index based on dataset size and latency requirements: IVF for large static collections, HNSW for low‑latency interactive search, DISKANN/AISAQ when the dataset exceeds RAM.
Adjust nlist and nprobe (IVF) or ef (HNSW) to trade recall for speed.
Select quantization according to memory budget: FLAT for maximum accuracy, SQ8 for modest compression, PQ for strong compression with controllable recall, RABITQ or PRQ when binary storage or residual refinement is needed.
For disk‑resident indexes, tune max_degree, search_list_size, and cache ratios to balance build time, query latency, and I/O.
Future Outlook
As vector databases mature, the variety of index algorithms is expected to converge toward a few universally effective structures. Systems may automatically select or even learn the optimal combination of data‑structure and quantization based on workload characteristics, reducing the need for manual tuning.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.