
Inside Milvus’ Index Engine: 3‑Layer Parameter Filling, Compile‑time Hardware Split, and a 16× Memory Trade‑off

The article dissects Milvus’ index engine, revealing that AUTOINDEX relies on a three‑stage default‑parameter pipeline, that CPU/GPU index selection is fixed at compile time via Go build tags, that the C++ Knowhere engine executes the algorithms, and that version aggregation, scalar V3 format, and the new AISAQ index embody deliberate memory‑vs‑IO trade‑offs.

Shuge Unlimited

AUTOINDEX – a three‑layer default‑parameter filling pipeline

The autoIndex.enable flag triggers a staged process that ensures the IndexNode receives a complete parameter set without altering the underlying index algorithm.

First layer – proxy splits by data type

Entry point: internal/proxy/task_index.go → createIndexTask.PreExecute. When AUTOINDEX is enabled, or when the user omits index_type, the proxy looks up defaults based on the field type:

Dense float vectors → autoIndex.params.build → HNSW_SQ (CPU) or GPU_CAGRA (GPU)

Int8 vectors → autoIndex.params.int8.build → HNSW + M:18 + efConstruction:240

Sparse vectors → autoIndex.params.sparse.build → SPARSE_INVERTED_INDEX + IP

Binary vectors → autoIndex.params.binary.build → BIN_IVF_FLAT + HAMMING

Deduplicate vectors → autoIndex.params.deduplicate.build → MINHASH_LSH + MHJACCARD

Large TopK scenarios → autoIndex.params.largeTopK.build → IVF_SQ8

The function adjustAutoIndexParamsByDataType (lines 85‑120 of autoindex_param.go) forces FP16 refinement for fp16 data and BF16 refinement for bf16 data, while never overriding the user‑provided metric_type.
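The type-based lookup above can be sketched as a small dispatch function. This is a minimal illustration for a CPU build, not the Milvus implementation: the VectorKind enum, defaultIndexFor, and resolve are hypothetical names, and only the index types, M, efConstruction, metric, and refine values come from the list above.

```go
package main

import "fmt"

// VectorKind is a hypothetical enum for this sketch; Milvus has its own schema types.
type VectorKind int

const (
	DenseFloat32 VectorKind = iota
	DenseFloat16
	DenseBFloat16
	Int8
	Sparse
	Binary
)

// defaultIndexFor returns the per-type defaults listed above (CPU build assumed).
func defaultIndexFor(kind VectorKind) map[string]string {
	switch kind {
	case Int8:
		return map[string]string{"index_type": "HNSW", "M": "18", "efConstruction": "240"}
	case Sparse:
		return map[string]string{"index_type": "SPARSE_INVERTED_INDEX", "metric_type": "IP"}
	case Binary:
		return map[string]string{"index_type": "BIN_IVF_FLAT", "metric_type": "HAMMING"}
	case DenseFloat16:
		// fp16 data is refined with FP16.
		return map[string]string{"index_type": "HNSW_SQ", "refine_type": "FP16"}
	case DenseBFloat16:
		// bf16 data is refined with BF16.
		return map[string]string{"index_type": "HNSW_SQ", "refine_type": "BF16"}
	default:
		return map[string]string{"index_type": "HNSW_SQ"}
	}
}

// resolve applies the defaults but keeps a user-provided metric_type untouched.
func resolve(kind VectorKind, user map[string]string) map[string]string {
	out := defaultIndexFor(kind)
	if m := user["metric_type"]; m != "" {
		out["metric_type"] = m
	}
	return out
}

func main() {
	p := resolve(Binary, map[string]string{"metric_type": "JACCARD"})
	fmt.Println(p["index_type"], p["metric_type"])
}
```

In this sketch the user's metric wins over the default HAMMING, which mirrors the rule that the first layer never overrides metric_type.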

Second layer – Knowhere supplements missing parameters

Core logic resides in internal/datacoord/index_inspector.go → createIndexForSegment (lines 237‑248) and indexBuildTask.prepareJobRequest. The guard:

if isVectorIndex && Params.KnowhereConfig.Enable.GetAsBool() {
    indexParams, err = Params.KnowhereConfig.UpdateIndexParams(
        indexType, paramtable.BuildStage, indexParams)
}
KnowhereConfig.UpdateIndexParams (see knowhere_param.go lines 109‑144) performs three steps:

Read default parameters for the given index type, e.g. DISKANN.build.max_degree.

Fill only those keys that are absent (if GetKeyFromSlice(indexParams, key) == "").

Honor override_index_type to replace the target type’s parameters when the user explicitly requests it.

Thus a user can specify index_type: DISKANN and the system automatically adds max_degree: 56, pq_code_budget_gb_ratio: 0.125, search_list_size: 100, etc., while still allowing explicit overrides.
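The fill-only-if-absent rule is easy to show in isolation. The sketch below uses plain maps rather than Milvus' key-value slices, and fillMissing is a hypothetical helper; the DISKANN default values are the ones quoted above.

```go
package main

import "fmt"

// fillMissing copies a default into the result only when the key is absent or empty,
// so any value the user set explicitly survives.
func fillMissing(params, defaults map[string]string) map[string]string {
	out := make(map[string]string, len(params)+len(defaults))
	for k, v := range params {
		out[k] = v
	}
	for k, v := range defaults {
		if out[k] == "" {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Defaults quoted in the text for DISKANN.
	diskannDefaults := map[string]string{
		"max_degree":              "56",
		"pq_code_budget_gb_ratio": "0.125",
		"search_list_size":        "100",
	}
	// The user overrides max_degree and leaves the rest unset.
	user := map[string]string{"index_type": "DISKANN", "max_degree": "64"}

	merged := fillMissing(user, diskannDefaults)
	fmt.Println(merged["max_degree"], merged["search_list_size"])
}
```

Here max_degree stays at the user's 64 while search_list_size is filled with the default 100.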

The configs/milvus.yaml file defines parameters only for AISAQ and DISKANN; HNSW‑related defaults are hard‑coded in autoindex_param*.go and bypass the Knowhere layer.

Third layer – special indexes add runtime parameters

Disk‑based indexes such as DISKANN invoke indexparams.UpdateDiskIndexBuildParams (in task_index.go lines 247‑253), which injects:

build_dram_budget_gb: the currently available memory.

vec_field_size_gb: the actual size of the vector field.

These values let the algorithm adapt to the real resource situation during build.
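A rough sketch of this runtime injection follows. Everything here is an assumption made for illustration: addRuntimeParams and vecFieldSizeGB are hypothetical helpers, the field size is estimated as rows × dimensions × 4 bytes (float32 vectors), and the values are only set when absent, in line with the fill-missing behaviour of the other layers.

```go
package main

import (
	"fmt"
	"strconv"
)

// vecFieldSizeGB estimates the size of a float32 vector field in GiB (assumed formula).
func vecFieldSizeGB(numRows, dim int64) float64 {
	const bytesPerFloat32 = 4
	return float64(numRows*dim*bytesPerFloat32) / float64(1<<30)
}

// addRuntimeParams adds the two runtime keys without touching existing values.
func addRuntimeParams(params map[string]string, availableGB float64, numRows, dim int64) map[string]string {
	out := make(map[string]string, len(params)+2)
	for k, v := range params {
		out[k] = v
	}
	injected := map[string]string{
		"build_dram_budget_gb": strconv.FormatFloat(availableGB, 'f', -1, 64),
		"vec_field_size_gb":    strconv.FormatFloat(vecFieldSizeGB(numRows, dim), 'f', -1, 64),
	}
	for k, v := range injected {
		if out[k] == "" {
			out[k] = v
		}
	}
	return out
}

func main() {
	// 2^20 rows of 256-dimensional float32 vectors occupy exactly 1 GiB.
	p := addRuntimeParams(map[string]string{"index_type": "DISKANN"}, 32, 1<<20, 256)
	fmt.Println(p["build_dram_budget_gb"], p["vec_field_size_gb"])
}
```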

The three layers together form a “stage‑wise default‑parameter filling pipeline” where each stage only fills missing values, preserving any user‑provided settings.

Figure: Three‑layer filling pipeline illustration

CPU/GPU split is decided at compile time via Go build tags

Two files encode the default float‑vector index:

autoindex_param_nocuda.go (build tag !cuda) → default HNSW_SQ + SQ4U + refine:true + refine_type:FP16.

autoindex_param_cuda.go (build tag cuda) → default GPU_CAGRA.

When Milvus is built with the cuda tag (the GPU image), the GPU file is compiled; otherwise the CPU file is used. No runtime hardware detection occurs.
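For readers unfamiliar with Go build constraints, the mechanism looks roughly like this. The sketch shows only the CPU side as a standalone program; the function name defaultFloatVectorIndex and the sq_type key are assumptions, not the contents of the Milvus files. A sibling file starting with `//go:build cuda` would define the same function returning GPU_CAGRA, and `go build -tags cuda` would select it instead.

```go
//go:build !cuda

// CPU-side defaults for this sketch. The compiler includes this file only when
// the "cuda" build tag is NOT set, so no runtime hardware check is needed.
package main

import "fmt"

// defaultFloatVectorIndex returns the CPU default described in the text.
// The key names are illustrative assumptions.
func defaultFloatVectorIndex() map[string]string {
	return map[string]string{
		"index_type":  "HNSW_SQ",
		"sq_type":     "SQ4U",
		"refine":      "true",
		"refine_type": "FP16",
	}
}

func main() {
	fmt.Println("default float-vector index:", defaultFloatVectorIndex()["index_type"])
}
```

Because exactly one of the two files is compiled into any given binary, the choice is visible in the build command rather than in configuration or logs.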

Design rationale:

CPU‑only deployments are memory‑bound, so HNSW_SQ (graph index with 4‑bit scalar quantization and FP16 refinement) reduces RAM usage.

GPU deployments are bandwidth‑ and parallelism‑bound, so the native GPU_CAGRA index avoids quantization and fully utilizes GPU memory.

The compile‑time split yields a smaller binary and predictable behaviour, at the cost that a single binary cannot simultaneously support both CPU and GPU modes.

Knowhere – the C++ execution engine bridged via CGO

Knowhere is the vector search execution engine of Milvus. It wraps several popular vector index libraries, such as faiss, hnswlib, NGT, and annoy, behind a unified set of interfaces.

Architecture:

Go control plane (DataCoord/IndexNode): task scheduling, metadata, version negotiation, parameter filling, storage I/O.

C++ execution plane (Knowhere): actual index construction, query, serialization.

CGO bridges the two; Go calls C++ functions and receives results.

Bit‑flag encoding of index features

Go obtains a set of feature flags via C.GetIndexFeatures() (in internal/util/vecindexmgr/vector_index_mgr.go). Relevant flags:

NOTrainFlag – no training required (e.g., FLAT).

KNNFlag – exact search (100 % recall).

GpuFlag – requires GPU.

MmapFlag – supports mmap.

MvFlag – supports materialized view.

DiskFlag – requires disk.

Helpers such as IsGPUVecIndex and IsMMapSupported use these flags to decide resource allocation.
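Bit-flag encoding of this kind can be sketched in a few lines. The flag names follow the list above, but the bit positions, the Feature type, the Has method, and the per-index feature sets are all assumptions for illustration; the real values come from Knowhere through CGO.

```go
package main

import "fmt"

// Feature is a bit set describing what an index type supports.
type Feature uint64

// Bit positions are assumed for this sketch, not taken from Knowhere.
const (
	NOTrainFlag Feature = 1 << iota // no training required
	KNNFlag                         // exact search
	GpuFlag                         // requires GPU
	MmapFlag                        // supports mmap
	MvFlag                          // supports materialized view
	DiskFlag                        // requires disk
)

// Has reports whether every bit in want is set in f.
func (f Feature) Has(want Feature) bool { return f&want == want }

func main() {
	// Hypothetical feature sets, not the values Knowhere reports.
	features := map[string]Feature{
		"FLAT":      NOTrainFlag | KNNFlag | MmapFlag,
		"GPU_CAGRA": GpuFlag,
		"DISKANN":   DiskFlag,
	}
	for _, name := range []string{"FLAT", "GPU_CAGRA", "DISKANN"} {
		f := features[name]
		fmt.Printf("%s gpu=%t mmap=%t disk=%t\n",
			name, f.Has(GpuFlag), f.Has(MmapFlag), f.Has(DiskFlag))
	}
}
```

A single integer per index type keeps the CGO boundary narrow: one call returns everything the Go side needs to make resource decisions.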

Hard‑coded resource ratios

In index_attr_cache.go the memory‑to‑disk ratios are constants:

DISKANN: UsedDiskMemoryRatio = 4 (memory ≈ index size / 4).

AISAQ: UsedDiskMemoryRatioAisaq = 64 (memory ≈ index size / 64).

INVERTED: memory = 0, everything is mmap‑ed.

AISAQ therefore uses 16× less RAM than DISKANN, a value baked into the source rather than measured at runtime.
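The arithmetic behind the 16× figure is simply the quotient of the two ratios (64 / 4). The sketch below applies the constants to a 64 GiB index; estimateMemoryBytes is a hypothetical helper, and the fallback for unlisted index types (fully memory-resident) is an assumption.

```go
package main

import "fmt"

// Ratios quoted in the text.
const (
	usedDiskMemoryRatio      = 4
	usedDiskMemoryRatioAisaq = 64
)

// estimateMemoryBytes applies the fixed ratio for each index type.
func estimateMemoryBytes(indexType string, indexSizeBytes uint64) uint64 {
	switch indexType {
	case "DISKANN":
		return indexSizeBytes / usedDiskMemoryRatio
	case "AISAQ":
		return indexSizeBytes / usedDiskMemoryRatioAisaq
	case "INVERTED":
		return 0 // everything is mmap-ed
	default:
		return indexSizeBytes // assumption: other types are held fully in memory
	}
}

func main() {
	const gib = uint64(1) << 30
	size := 64 * gib
	for _, t := range []string{"DISKANN", "AISAQ", "INVERTED"} {
		fmt.Printf("%-8s %d GiB\n", t, estimateMemoryBytes(t, size)/gib)
	}
}
```

For the 64 GiB index this yields 16 GiB for DISKANN, 1 GiB for AISAQ, and 0 for INVERTED.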

Version management – full MIN/MAX aggregation for safe rolling upgrades

Three version constants (from 20260313-scalar_index_version_management.md) are used:

MinimalIndexVersion – lowest version that can build + search.

CurrentIndexVersion – hard‑coded default build version.

MaximumIndexVersion – highest non‑beta version that can build + search.

Each QueryNode registers its supported range in etcd. DataCoord’s IndexEngineVersionManager aggregates them:

GetCurrent*Version() – MIN of all QueryNodes’ Current → highest version that every node can load (used for building).

GetMinimal*Version() – MAX of all QueryNodes’ Minimal → lowest version any node requires (compatibility check).

GetMaximum*Version() – MIN of all QueryNodes’ Maximum → highest version that every node can handle (clamping).

This combination guarantees that during a rolling upgrade newly built indexes are always readable by the oldest nodes, ensuring zero‑downtime at the expense of temporarily using a sub‑optimal index format.
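The aggregation rule reduces to one pass over the registered ranges. The following is a minimal sketch; VersionRange and Aggregate are hypothetical names, and the version numbers are made up for the example.

```go
package main

import "fmt"

// VersionRange is the range a single QueryNode reports (hypothetical type).
type VersionRange struct {
	Minimal, Current, Maximum int32
}

// Aggregate combines node ranges: MAX of Minimal, MIN of Current, MIN of Maximum.
// It returns false when no node has registered.
func Aggregate(nodes []VersionRange) (VersionRange, bool) {
	if len(nodes) == 0 {
		return VersionRange{}, false
	}
	agg := nodes[0]
	for _, n := range nodes[1:] {
		if n.Minimal > agg.Minimal {
			agg.Minimal = n.Minimal
		}
		if n.Current < agg.Current {
			agg.Current = n.Current
		}
		if n.Maximum < agg.Maximum {
			agg.Maximum = n.Maximum
		}
	}
	return agg, true
}

func main() {
	// One node still on the old release, one already upgraded (made-up versions).
	nodes := []VersionRange{
		{Minimal: 5, Current: 6, Maximum: 6},
		{Minimal: 5, Current: 7, Maximum: 8},
	}
	agg, _ := Aggregate(nodes)
	fmt.Printf("%+v\n", agg)
}
```

While the old node is still registered, the aggregated Current stays at 6, so new indexes are built in a format both nodes can load; once it leaves, the value rises to 7.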

Scalar indexes adopt the same mechanism (MEP March 2026) with TargetScalarIndexVersion and ForceRebuildScalarSegmentIndex fields.

Scalar V3 format – general‑purpose KV layout vs. columnar optimization

The file layout is:

[Magic "MVSIDXV3"] [Data Region] [Directory Table JSON] [Footer 32B]

The 32‑byte footer points to the directory table, allowing the entire metadata to be fetched with 1‑2 I/O operations – a cost‑effective design for remote storage (e.g., S3).
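To illustrate why a fixed-size footer keeps metadata access cheap, here is a sketch that builds a tiny file in memory and reads its directory back with two slices: the last 32 bytes, then the directory range they point to. The footer field layout (little-endian offset and length followed by reserved bytes) and the directory schema are assumptions; only the magic string, the region order, and the 32-byte footer size come from the text.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
)

const (
	magic      = "MVSIDXV3"
	footerSize = 32
)

// buildFile assembles [magic][data][directory JSON][footer].
// Assumed footer layout: directory offset (8 bytes), directory length (8 bytes), 16 reserved bytes.
func buildFile(data []byte, dir map[string]int) ([]byte, error) {
	dirJSON, err := json.Marshal(dir)
	if err != nil {
		return nil, err
	}
	buf := append([]byte(magic), data...)
	dirOffset := uint64(len(buf))
	buf = append(buf, dirJSON...)
	footer := make([]byte, footerSize)
	binary.LittleEndian.PutUint64(footer[0:8], dirOffset)
	binary.LittleEndian.PutUint64(footer[8:16], uint64(len(dirJSON)))
	return append(buf, footer...), nil
}

// readDirectory performs the two reads: the footer first, then the directory it points to.
func readDirectory(file []byte) (map[string]int, error) {
	if len(file) < len(magic)+footerSize || string(file[:len(magic)]) != magic {
		return nil, errors.New("not a V3 file")
	}
	footer := file[len(file)-footerSize:]
	off := binary.LittleEndian.Uint64(footer[0:8])
	n := binary.LittleEndian.Uint64(footer[8:16])
	end := uint64(len(file) - footerSize)
	if off > end || n > end-off {
		return nil, errors.New("directory out of bounds")
	}
	var dir map[string]int
	if err := json.Unmarshal(file[off:off+n], &dir); err != nil {
		return nil, err
	}
	return dir, nil
}

func main() {
	// The directory maps an entry name to its offset in the data region (hypothetical schema).
	file, err := buildFile([]byte("blob-bytes"), map[string]int{"bitmap": 8})
	if err != nil {
		panic(err)
	}
	dir, err := readDirectory(file)
	fmt.Println(dir, err)
}
```

Against object storage, the two slices would become two ranged GET requests, which is where the 1‑2 I/O figure comes from.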

Instead of adopting Lance’s columnar format (which forces every graph, tree, or hash structure to be flattened into columns, incurring encoding overhead), V3 uses a pure KV style (key → serialized blob + metadata). This sacrifices columnar read‑optimisation for maximal compatibility with heterogeneous index types (BITMAP, STL_SORT, INVERTED, etc.). Since scalar indexes are primarily for filtering and point look‑ups, the trade‑off is justified.

Encryption boundary – slice vs. entry

When encryption is enabled, the boundary is at the 16 MiB slice level; without encryption, entries are written as contiguous plaintext. Consequently there are two write paths:

Unencrypted → IndexEntryDirectStreamWriter: streams directly to remote storage (e.g., S3) with parallel upload.

Encrypted → IndexEntryEncryptedLocalWriter: writes to a temporary local file first (ciphertext size is unpredictable) and then uploads.

This extra local write is a deliberate trade‑off to support encryption.

Version routing – no magic‑based detection

Version selection is performed by the caller rather than by reading the file magic:

Build side checks the SCALAR_INDEX_ENGINE_VERSION config.

Load side uses SealedIndexTranslator to read the same config.

The Go control plane uses the constant CurrentScalarIndexEngineVersion.

Routing is therefore driven by configuration, while the file magic is left for defensive validation only. Keeping the two concerns separate means version mismatches surface early, at build time.

AISAQ – the new DISKANN‑based coordinate on the memory‑IO‑disk triangle

AISAQ (added June 2026, Knowhere v3.0.4) extends DISKANN with a “Near‑Zero DRAM” design. The three modes are:

DISKANN : PQ data in DRAM, single‑node I/O = 1 × (raw vector + edgelist), medium disk footprint.

AISAQ‑Performance : PQ data on‑disk (inline, redundant neighbor PQ), I/O = 1 × (raw + edgelist + neighbor PQ), large (redundant) footprint.

AISAQ‑Scale : PQ data on‑disk (separate, rearranged), I/O > 1 × (multiple reads + PQ cache mitigation), small footprint.

DISKANN’s UsedDiskMemoryRatio = 4 versus AISAQ’s UsedDiskMemoryRatioAisaq = 64 reflects a 16× memory reduction. For billions of vectors, DISKANN may need tens of gigabytes of RAM, while AISAQ can run with 1‑2 GB by trading disk I/O for DRAM.

Figure: Memory‑IO‑Disk trade‑off triangle for DISKANN and AISAQ

"Index" in Milvus denotes three independent technology stacks

Vector index : graph (HNSW series), clustering (IVF series), quantization (SQ/PQ/RABITQ) – wrapped by Knowhere.

Scalar index : sorting (STL_SORT), bitmap (BITMAP), inverted (INVERTED based on Tantivy). AUTOINDEX routes by data type: int/varchar/float/json → HYBRID (bitmap if cardinality ≤ 100, otherwise sort); bool → BITMAP; geometry → RTREE.

Primary‑key index : implemented with BBhash (minimal perfect hash) + value array. Performance numbers from 20250429-primarykey_index.md show ~200 ns per PK lookup (10‑20 M QPS per node) versus ~0.1 ms per Bloom filter (≈ 1 K QPS).
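The scalar AUTOINDEX routing described in the list above fits in two small functions. This is a sketch only: scalarAutoIndex and hybridStructure are hypothetical names, the data types are plain strings for brevity, and the threshold of 100 is the one quoted above.

```go
package main

import "fmt"

// Cardinality threshold quoted in the text for the HYBRID index.
const hybridBitmapCardinalityLimit = 100

// scalarAutoIndex maps a scalar data type to its AUTOINDEX default.
// The second return value is false for types not listed in the text.
func scalarAutoIndex(dataType string) (string, bool) {
	switch dataType {
	case "int", "varchar", "float", "json":
		return "HYBRID", true
	case "bool":
		return "BITMAP", true
	case "geometry":
		return "RTREE", true
	}
	return "", false
}

// hybridStructure picks the structure a HYBRID index uses for a given cardinality.
func hybridStructure(cardinality int) string {
	if cardinality <= hybridBitmapCardinalityLimit {
		return "bitmap"
	}
	return "sort"
}

func main() {
	idx, _ := scalarAutoIndex("varchar")
	fmt.Println(idx, hybridStructure(12), hybridStructure(50000))
}
```

A low-cardinality column such as a status field lands on the bitmap side, while a column of mostly unique values falls back to the sorted structure.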

Figure: Boundaries of the three index technology stacks

Key takeaways

AUTOINDEX is a three‑layer parameter‑filling pipeline, not an intelligent selector.

CPU/GPU index selection is fixed at compile time via Go build tags (HNSW_SQ vs. GPU_CAGRA).

Knowhere provides the C++ execution engine; Go bridges to it with CGO and bit‑flag feature encoding.

Version management aggregates MIN/MAX across the cluster, favoring availability and enabling zero‑downtime rolling upgrades.

Scalar V3 format chooses a general‑purpose KV layout over columnar optimization for broader index compatibility.

AISAQ introduces a new point on the memory‑IO‑disk trade‑off triangle, dramatically reducing DRAM usage for massive datasets.

Milvus “index” actually comprises three independent stacks – vector, scalar, and primary‑key – each with its own algorithms, formats, and evolution path.
