How Vector Retrieval Powers AI: Challenges, Solutions, and VSAG’s Open‑Source Breakthrough
The article examines the rapid growth of unstructured data, explains the fundamentals and resource-intensive nature of vector retrieval, and presents Ant Group's engineering practices: hybrid HNSW-DiskANN indexing, performance techniques such as BSA pruning and memory prefetching, and sparse-vector and feedback-driven recall improvements. It closes with the open-source VSAG roadmap and its ecosystem integrations.
Foundations and Technical Challenges of Vector Retrieval
Vector databases transform unstructured media (audio, video, images, text) into high‑dimensional vectors using deep neural networks and index them with structures such as nearest‑neighbor graphs or inverted files. A single query may require hundreds of thousands of floating‑point distance calculations, making vector search far more compute‑intensive than traditional key‑value lookups. Memory consumption is also extreme; for example, storing 100 M vectors of 1024 dimensions needs >1 TB of RAM.
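A quick back-of-envelope check makes the memory figure concrete. The sketch below computes only the raw float32 payload; the constants are taken from the example above, and the comment about index overhead is an assumption about typical deployments, not a measurement.

```python
# Back-of-envelope memory estimate for storing 100 M float32 vectors.
NUM_VECTORS = 100_000_000   # 100 M vectors
DIM = 1024                  # dimensions per vector
BYTES_PER_FLOAT = 4         # float32

raw_bytes = NUM_VECTORS * DIM * BYTES_PER_FLOAT
print(f"raw vectors alone: ~{raw_bytes / 10**12:.2f} TB")  # ~0.41 TB

# Graph neighbor lists, replicas, and query-time buffers sit on top of
# this raw payload, which is how real deployments reach the >1 TB
# working set cited above.
```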
Key challenges:
High CPU, memory and I/O usage.
Rapid cost escalation for large‑scale deployments.
Trade‑off between recall accuracy, latency and throughput.
Ant Group Engineering Practices and Case Studies
Ant Group built a custom vector database on top of a mixed row‑column KV storage platform, adding a dedicated vector index layer and a retrieval‑proxy service. To balance cost, accuracy and performance, they adopted a hybrid indexing scheme that combines:
HNSW – high recall, low latency but memory‑heavy.
DiskANN – disk‑based, low‑cost index.
New vectors are first inserted into an in‑memory HNSW index; a background compactor periodically batches these updates into a DiskANN index, replacing most of the HNSW structure. This reduces memory usage to roughly one‑tenth of a pure HNSW solution while keeping QPS and latency comparable, cutting total cost of ownership to about 1/7 of the original.
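The tiered write path can be sketched with plain data structures. This is an illustration of the insert/compact/search flow only, assuming brute-force lookup in each tier; the class and method names are hypothetical and do not reflect the VSAG API, and real systems run compaction asynchronously in the background.

```python
# Sketch of the tiered write path: fresh vectors land in a small
# in-memory tier (standing in for HNSW), and a compaction step merges
# them into a large bulk tier (standing in for DiskANN).
class TieredIndex:
    def __init__(self, memory_limit=3):
        self.memory_tier = {}   # id -> vector, recent writes (HNSW role)
        self.disk_tier = {}     # id -> vector, bulk of the data (DiskANN role)
        self.memory_limit = memory_limit

    def insert(self, vec_id, vector):
        self.memory_tier[vec_id] = vector
        if len(self.memory_tier) >= self.memory_limit:
            self.compact()

    def compact(self):
        # Compactor: batch-move memory-tier entries into the bulk tier,
        # keeping the memory-resident structure a fraction of the data.
        self.disk_tier.update(self.memory_tier)
        self.memory_tier.clear()

    def search(self, query, k=1):
        # Query both tiers and merge results by squared L2 distance.
        def dist(v):
            return sum((a - b) ** 2 for a, b in zip(query, v))
        candidates = {**self.disk_tier, **self.memory_tier}
        return sorted(candidates, key=lambda i: dist(candidates[i]))[:k]

idx = TieredIndex()
idx.insert("a", [0.0, 0.0])
idx.insert("b", [1.0, 1.0])
idx.insert("c", [0.1, 0.0])    # third insert triggers compaction
print(idx.search([0.0, 0.1]))  # -> ['a']
```

The point of the pattern is that queries always see a merged view of both tiers, so compaction can run without blocking reads or writes.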
Additional optimizations:
BSA pruning & memory layout: a two-stage retrieval uses a reduced-dimension vector for graph traversal, then re-ranks candidates with full-precision vectors. A linear classifier predicts whether the second stage is needed, saving compute.
Memory placement & prefetching: custom data layout and prefetch logic improve cache-hit rates, boosting throughput by ~25% (layout) and an additional ~20% (prefetch).
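The two-stage idea above can be illustrated in a few lines: score candidates with a truncated representation first, and pay for full-precision distances only when the cheap scores are too close to call. The margin test below is a simple stand-in for the learned linear classifier; function names, dimensions, and thresholds are all illustrative.

```python
# Two-stage retrieval sketch: cheap reduced-dimension scoring, then an
# optional full-precision re-rank of the shortlist.
def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_stage_search(query, vectors, reduced_dim=2, shortlist=3, margin=0.05):
    # Stage 1: distances on the first `reduced_dim` components only.
    cheap = sorted(range(len(vectors)),
                   key=lambda i: l2(query[:reduced_dim], vectors[i][:reduced_dim]))
    top = cheap[:shortlist]
    # Stand-in for the classifier: skip stage 2 when the cheap winner
    # is a clear margin ahead of the runner-up.
    d0 = l2(query[:reduced_dim], vectors[top[0]][:reduced_dim])
    d1 = l2(query[:reduced_dim], vectors[top[1]][:reduced_dim])
    if d1 - d0 > margin:
        return top[0]
    # Stage 2: full-precision re-rank of the shortlist.
    return min(top, key=lambda i: l2(query, vectors[i]))

vectors = [[0.0, 0.0, 9.0], [0.0, 0.1, 0.0], [5.0, 5.0, 5.0], [0.2, 0.0, 0.1]]
query = [0.0, 0.0, 0.0]
print(two_stage_search(query, vectors))  # -> 1 after full-precision re-rank
```

In this toy example the reduced-dimension winner (index 0) is wrong because its truncated prefix hides a large third component; the full-precision re-rank corrects it, which is exactly the failure mode the second stage exists to catch.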
Accuracy Enhancements
To push recall toward 100%, Ant Group introduced a conjugate‑graph mechanism that augments the nearest‑neighbor graph with nodes missed due to local‑optimum traps, driven by user feedback. Experiments raised recall from 99.8% to 99.97%.
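A toy version of feedback-driven recall repair shows the mechanism: when a search misses the true nearest neighbor (for example via a local-optimum trap in greedy traversal), the correction is stored as a supplementary edge consulted on later searches. This mimics the spirit of the conjugate-graph idea only; the data structures and the deliberately crippled base search below are illustrative, not Ant Group's implementation.

```python
# Feedback-driven recall sketch: misses reported by users become extra
# edges that widen the candidate set on future queries.
class FeedbackIndex:
    def __init__(self, vectors):
        self.vectors = vectors
        self.extra = {}   # returned id -> ids reported as the true neighbor

    def base_search(self, query):
        # Stand-in for an approximate graph search that can get stuck:
        # here it can only ever reach even-indexed nodes.
        even = range(0, len(self.vectors), 2)
        return min(even, key=lambda i: self._d(query, i))

    def search(self, query):
        hit = self.base_search(query)
        # Widen the candidate set with feedback edges before answering.
        candidates = [hit] + self.extra.get(hit, [])
        return min(candidates, key=lambda i: self._d(query, i))

    def report_miss(self, returned_id, correct_id):
        self.extra.setdefault(returned_id, []).append(correct_id)

    def _d(self, q, i):
        return sum((a - b) ** 2 for a, b in zip(q, self.vectors[i]))

idx = FeedbackIndex([[0.0], [0.05], [1.0], [2.0]])
print(idx.search([0.06]))   # base search cannot reach node 1 -> returns 0
idx.report_miss(0, 1)       # feedback: node 1 was the true neighbor
print(idx.search([0.06]))   # supplementary edge now surfaces 1
```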
Other techniques:
Binary quantization (RaBitQ): based on a SIGMOD 2025 paper, achieves up to 32× compression with negligible accuracy loss; integrated into VSAG.
PAG (Partitioned Aggregation Graph): a hybrid graph-clustering index that reduces disk I/O by ~22% for large disk-based indexes.
HGraph hierarchical framework: enables flexible composition of graph and inverted indexes across layers.
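To make the 32× figure tangible, here is a minimal sketch of sign-based binary quantization, the basic one-bit-per-dimension step underlying schemes in this family. The real RaBitQ method adds a randomized rotation and theoretical error bounds; this toy version keeps only the sign step, and all names are illustrative. Going from float32 (32 bits per dimension) to 1 bit per dimension is where the 32× compression comes from.

```python
# Sign-based binary quantization: each dimension becomes one bit, and
# candidates are compared by Hamming distance on the packed codes.
def quantize(vec):
    bits = 0
    for i, x in enumerate(vec):
        if x >= 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

db = [[0.3, -0.2, 0.7, -0.9], [-0.1, -0.4, 0.2, 0.8], [0.5, 0.1, -0.6, -0.3]]
codes = [quantize(v) for v in db]
q = quantize([0.4, -0.1, 0.6, -0.8])
best = min(range(len(db)), key=lambda i: hamming(codes[i], q))
print(best)  # -> 0: the vector sharing the query's sign pattern
```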
VSAG Open‑Source Library
VSAG is Ant Group’s open‑source vector index library that implements the above innovations. It supports mixed memory‑disk indexing, sparse‑vector search, and provides bindings for Python (PyVSAG), SQLite, Valkey/Redis, OceanBase and Greptime.
Performance highlights (July 2024 release, https://github.com/antgroup/vsag): on the ANN‑Benchmarks GIST‑960 dataset VSAG delivers up to 3× higher QPS than HNSWLib at 90% recall, achieving state‑of‑the‑art results.
Roadmap:
v0.15 (Apr 2025): sparse‑vector search, pluggable quantization framework.
v0.16 (May 2025): ARM‑Neon optimizations, GPU‑accelerated index building, automatic tuner for index parameters.
v0.17 (Jun 2025): Intel‑AMX support, attribute storage in vectors, graph compression.
These capabilities aim to provide a scalable, cost‑effective backbone for AI workloads such as Retrieval‑Augmented Generation, recommendation systems and large‑language‑model augmentation.
AntData
Ant Data leverages Ant Group's leading technological innovation in big data, databases, and multimedia, with years of industry practice. Through long-term technology planning and continuous innovation, we strive to build world-class data technology and products.