How GPU‑Accelerated NN‑Descent Boosts Vector Search Speed by Up to 13×
This article explains how unstructured multimedia data is transformed into vectors for similarity search, introduces GPU parallelism and the NN‑Descent algorithm as a replacement for traditional HNSW index construction in OpenSearch, and presents benchmark results showing up to a 13× speedup in index building at comparable recall.
In the digital era, unstructured multimedia data (text, images, audio, video) is growing explosively, and retrieving it calls for vector search: embedding models transform the raw data into vectors, which are stored in vector databases and queried by similarity.
For example, the arithmetic of embeddings captures semantic relationships: embedding(king) − embedding(man) + embedding(woman) ≈ embedding(queen).
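As a toy illustration of this arithmetic (the four‑dimensional vectors below are made‑up values, not real embeddings), nearest‑neighbor search under cosine similarity recovers the analogy:

```python
# Toy illustration of embedding arithmetic and similarity search.
# The vectors are invented for the example; real embeddings come from a model.
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.10, 0.0]),
    "man":   np.array([0.1, 0.9, 0.00, 0.0]),
    "woman": np.array([0.1, 0.1, 0.90, 0.0]),
    "queen": np.array([0.9, 0.0, 0.95, 0.0]),
    "apple": np.array([0.0, 0.0, 0.10, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], query))
print(best)  # -> "queen" for these toy vectors
```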
We introduced GPU parallel computation and the NN‑Descent indexing algorithm into OpenSearch, replacing traditional HNSW index construction. The GPU‑based method leverages massive parallelism to accelerate index building.
GPU Basics
Originally designed for graphics rendering, GPUs now power deep learning and high‑performance computing. Modern NVIDIA GPUs (e.g., RTX 4090, H100) provide thousands of cores, and their Tensor Cores deliver fast matrix multiplication.
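A minimal sketch of the gap this creates, assuming PyTorch and a CUDA‑capable GPU are available (exact numbers depend entirely on hardware):

```python
# Compare the same matrix multiplication on CPU and GPU.
import time
import torch

a = torch.randn(4096, 960)   # e.g. a batch of 960-dim vectors
b = torch.randn(960, 4096)

t0 = time.perf_counter()
_ = a @ b
cpu_s = time.perf_counter() - t0

a_gpu, b_gpu = a.cuda(), b.cuda()
torch.cuda.synchronize()     # GPU calls are asynchronous; sync for honest timing
t0 = time.perf_counter()
_ = a_gpu @ b_gpu
torch.cuda.synchronize()
gpu_s = time.perf_counter() - t0

print(f"CPU: {cpu_s:.4f}s  GPU: {gpu_s:.4f}s")
```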
A benchmark on a small dataset (GIST: 1 M vectors, 960 dimensions) shows a GPU construction time of 27 s versus 587 s on a 32‑core CPU, roughly 21.7× faster. On a large dataset (Cohere: 113 M vectors, 1024 dimensions), the GPU reduces build time from ~13 h to ~1 h, roughly 13× faster, while maintaining comparable recall after adjusting query parameters.
NN‑Descent Algorithm
NN‑Descent builds a single‑layer proximity graph by iteratively refining each point's neighbor list with candidates drawn from its neighbors' neighbors, exploiting the observation that a neighbor of my neighbor is likely also my neighbor. Distance computation (e.g., Euclidean) can be recast as matrix multiplication, which GPUs execute extremely efficiently.
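The sketch below is a simplified NumPy illustration, not the production implementation. It shows both ingredients: the neighbor‑of‑neighbor refinement loop, and the identity ‖a − b‖² = ‖a‖² + ‖b‖² − 2a·b, which turns batched distance evaluation into a single matrix multiplication:

```python
# Simplified NN-Descent sketch (illustration only).
# X: (n, d) dataset, K: neighbor-list size.
import numpy as np

def pairwise_sq_dists(A, B):
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b
    # The A @ B.T term is a single matrix multiplication -- the operation
    # GPUs (and Tensor Cores) are built to accelerate.
    return (A * A).sum(1)[:, None] + (B * B).sum(1)[None, :] - 2.0 * A @ B.T

def nn_descent(X, K=10, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Initialize each point with K random neighbors (excluding itself).
    graph = np.empty((n, K), dtype=np.int64)
    for i in range(n):
        idx = rng.choice(n - 1, K, replace=False)
        idx[idx >= i] += 1  # shift so i never appears in its own list
        graph[i] = idx
    for _ in range(iters):
        changed = False
        for i in range(n):
            # Candidates: current neighbors plus neighbors-of-neighbors
            # ("a neighbor of my neighbor is likely my neighbor").
            cand = np.unique(np.concatenate([graph[i], graph[graph[i]].ravel()]))
            cand = cand[cand != i]
            d = pairwise_sq_dists(X[i:i + 1], X[cand])[0]
            new = cand[np.argsort(d)[:K]]
            if not np.array_equal(np.sort(new), np.sort(graph[i])):
                graph[i] = new
                changed = True
        if not changed:  # no neighbor list improved -> converged
            break
    return graph  # approximate K-NN graph
```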
Integration with HNSW
We evaluated three integration strategies: (1) use the single‑layer NN‑Descent graph directly for search; (2) keep the NN‑Descent graph as the base layer and build the additional upper layers with the standard HNSW procedure; (3) build the upper layers with NN‑Descent as well. The third approach gave the best balance of construction speed and compatibility with the existing HNSW query logic.
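A hypothetical sketch of strategy (3), assuming the standard HNSW level‑assignment rule (level = ⌊−ln(U) · mL⌋ with mL = 1/ln(M)). Upper layers contain exponentially fewer points, so building their graphs with NN‑Descent is cheap:

```python
# Hypothetical sketch: NN-Descent builds every HNSW layer.
import math
import random

def assign_levels(n_points, M=16, seed=0):
    """Standard HNSW rule: level = floor(-ln(U) * mL), mL = 1 / ln(M)."""
    mL = 1.0 / math.log(M)
    rnd = random.Random(seed)
    # 1.0 - random() lies in (0, 1], so log() never sees zero.
    return [int(-math.log(1.0 - rnd.random()) * mL) for _ in range(n_points)]

levels = assign_levels(1_000_000)
for layer in range(max(levels) + 1):
    members = [i for i, lv in enumerate(levels) if lv >= layer]
    # Run (GPU) NN-Descent on this layer's points; the resulting per-layer
    # graphs stay compatible with the existing HNSW query path.
    print(f"layer {layer}: {len(members)} points")
```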
Performance Results
GPU NN‑Descent is ~20× faster than a 32‑core CPU for index construction.
Overall system speedup after product integration is ~10×.
Recall is slightly lower than the CPU build at identical query settings, but can be matched by increasing the query‑time parameter ef (see the sketch after this list).
GPU resources are largely freed for other tasks, improving overall utilization.
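To make the ef trade‑off concrete, here is a minimal sketch using the open‑source hnswlib library (an illustration only, not the OpenSearch engine; the knob it tunes is the same query‑time ef):

```python
# Recall-vs-latency knob: larger ef widens the candidate queue at query time.
import numpy as np
import hnswlib

dim, n = 128, 10_000
data = np.random.rand(n, dim).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data)

queries = data[:100]
for ef in (10, 50, 200):
    index.set_ef(ef)  # higher ef -> higher recall, slower queries
    labels, dists = index.knn_query(queries, k=10)
    # Compare labels against brute-force ground truth to measure recall.
```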
Conclusion
GPU‑accelerated NN‑Descent dramatically speeds up vector index construction, with a theoretical cost‑performance improvement of up to 13×. Practical gains are currently around 5× due to overhead; further scheduling optimizations are planned to raise GPU utilization and recall.
References
Initial CUDA Performance Surprises.
W. Dong, M. Charikar, and K. Li. "Efficient K-Nearest Neighbor Graph Construction for Generic Similarity Measures." WWW 2011. (NN-Descent)
H. Wang, W.-L. Zhao, X. Zeng, and J. Yang. "Fast k-NN Graph Construction by GPU-Based NN-Descent." CIKM 2021. (GPU NN-Descent)
Further Reading
https://www.aliyun.com/activity/bigdata/opensearch/platform
https://www.aliyun.com/activity/bigdata/opensearch/llmsearch
https://www.aliyun.com/activity/bigdata/opensearch/vector
