
How StarRocks Supercharges Vector Search: 7× Faster Queries and 1/3 Cost

This article explains the principles and practical implementation of vector retrieval in StarRocks: approximate nearest-neighbor algorithms, index design, query planning, performance optimizations, real-world case studies, and future challenges. In one deployment, query latency dropped from 15 seconds to 2 seconds while costs fell to one third.


Overview of Vector Retrieval

Vector retrieval finds the k nearest vectors to a query vector in a high‑dimensional feature database, essentially a Top‑N search where every data item is represented as a vector. Because deep‑learning models encode almost all data as vectors, this operation becomes fundamental for AI‑driven applications.
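To make the Top-N formulation concrete, here is a minimal brute-force sketch (not StarRocks code) of the exact search that ANN methods approximate, using squared L2 distance over a NumPy array:

```python
import numpy as np

def top_k_nearest(query: np.ndarray, database: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k vectors in `database` closest to `query`
    by Euclidean (L2) distance. This is the exact Top-N search that
    ANN methods approximate."""
    # Squared L2 distance from the query to every database vector.
    dists = np.sum((database - query) ** 2, axis=1)
    # Partial sort: pick the k smallest distances, then order them.
    idx = np.argpartition(dists, k)[:k]
    return idx[np.argsort(dists[idx])]

# Example: five 2-dimensional vectors; query at the origin.
db = np.array([[3.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [0.5, 0.5]])
q = np.array([0.0, 0.0])
print(top_k_nearest(q, db, 2))  # → [4 1]
```

At high dimensionality and large scale, the `database - query` subtraction alone is the "massive number of floating-point operations" the next section refers to, which is why exact search does not meet millisecond latency budgets.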

Why Approximate Nearest Neighbor (ANN) is Needed

Exact Top‑N search is infeasible for high‑dimensional data due to the "curse of dimensionality" and the massive number of floating‑point operations required. ANN techniques such as HNSW (Hierarchical Navigable Small World) and IVFPQ (Inverted File with Product Quantization) provide fast, sub‑optimal results that meet millisecond‑level latency requirements in Retrieval‑Augmented Generation (RAG) and other large‑model scenarios.

Main ANN Algorithms

HNSW: builds a layered proximity graph whose layers act like a skip list; a query descends from the sparse upper layers to the dense base layer, pruning the search efficiently at each level.

IVFPQ: partitions vectors into clusters (the inverted file) and compresses each vector with product quantization; a coarse search selects candidate clusters, and a finer pass re-ranks the vectors inside them.

Index Design in StarRocks

Two index granularities were evaluated:

Segment-level index: integrates with StarRocks' existing bitmap index, works for both primary-key and detail tables, and avoids extra UID mapping.

Tablet-level index: similar to a primary-key index but requires additional UID remapping, increasing complexity and latency.

The segment‑level approach was chosen because its rebuild cost is lower and it aligns with StarRocks' MPP architecture.

Query Planning and Syntax

Vector search conflicts with traditional SQL semantics: the engine must decide whether to apply TOP N before or after the filter predicates. Filtering early shrinks the candidate set but may return fewer rows than the user's LIMIT requested; filtering late improves performance but can reduce recall. StarRocks ultimately keeps the simple TOP N syntax and internally generates the optimal logical plan.

Filtering Strategies

Pre-filter: apply filters first, then vector search; highest precision but costly I/O.

Post-filter: vector search first, then filters; better performance, slightly lower precision.

Iterative post-filter: simulate pre-filtering with multiple rounds of post-filtering, balancing recall and latency.

The iterative post‑filter is the default in the released code.
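The iterative scheme can be sketched as a retry loop (a simplification of the idea, not the StarRocks implementation; the `ann_search` callback and growth factor are illustrative assumptions): fetch an ANN candidate set, apply the filter, and widen the search whenever too few rows survive.

```python
def iterative_post_filter(ann_search, predicate, k, grow=4, max_rounds=5):
    """Approximate pre-filter semantics by repeated post-filtering.
    `ann_search(n)` is assumed to return the n approximate nearest
    neighbours in distance order; `predicate` is the SQL filter."""
    n = k
    hits = []
    for _ in range(max_rounds):
        # Post-filter this round's ANN candidates.
        hits = [row for row in ann_search(n) if predicate(row)]
        if len(hits) >= k:
            return hits[:k]       # enough survivors: done
        n *= grow                 # too few: widen the ANN search and retry
    return hits                   # best effort after max_rounds
```

When the filter is selective, each round discards most candidates, so the candidate set must grow until k rows pass; when the filter is permissive, the first round already suffices and the cost matches a plain post-filter.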

Index Write Path

On write, data is compressed and coarsely ranked, then combined with text search (MongoDB/Elasticsearch) for final re-ranking. For small segments where clustering fails, an empty index is written and a brute-force fallback computes distances at query time.

Index Rebuild and Maintenance

IVFPQ relies on clustering; as data grows, cluster quality degrades. StarRocks triggers index rebuild during each compaction to keep cluster centroids fresh. Segment‑level rebuilds are cheaper than tablet‑level incremental updates, which are not widely supported.

Block Cache Optimization

A block‑level memory cache reduces IVFPQ file reads by ~50% while preserving latency and recall, addressing the read‑amplification problem in large‑scale segments.
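The mechanism is a standard LRU cache keyed by block, sketched here for illustration (the `read_block` loader and capacity accounting are my assumptions, not StarRocks internals): hot IVFPQ blocks are served from memory, so repeated reads of the same block hit disk only once.

```python
from collections import OrderedDict

class BlockCache:
    """Minimal LRU block cache sketch: index reads go through the cache,
    so hot blocks stop generating repeated file I/O (the read-amplification
    problem described above). `read_block` stands in for actual disk I/O."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block     # fallback loader on cache miss
        self.cache = OrderedDict()
        self.misses = 0

    def get(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            return self.cache[block_id]
        self.misses += 1
        data = self.read_block(block_id)      # only misses touch the file
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return data
```

Because latency and recall are untouched (the same blocks are returned, just from memory), the cache is a pure I/O optimization, consistent with the ~50% reduction in file reads cited above.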

Performance Evaluation

On a single node with 300 k–1 M vectors (50‑dimensional), query latency is in the low‑double‑digit milliseconds. In QPS tests, the system reaches ~2 000 queries per second, comparable to specialized vector databases. Recall matches or exceeds that of Milvus and Elasticsearch, and cost drops to one‑third of the original multi‑database setup.

Real‑World Deployments

StarRocks’ vector search has been deployed in multiple Tencent services, reducing query latency from 15 seconds to 2 seconds for TOP 10 k queries and cutting operational costs by 66 %.

Challenges and Future Work

Key challenges include high‑concurrency latency, read‑amplification on massive data, and supporting complex RAG pipelines. Planned improvements are a dedicated serving layer inspired by Elasticsearch’s scatter‑gather model, incremental index construction, adaptive range‑search parameters, and richer hybrid‑search capabilities.

Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
