How Redis’s New Multithreaded Query Engine Boosts Vector Search Performance
Redis has introduced a multithreaded query engine that sharply reduces latency and raises vector similarity search throughput by up to 16×, enabling a single instance to scale vertically and serve real-time RAG applications better than traditional single-threaded architectures and competing vector databases.
Multithreaded Query Engine in Redis
Redis introduced a multithreaded query execution model to support Retrieval‑Augmented Generation (RAG) workloads that rely on vector similarity search. The engine keeps average query latency below 10 ms while increasing throughput dramatically.
Why multithreading?
Traditional Redis processes commands on a single thread. Complex queries that use inverted indexes and multiple O(log n) index scans can block the main thread, limiting throughput. By off‑loading the index‑access phase to a pool of worker threads, Redis can serve many queries in parallel without sacrificing the responsiveness of core key‑value operations.
Execution Pipeline
1. Query planning – the main thread builds the query plan (context) and enqueues a task in a shared work queue.
2. Parallel execution – worker threads dequeue tasks, run the full query pipeline (e.g., vector similarity scoring, filter predicates, HNSW ANN search) concurrently, and produce partial result sets.
3. Result aggregation – the main thread collects the partial results, merges them, and returns the final response to the client.
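The three stages above can be sketched with a thread pool. This is a simplified model, not the actual C implementation: the helpers `plan_query` and `run_pipeline`, the per-shard task split, and the one-dimensional "vectors" are all illustrative stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def plan_query(query):
    # Main thread: build the query plan ("context") for a k-NN request.
    return {"vector": query["vector"], "k": query["k"]}

def run_pipeline(plan, shard):
    # Worker thread: scan one index shard and return a partial result set
    # of (distance, doc_id) pairs, smallest distance first.
    results = [(abs(value - plan["vector"]), doc_id) for doc_id, value in shard]
    return sorted(results)[: plan["k"]]

def execute(query, shards, pool):
    # Stage 1: the main thread builds the plan and enqueues one task per shard.
    plan = plan_query(query)
    futures = [pool.submit(run_pipeline, plan, shard) for shard in shards]
    # Stage 3: the main thread merges partial results into the final response.
    merged = sorted(result for f in futures for result in f.result())
    return [doc_id for _, doc_id in merged[: plan["k"]]]

shards = [[("a", 1.0), ("b", 5.0)], [("c", 2.0), ("d", 9.0)]]
with ThreadPoolExecutor(max_workers=2) as pool:
    print(execute({"vector": 1.5, "k": 2}, shards, pool))  # ['a', 'c']
```

Because the heavy index scans run in the worker pool, the main thread only plans and merges, which is how core key-value commands stay responsive under query load.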
This design enables vertical scaling: a single Redis instance can handle billions of document‑level vectors while maintaining low latency.
Benchmark Methodology
Redis evaluated the new engine against three categories of competitors:
Pure vector databases (e.g., Qdrant, Milvus).
General‑purpose databases with vector extensions.
Fully managed in‑memory Redis cloud services.
Benchmarks used four public datasets:
gist‑960‑euclidean
glove‑100‑angular
deep‑image‑96‑angular
dbpedia‑openai‑1M‑angular
Workloads were generated with the vector-db-benchmark suite from Qdrant and covered both data ingestion (HNSW index construction) and k‑NN search (ANN). All tests measured average latency, 99th‑percentile latency, and queries‑per‑second (QPS).
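The three reported metrics can be reproduced from raw per-query timings; here is a minimal sketch using the nearest-rank percentile method (the sample timings are invented for illustration):

```python
def summarize(latencies_ms, wall_clock_s):
    """Compute average latency, p99 latency, and QPS from raw timings."""
    ordered = sorted(latencies_ms)
    avg = sum(ordered) / len(ordered)
    # 99th-percentile latency: value below which 99% of queries fall
    # (nearest-rank method).
    p99 = ordered[max(0, int(round(0.99 * len(ordered))) - 1)]
    qps = len(ordered) / wall_clock_s  # queries per second
    return avg, p99, qps

# 100 fake timings: 99 fast queries and one slow outlier.
samples = [5.0] * 99 + [40.0]
avg, p99, qps = summarize(samples, wall_clock_s=1.0)
print(avg, p99, qps)  # 5.35 5.0 100.0
```

Note how the outlier inflates the average but not p99; reporting both is what makes the "sub-10 ms" claim meaningful for tail-sensitive workloads.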
Results
Redis achieved up to 16× higher query throughput compared with the best pure vector database while keeping average latency under 10 ms. It also outperformed the general‑purpose and managed Redis competitors in both speed and scalability. The improvement is especially relevant for latency‑sensitive RAG pipelines that aim to meet the “100 ms rule” for end‑to‑end response time.
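As a rough illustration of why sub-10 ms retrieval matters for the 100 ms rule, consider a hypothetical RAG latency budget (the stage names and timings are invented for the example):

```python
# Hypothetical per-stage latencies (ms) in a RAG pipeline.
stages = {
    "embed query": 15,
    "vector search": 10,   # the retrieval step Redis keeps under 10 ms
    "rerank": 20,
    "first LLM token": 50,
}
total_ms = sum(stages.values())
print(total_ms, total_ms <= 100)  # 95 True
```

If retrieval took 50 ms instead of 10 ms, the same pipeline would blow the 100 ms budget before the model produces its first token.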
Key Takeaways
Multithreaded index access provides vertical scaling without changing the single‑threaded command processing model.
Horizontal scaling (sharding) can be combined with the new engine for even larger data volumes.
The engine supports HNSW‑based ANN search, inverted‑index filtering, and arbitrary O(log n) scans.
Benchmarks confirm suitability for real‑time RAG applications where sub‑10 ms vector queries are required.
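For reference, HNSW approximates the exact k-NN search sketched below. A brute-force baseline in plain Python (cosine similarity, no external libraries) clarifies what the index is speeding up; the toy 2-dimensional vectors are made up for the example:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn(query, docs, k):
    # Exact k-NN: score every vector, keep the k most similar.
    # HNSW replaces this O(n) scan with a layered graph traversal that
    # returns approximately the same top-k far faster at scale.
    scored = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

docs = {"d1": [1.0, 0.0], "d2": [0.7, 0.7], "d3": [0.0, 1.0]}
print(knn([1.0, 0.1], docs, k=2))  # ['d1', 'd2']
```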
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.