How Redis’s New Multithreaded Query Engine Boosts Vector Search Performance
Redis has introduced a multithreaded query engine that sharply reduces latency and raises vector similarity search throughput by up to 16×, enabling a single instance to scale vertically and serve real-time RAG applications better than traditional single-threaded architectures and competing vector databases.
Multithreaded Query Engine in Redis
Redis introduced a multithreaded query execution model to support Retrieval‑Augmented Generation (RAG) workloads that rely on vector similarity search. The engine keeps average query latency below 10 ms while increasing throughput dramatically.
Why multithreading?
Traditional Redis processes commands on a single thread. Complex queries that use inverted indexes and multiple O(log n) index scans can block the main thread, limiting throughput. By off‑loading the index‑access phase to a pool of worker threads, Redis can serve many queries in parallel without sacrificing the responsiveness of core key‑value operations.
Execution Pipeline
1. Query planning – the main thread builds the query plan (context) and enqueues a task in a shared work queue.
2. Parallel execution – worker threads dequeue tasks, run the full query pipeline (e.g., vector similarity scoring, filter predicates, HNSW ANN search) concurrently, and produce partial result sets.
3. Result aggregation – the main thread collects the partial results, merges them, and returns the final response to the client.
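The three stages above can be sketched with a thread pool. This is a simplified model, not the actual C implementation: the helpers `plan_query` and `run_pipeline`, the per-shard task split, and the one-dimensional "vectors" are all illustrative stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def plan_query(query):
    # Main thread: build the query plan ("context") for a k-NN request.
    return {"vector": query["vector"], "k": query["k"]}

def run_pipeline(plan, shard):
    # Worker thread: scan one index shard and return a partial result set
    # of (distance, doc_id) pairs, smallest distance first.
    results = [(abs(value - plan["vector"]), doc_id) for doc_id, value in shard]
    return sorted(results)[: plan["k"]]

def execute(query, shards, pool):
    # Stage 1: the main thread builds the plan and enqueues one task per shard.
    plan = plan_query(query)
    futures = [pool.submit(run_pipeline, plan, shard) for shard in shards]
    # Stage 3: the main thread merges partial results into the final response.
    merged = sorted(result for f in futures for result in f.result())
    return [doc_id for _, doc_id in merged[: plan["k"]]]

shards = [[("a", 1.0), ("b", 5.0)], [("c", 2.0), ("d", 9.0)]]
with ThreadPoolExecutor(max_workers=2) as pool:
    print(execute({"vector": 1.5, "k": 2}, shards, pool))  # ['a', 'c']
```

Because the heavy index scans run in the worker pool, the main thread only plans and merges, which is how core key-value commands stay responsive under query load.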
This design enables vertical scaling: a single Redis instance can handle billions of document‑level vectors while maintaining low latency.
Benchmark Methodology
Redis evaluated the new engine against three categories of competitors:
Pure vector databases (e.g., Qdrant, Milvus).
General‑purpose databases with vector extensions.
Fully managed in‑memory Redis cloud services.
Benchmarks used four public datasets:
gist‑960‑euclidean
glove‑100‑angular
deep‑image‑96‑angular
dbpedia‑openai‑1M‑angular
Workloads were generated with the vector-db-benchmark suite from Qdrant and covered both data ingestion (HNSW index construction) and k‑NN search (ANN). All tests measured average latency, 99th‑percentile latency, and queries‑per‑second (QPS).
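The three reported metrics can be reproduced from raw per-query timings; here is a minimal sketch using the nearest-rank percentile method (the sample timings are invented for illustration):

```python
def summarize(latencies_ms, wall_clock_s):
    """Compute average latency, p99 latency, and QPS from raw timings."""
    ordered = sorted(latencies_ms)
    avg = sum(ordered) / len(ordered)
    # 99th-percentile latency: value below which 99% of queries fall
    # (nearest-rank method).
    p99 = ordered[max(0, int(round(0.99 * len(ordered))) - 1)]
    qps = len(ordered) / wall_clock_s  # queries per second
    return avg, p99, qps

# 100 fake timings: 99 fast queries and one slow outlier.
samples = [5.0] * 99 + [40.0]
avg, p99, qps = summarize(samples, wall_clock_s=1.0)
print(avg, p99, qps)  # 5.35 5.0 100.0
```

Note how the outlier inflates the average but not p99; reporting both is what makes the "sub-10 ms" claim meaningful for tail-sensitive workloads.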
Results
Redis achieved up to 16× higher query throughput compared with the best pure vector database while keeping average latency under 10 ms. It also outperformed the general‑purpose and managed Redis competitors in both speed and scalability. The improvement is especially relevant for latency‑sensitive RAG pipelines that aim to meet the “100 ms rule” for end‑to‑end response time.
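As a rough illustration of why sub-10 ms retrieval matters for the 100 ms rule, consider a hypothetical RAG latency budget (the stage names and timings are invented for the example):

```python
# Hypothetical per-stage latencies (ms) in a RAG pipeline.
stages = {
    "embed query": 15,
    "vector search": 10,   # the retrieval step Redis keeps under 10 ms
    "rerank": 20,
    "first LLM token": 50,
}
total_ms = sum(stages.values())
print(total_ms, total_ms <= 100)  # 95 True
```

If retrieval took 50 ms instead of 10 ms, the same pipeline would blow the 100 ms budget before the model produces its first token.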
Key Takeaways
Multithreaded index access provides vertical scaling without changing the single‑threaded command processing model.
Horizontal scaling (sharding) can be combined with the new engine for even larger data volumes.
The engine supports HNSW‑based ANN search, inverted‑index filtering, and arbitrary O(log n) scans.
Benchmarks confirm suitability for real‑time RAG applications where sub‑10 ms vector queries are required.
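For reference, HNSW approximates the exact k-NN search sketched below. A brute-force baseline in plain Python (cosine similarity, no external libraries) clarifies what the index is speeding up; the toy 2-dimensional vectors are made up for the example:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn(query, docs, k):
    # Exact k-NN: score every vector, keep the k most similar.
    # HNSW replaces this O(n) scan with a layered graph traversal that
    # returns approximately the same top-k far faster at scale.
    scored = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

docs = {"d1": [1.0, 0.0], "d2": [0.7, 0.7], "d3": [0.0, 1.0]}
print(knn([1.0, 0.1], docs, k=2))  # ['d1', 'd2']
```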
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.