Redis’s Multithreaded Query Engine Boosts RAG Performance
Redis introduces a multithreaded query engine that keeps average latency under 10 ms while delivering up to 16× higher throughput for vector‑search workloads, enabling faster retrieval‑augmented generation (RAG) applications and outperforming pure vector databases and managed Redis services in benchmark tests.
Multithreaded query execution
Redis adds a multithreaded query engine that keeps average query latency below 10 ms while raising throughput dramatically. By allowing concurrent access to indexes, the engine achieves vertical scaling for workloads with hundreds of millions of documents.
Problem with the single‑threaded architecture
Traditional Redis processes long‑running queries—especially those that use inverted indexes—in a single thread. Such searches require multiple O(log n) index scans (where n is the number of indexed points) and can cause congestion, reducing overall throughput.
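To make the cost concrete, here is a minimal, illustrative sketch (not Redis's actual C implementation) of why inverted-index queries involve repeated O(log n) work: each term lookup is a binary-search probe into a sorted posting list, and a multi-term query performs many such probes, all of which previously ran on the single event-loop thread.

```python
import bisect

def postings_contain(postings, doc_id):
    """Binary-search a sorted posting list: one O(log n) probe."""
    i = bisect.bisect_left(postings, doc_id)
    return i < len(postings) and postings[i] == doc_id

def intersect(short, long_):
    """Intersect two sorted posting lists by probing the longer one.

    A multi-term query repeats this O(log n) scan many times --
    exactly the long-running work that congests a single thread.
    """
    return [d for d in short if postings_contain(long_, d)]

print(intersect([2, 5, 9], [1, 2, 3, 5, 8, 13]))  # [2, 5]
```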
Three‑step execution model
1. Planning: the main thread prepares the query context and enqueues it in a shared queue.
2. Execution: multiple worker threads dequeue tasks and run the query pipeline concurrently, so many queries are processed in parallel.
3. Aggregation: results are returned to the main thread, which aggregates them and sends the final response to the client.
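The three steps above can be sketched with Python's standard library. This is a simplified model of the described architecture, not Redis's implementation; all names are illustrative.

```python
import queue
import threading

tasks = queue.Queue()    # shared queue of prepared query contexts
results = queue.Queue()  # results flowing back to the main thread

def worker():
    # Step 2: workers dequeue query contexts and execute the
    # query pipeline concurrently.
    while True:
        ctx = tasks.get()
        if ctx is None:  # shutdown sentinel
            break
        qid, payload = ctx
        results.put((qid, f"hits-for-{payload}"))
        tasks.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

# Step 1: the main thread plans each query and enqueues its context.
for qid, q in enumerate(["q0", "q1", "q2"]):
    tasks.put((qid, q))
tasks.join()

# Step 3: the main thread aggregates results and replies to clients.
for w in workers:
    tasks.put(None)
for w in workers:
    w.join()
aggregated = dict(results.get() for _ in range(3))
print(aggregated)  # {0: 'hits-for-q0', 1: 'hits-for-q1', 2: 'hits-for-q2'}
```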
Scaling strategy
Redis emphasizes that efficient scaling combines horizontal scaling (distributed data load) with vertical scaling (multithreaded index access) to handle growing data volumes and query demand.
Benchmark methodology
Redis benchmarked the new engine against three categories of competitors:
Pure vector databases
General‑purpose databases with vector capabilities
Fully managed in‑memory Redis cloud service providers (CSPs)
Datasets used: gist-960-euclidean, glove-100-angular, deep-image-96-angular, dbpedia-openai-1M-angular. The benchmarks were run with Qdrant's industry-standard vector-db-benchmark tool, measuring data ingestion with HNSW indexing as well as ANN-search and exact k-NN workloads.
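A key quality metric in such benchmarks is recall@k: how many of an engine's approximate (ANN) neighbors match the exact brute-force k-NN ground truth. The sketch below illustrates the idea with random data and stdlib code only; it is not the actual vector-db-benchmark harness.

```python
import math
import random

random.seed(0)
dim, n, k = 8, 200, 10
corpus = [[random.random() for _ in range(dim)] for _ in range(n)]
query = [random.random() for _ in range(dim)]

def euclidean(a, b):
    # Distance metric used by datasets like gist-960-euclidean.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Exact k-NN ground truth via brute force.
truth = sorted(range(n), key=lambda i: euclidean(query, corpus[i]))[:k]

def recall_at_k(approx_ids, truth_ids):
    """Fraction of the true k nearest neighbors the ANN engine returned."""
    return len(set(approx_ids) & set(truth_ids)) / len(truth_ids)

# A perfect ANN result matches the ground truth exactly.
print(recall_at_k(truth, truth))  # 1.0
```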
Results
Redis outperformed pure vector databases in speed and scalability and significantly exceeded the overall performance of both general‑purpose databases and managed Redis CSPs. The new engine delivered a 16× increase in query throughput while maintaining sub‑10 ms latency, making it suitable for real‑time RAG scenarios that target the “100 ms rule” for user‑perceived latency.
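Little's law ties these two numbers together: the number of in-flight queries equals throughput times latency, so sustaining 16× the throughput at the same sub-10 ms latency requires 16× the concurrency, which is exactly what the worker-thread pool provides. The figures below are illustrative, not Redis's published benchmark numbers.

```python
# Little's law: in-flight queries = throughput (qps) * latency (seconds).
def required_concurrency(qps, latency_s):
    return qps * latency_s

baseline = required_concurrency(1_000, 0.010)  # 10 queries in flight
scaled = required_concurrency(16_000, 0.010)   # 160 queries in flight
print(baseline, scaled)  # 10.0 160.0
```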
Industry perspective
Reddit chief engineer Doug Turnbull warned that an oversupply of vector-database options can overwhelm users. Vectara's Ofer Mendelevitch and RisingWave Labs founder Yingjun Wu argued that vector databases are only one component of the AI stack and favor enhancing existing databases with vector engines. Redis's approach aligns with this view by extending its existing infrastructure.
Source: infoq.com/news/2024/07/redis-vector-database-genai-rag/
SpringMeng
Focused on software development, sharing source code and tutorials for various systems.