How Redis’s New Multithreaded Query Engine Supercharges Vector Search for AI
Redis has introduced a multithreaded query engine that dramatically boosts throughput and lowers latency for vector searches, enabling scalable, real‑time retrieval‑augmented generation (RAG) workloads while preserving the low‑latency performance of its core in‑memory database.
Redis, the popular in‑memory data‑structure store, has launched an enhanced query engine at a time when vector databases are gaining prominence for retrieval‑augmented generation (RAG) in generative AI applications.
The new engine adopts multithreading, allowing concurrent access to indexes and vertical scaling, which dramatically increases query throughput while keeping latency below a few milliseconds.
Redis stresses that this improvement is crucial as datasets grow to hundreds of millions of documents, where complex queries can otherwise throttle throughput; the company claims the engine preserves sub‑millisecond response times for core operations while keeping average query latency under 10 ms even at that scale.
The company acknowledges the limitations of its traditional single‑threaded architecture, where long‑running queries cause congestion, especially when using inverted indexes.
Search operations are not O(1); they typically involve multiple index scans that run in O(log n) time, where n is the number of indexed data points. The multithreaded approach resolves these challenges and markedly raises throughput for compute‑intensive tasks such as vector similarity search.
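The contrast is easy to see in miniature. Below is a sketch (not Redis code; the data and structures are illustrative) comparing a constant-time hash lookup with the O(log n) binary-search seek that a sorted, inverted-index-like structure needs before scanning its matching run:

```python
import bisect

# A hash lookup (dict) is O(1): one bucket probe regardless of dataset size.
index = {f"doc:{i}": i for i in range(1_000_000)}
assert index["doc:42"] == 42  # constant-time exact-match lookup

# A sorted, inverted-index-like structure answers a range query by first
# seeking the start and end positions via binary search -- O(log n) each --
# then scanning the matching run.
sorted_keys = list(range(1_000_000))
lo = bisect.bisect_left(sorted_keys, 500_000)   # O(log n) seek
hi = bisect.bisect_right(sorted_keys, 500_004)  # O(log n) seek
matches = sorted_keys[lo:hi]                    # scan the matching run
print(matches)  # [500000, 500001, 500002, 500003, 500004]
```

Under a single thread, many such log-time scans queue up behind one another; with worker threads they can proceed concurrently.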
Redis describes efficient scaling as a combination of horizontal data distribution and vertical multithreaded processing, enabling concurrent index access.
The new architecture follows a three‑step workflow: the main thread prepares the query context and queues it; worker threads pull tasks from the queue and execute query pipelines concurrently; results are then returned to the main thread, allowing it to continue handling regular Redis commands.
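The three steps can be sketched in plain Python. This is a simplified model of the workflow, not Redis's actual C implementation; the queue, worker count, and stand-in query pipelines are all illustrative:

```python
import queue
import threading

task_queue = queue.Queue()
results = {}
done = threading.Event()

def worker():
    # Step 2: worker threads pull prepared query contexts off the queue
    # and execute the query pipelines concurrently.
    while not done.is_set() or not task_queue.empty():
        try:
            query_id, pipeline = task_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        results[query_id] = pipeline()  # run the query pipeline
        task_queue.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

# Step 1: the main thread only prepares the query context and enqueues it,
# staying free to keep serving regular Redis commands.
for i in range(8):
    task_queue.put((i, lambda i=i: i * i))  # stand-in for a real query

# Step 3: results come back to the main thread once the workers finish.
task_queue.join()
done.set()
for w in workers:
    w.join()
print(sorted(results.items()))
```

The key property the design buys is that the main event loop never blocks on a long-running query; it only pays the cost of enqueueing and collecting results.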
Benchmarking shows the upgraded engine outperforming three categories of competitors—pure vector stores, general‑purpose databases with vector capabilities, and fully managed in‑memory database cloud services—delivering higher speed, better scalability, and superior overall performance.
While the vector‑database market is rapidly expanding and becoming saturated, experts note that strong semantic search is only one piece of the AI stack; integrating vector capabilities into existing databases can be more effective than building new standalone solutions.
Redis claims a 16× increase in query throughput over the previous generation, meeting the stringent latency requirements of real‑time RAG applications, such as chatbots that must retrieve data from vector stores within the “100 ms rule.”
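The retrieval step of such a chatbot is a k‑NN query over stored embeddings. The brute-force sketch below is illustrative only—Redis uses proper vector indexes (HNSW or FLAT), and the three-dimensional vectors here are toy data—but it shows what the engine must compute under the latency budget:

```python
import math

def cosine_similarity(a, b):
    # Similarity between two embedding vectors: dot product over norms.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def knn(query_vec, docs, k=2):
    # Score every stored embedding against the query and keep the top k.
    scored = sorted(docs,
                    key=lambda d: cosine_similarity(query_vec, d["vec"]),
                    reverse=True)
    return [d["text"] for d in scored[:k]]

docs = [
    {"text": "redis vector search", "vec": [0.9, 0.1, 0.0]},
    {"text": "cooking pasta",       "vec": [0.0, 0.2, 0.9]},
    {"text": "in-memory database",  "vec": [0.8, 0.3, 0.1]},
]
print(knn([1.0, 0.0, 0.0], docs))
# ['redis vector search', 'in-memory database']
```

At hundreds of millions of documents this exhaustive scan is exactly what becomes infeasible, which is why approximate indexes plus multithreaded execution matter for staying inside the 100 ms budget.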
Extensive benchmarks cover ingestion using the HNSW approximate‑nearest‑neighbor (ANN) index as well as search workloads (k‑NN queries), measuring requests per second and average client latency across datasets such as gist‑960‑euclidean, glove‑100‑angular, deep‑image‑96‑angular, and dbpedia‑openai‑1M‑angular, using Qdrant's standard vector‑db‑benchmark tool.
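The two reported metrics relate to each other directly: with a fixed number of concurrent clients, throughput is bounded by average latency. A small sketch (the timing values are made up, not benchmark results):

```python
# Per-request client latencies in seconds, as a benchmark client would record
latencies = [0.004, 0.006, 0.005, 0.007, 0.003]

avg_latency_s = sum(latencies) / len(latencies)
avg_latency_ms = avg_latency_s * 1000

# With c concurrent clients each issuing requests back to back,
# throughput ~= c / average latency.
clients = 100
rps = clients / avg_latency_s

print(f"avg latency: {avg_latency_ms:.1f} ms, throughput: {rps:.0f} req/s")
# avg latency: 5.0 ms, throughput: 20000 req/s
```

This is why a multithreaded engine that cuts average latency under load translates directly into the higher requests-per-second figures the benchmarks report.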
The new query engine is already available in Redis Software and is slated for release in Redis Cloud later this fall.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"