How gRPC Transformed Our Faiss Vector Search Service for Faster Recommendations
This article details how the MX recommendation team rebuilt their Faiss‑based similarity search service using gRPC, covering service selection, multi‑type vector handling, dynamic index updates, deployment strategies, and performance gains that doubled QPS and cut latency by two‑thirds.
Service Selection
The original MX Faiss server was a Flask‑based web service derived from the open‑source faiss-web-service project. It suffered from high latency (about 6 ms per request) and scaled poorly. After evaluating RPC frameworks, the team chose gRPC over Thrift because it delivered higher throughput and lower latency for workloads with response‑time budgets on the order of 10 ms.
Requirement Analysis
1. Multi‑type Vector Loading
Vectors generated by various algorithms are stored in AWS S3 with keys encoding algorithm, item type, and version. The Faiss server must load these vectors, build corresponding indexes, and serve similarity searches based on the requested algorithm and item type.
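The article does not give the exact S3 key format. As a minimal sketch, assuming a hypothetical layout `vectors/<algorithm>/<item_type>/<version>/<file>` and versions that sort lexically (for example `YYYYMMDD`), the loader could parse each key and pick the newest version per algorithm and item type like this:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical key layout: "<prefix>/<algorithm>/<item_type>/<version>/<file>".
# The real MX key format is not given in the article.
@dataclass(frozen=True)
class VectorKey:
    algorithm: str
    item_type: str
    version: str

def parse_vector_key(key: str, prefix: str = "vectors") -> Optional[VectorKey]:
    """Extract (algorithm, item_type, version) from an S3 key, or None if it does not match."""
    parts = key.strip("/").split("/")
    if len(parts) != 5 or parts[0] != prefix:
        return None
    _, algorithm, item_type, version, _filename = parts
    return VectorKey(algorithm, item_type, version)

def latest_versions(keys: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Map (algorithm, item_type) to its newest version, assuming versions sort lexically."""
    latest: Dict[Tuple[str, str], str] = {}
    for key in keys:
        parsed = parse_vector_key(key)
        if parsed is None:
            continue  # ignore objects that are not vector files
        group = (parsed.algorithm, parsed.item_type)
        if group not in latest or parsed.version > latest[group]:
            latest[group] = parsed.version
    return latest
```

In practice the key list would come from listing the bucket, for example with boto3's `list_objects_v2` paginator.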
2. Multi‑type Index Experimentation
Faiss offers many index types; the team experiments with different index‑vector combinations to find the best trade‑off among recommendation quality, memory usage, CPU load, and response time.
3. Configurable Indexes
Because index performance varies, the server must allow easy addition, removal, or modification of indexes with minimal code changes.
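One way to keep index changes out of the code is a declarative config with one entry per index. The sketch below assumes a hypothetical JSON format in which each entry names the `algorithm_type`, `category`, and `index_type` and carries a Faiss `index_factory` description string (such as `"Flat"`, `"IVF1024,Flat"`, or `"HNSW32"`) that the server would pass to `faiss.index_factory(dim, description)` when building the index. Faiss itself is not imported here; the sketch only parses and validates the config.

```python
import json
from typing import Dict, Tuple

# Hypothetical config format: one entry per (algorithm_type, category, index_type).
# "factory" is a Faiss index_factory description string.
EXAMPLE_CONFIG = """
[
  {"algorithm_type": "als", "category": "video", "index_type": "flat", "factory": "Flat"},
  {"algorithm_type": "als", "category": "video", "index_type": "ivf",  "factory": "IVF1024,Flat"},
  {"algorithm_type": "w2v", "category": "music", "index_type": "hnsw", "factory": "HNSW32"}
]
"""

HandlerKey = Tuple[str, str, str]

def load_index_config(text: str) -> Dict[HandlerKey, str]:
    """Parse the config into {(algorithm_type, category, index_type): factory_string}."""
    registry: Dict[HandlerKey, str] = {}
    for entry in json.loads(text):
        key = (entry["algorithm_type"], entry["category"], entry["index_type"])
        if key in registry:
            # Two entries for the same handler would make lookups ambiguous.
            raise ValueError(f"duplicate index definition: {key}")
        registry[key] = entry["factory"]
    return registry
```

Adding, removing, or changing an index then means editing one config entry. Note that IVF‑type indexes must be trained on sample vectors before vectors are added, so the build step has to handle that case.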
4. Index Version Control
Both item and user embeddings are versioned. The server must support loading multiple versions of the same vector type so that the recommendation system can select the appropriate version for nearest‑neighbor queries.
5. Hot Index Updates
New vectors are generated daily; the server must refresh indexes with the latest vectors without downtime while still handling incoming requests.
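A common way to meet the no‑downtime requirement is to build the new index completely off the request path and then swap a single reference. The following is a minimal sketch under that assumption (the class and method names are hypothetical): each query takes a snapshot of the current (version, index) pair, so in‑flight searches finish on the old index while later ones see the new one.

```python
import threading
from typing import Any, Optional, Tuple

class HotSwappableIndex:
    """Holds the index used for serving; a fully built replacement is installed in one step.

    Hypothetical helper, not the MX implementation. Index objects are opaque here;
    in the real server they would be Faiss indexes.
    """

    def __init__(self, index: Optional[Any] = None, version: Optional[str] = None):
        # Rebinding an attribute is atomic in CPython, so the lock is not strictly
        # required; it keeps the swap correct without relying on that detail.
        self._lock = threading.Lock()
        self._current: Tuple[Optional[str], Optional[Any]] = (version, index)

    def snapshot(self) -> Tuple[Optional[str], Optional[Any]]:
        """Return (version, index) as one consistent pair for the duration of a query."""
        with self._lock:
            return self._current

    def swap(self, version: str, new_index: Any) -> Optional[str]:
        """Install an already-built index and return the version it replaced."""
        with self._lock:
            old_version, _ = self._current
            self._current = (version, new_index)
        return old_version
```

Readers that still hold the old index keep a valid reference; in CPython its memory is released once the last such reference is dropped.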
Design and Implementation
1. Faiss Server
1.1 Organization
The server organizes indexes into FaissHandler objects, each uniquely identified by the combination of algorithm_type, category, and index_type. Within a handler, index_dict maps version keys to the corresponding Faiss indexes, and latest_version records the newest version.
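The organization described above can be sketched as a small class. FaissHandler, index_dict, latest_version, algorithm_type, category, and index_type come from the article; the methods add_version and get_index are hypothetical, and index objects are stored opaquely so the sketch runs without Faiss installed. It also assumes versions sort lexically.

```python
from typing import Any, Dict, Optional, Tuple

class FaissHandler:
    """One handler per (algorithm_type, category, index_type), holding every loaded version."""

    def __init__(self, algorithm_type: str, category: str, index_type: str):
        self.key: Tuple[str, str, str] = (algorithm_type, category, index_type)
        self.index_dict: Dict[str, Any] = {}  # version -> index object
        self.latest_version: Optional[str] = None

    def add_version(self, version: str, index: Any) -> None:
        """Register an index; store it before advancing latest_version so readers never see a missing entry."""
        self.index_dict[version] = index
        if self.latest_version is None or version > self.latest_version:
            self.latest_version = version

    def get_index(self, version: Optional[str] = None) -> Any:
        """Return the requested version, or the latest one when no version is given."""
        chosen = version if version is not None else self.latest_version
        if chosen is None or chosen not in self.index_dict:
            raise KeyError(f"{self.key}: version {chosen!r} not loaded")
        return self.index_dict[chosen]

# Handlers are looked up by their identifying triple.
handlers: Dict[Tuple[str, str, str], FaissHandler] = {}
```

Keeping several versions in index_dict is what lets the recommendation system query an older embedding version while latest_version serves the default case.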
1.2 Interface Design
The request message, SearchRequest, has four mandatory fields (algorithmType, category, num, and indexType) and must also carry at least one of itemId or vector. The response is a simple map from each nearest‑neighbor item ID to its similarity score.
syntax = "proto3";
message SearchRequest {
  string algorithmType = 1;
  string category = 2;
  int32 num = 3;
  string indexType = 4;
  repeated string itemId = 5;
  repeated FloatArray vector = 6;
}
message FloatArray {
  repeated float val = 1;
}
message SearchResponse {
  map<string, float> similarItems = 1;
}
1.3 Search Flow
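Because proto3 has no required fields (an unset string arrives as "" and an unset int32 as 0), the server has to enforce the "mandatory" rules itself before it can route a search. A minimal sketch of that first step, using a plain dataclass as a stand‑in for the generated SearchRequest class (same field names; the validate function is hypothetical), returns the key used to find the FaissHandler:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SearchRequest:
    """Plain-Python stand-in for the generated SearchRequest message."""
    algorithmType: str = ""
    category: str = ""
    num: int = 0
    indexType: str = ""
    itemId: List[str] = field(default_factory=list)
    vector: List[List[float]] = field(default_factory=list)

def validate(req: SearchRequest) -> Tuple[str, str, str]:
    """Check the mandatory fields and return the (algorithm_type, category, index_type) handler key."""
    missing = [name for name in ("algorithmType", "category", "indexType") if not getattr(req, name)]
    if req.num <= 0:
        missing.append("num")
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if not req.itemId and not req.vector:
        raise ValueError("at least one of itemId or vector must be set")
    return (req.algorithmType, req.category, req.indexType)
```

Inside a real gRPC servicer, a ValueError here would typically be turned into `context.abort(grpc.StatusCode.INVALID_ARGUMENT, str(err))` rather than an internal error.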
2. Index Update
2.1 Architecture
Celery beat schedules an update task every two minutes. Workers download the vector files from S3, build the indexes, serialize them, and then notify the Faiss server via gRPC to reload the new indexes. Per‑vector locks coordinate concurrent workers so that the same vector file is not downloaded twice.
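Celery's periodic tasks are configured through the `beat_schedule` setting, where `schedule` may be a number of seconds, a `timedelta`, or a `crontab`. A minimal sketch of the two‑minute trigger follows; the entry name, task path, and Redis broker URL are hypothetical, and only the dictionary structure follows Celery's format.

```python
# Hypothetical entry name and task path; only the structure follows Celery's beat_schedule format.
UPDATE_INTERVAL_SECONDS = 120.0  # "every two minutes" from the article

BEAT_SCHEDULE = {
    "refresh-faiss-indexes": {
        "task": "faiss_updater.tasks.check_all_handlers",  # hypothetical task
        "schedule": UPDATE_INTERVAL_SECONDS,
    },
}

# With Celery installed (assuming a Redis broker), the schedule is attached to the app:
#   from celery import Celery
#   app = Celery("faiss_updater", broker="redis://localhost:6379/0")
#   app.conf.beat_schedule = BEAT_SCHEDULE
# and run with `celery -A faiss_updater beat` alongside the workers.
```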
2.2 Update Process
The diagram shows that tasks execute in parallel: each handler spawns a version‑check task, which may in turn launch a download‑vector task. If a worker cannot acquire the lock for a vector, it skips the download instead of waiting, so the same file is never fetched twice at the same time.
Deployment
The Faiss server is written in Python. Because the GIL prevents one process from using more than one core for Python code, several service instances run on each machine so that all CPU cores are used.
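The article does not describe how the instances are started or load‑balanced. One simple arrangement, sketched below under that assumption, is one process per CPU core, each listening on its own port behind a load balancer. `serve` is a hypothetical placeholder for the real gRPC server startup.

```python
import multiprocessing as mp
import os
from typing import List

def instance_ports(base_port: int, count: int) -> List[int]:
    """Assign one port per instance: base_port, base_port + 1, ..."""
    return [base_port + i for i in range(count)]

def serve(port: int) -> None:
    """Hypothetical placeholder: start one gRPC Faiss server on `port` and block."""
    # A real version would create grpc.server(...), register the servicer,
    # call add_insecure_port(f"[::]:{port}"), start(), and wait_for_termination().
    print(f"[pid {os.getpid()}] would serve on port {port}")

def launch_all(base_port: int = 50051) -> List[mp.Process]:
    """Start one server process per CPU core so the GIL does not cap throughput."""
    procs = []
    for port in instance_ports(base_port, os.cpu_count() or 1):
        p = mp.Process(target=serve, args=(port,))
        p.start()
        procs.append(p)
    return procs
```

Call `launch_all()` under an `if __name__ == "__main__":` guard and `join()` the returned processes. An alternative is to bind every process to the same port with the gRPC server option `("grpc.so_reuseport", 1)` and let the kernel distribute connections.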
Online Performance
Stress testing showed that the gRPC‑based Faiss server achieved more than double the QPS of the previous implementation and reduced response time by roughly 67%.
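As a rough sense of scale, assuming the ~6 ms latency of the old Flask service is the baseline for the 67% reduction (the article does not state this explicitly):

```latex
t_{\text{new}} \approx t_{\text{old}}\,(1 - 0.67) \approx 6\,\text{ms} \times 0.33 \approx 2\,\text{ms},
\qquad
\text{QPS}_{\text{new}} > 2 \times \text{QPS}_{\text{old}}
```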
Conclusion
Switching to a gRPC‑driven Faiss server resolved earlier scalability and latency issues, delivering a high‑efficiency, highly extensible similarity search service for recommendation workloads.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
MXPlayer Technical Team
Technical articles and experience sharing. MXPLAYER is the top-ranked online video content platform in India, and also the world's largest player app, with 100M+ DAU and hundreds of millions of MAU.