How gRPC Transformed Our Faiss Vector Search Service for Faster Recommendations

This article details how the MX recommendation team rebuilt their Faiss‑based similarity search service using gRPC, covering service selection, multi‑type vector handling, dynamic index updates, deployment strategies, and performance gains that doubled QPS and cut latency by two‑thirds.

MXPlayer Technical Team

Service Selection

Initially, the MX Faiss server was a Flask-based web service derived from the open-source faiss-web-service project. It suffered from high latency (about 6 ms) and scaled poorly. After evaluating RPC frameworks, the team chose gRPC over Thrift because it delivered higher throughput and lower latency for requests in the 10 ms range.

Requirement Analysis

1. Multi‑type Vector Loading

Vectors generated by various algorithms are stored in AWS S3 with keys encoding algorithm, item type, and version. The Faiss server must load these vectors, build corresponding indexes, and serve similarity searches based on the requested algorithm and item type.
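As an illustration, the sketch below parses an S3 object key into its algorithm, item type, and version, then picks the newest version for each (algorithm, item type) pair. The key layout `<algorithm>/<item_type>/<version>` and the date-style version strings are assumptions; the article only says that keys encode these three fields.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class VectorKey:
    """Fields encoded in an S3 key (hypothetical layout: algorithm/item_type/version)."""
    algorithm: str
    item_type: str
    version: str


def parse_vector_key(key: str) -> VectorKey:
    # Assumed layout "<algorithm>/<item_type>/<version>[.ext]"; any file suffix is dropped.
    parts = key.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"unexpected vector key: {key!r}")
    algorithm, item_type, filename = parts
    version = filename.split(".", 1)[0]
    return VectorKey(algorithm, item_type, version)


def latest_versions(keys: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Return the newest version per (algorithm, item_type).

    Assumes versions are zero-padded date strings (e.g. "20200315"),
    so lexicographic order matches chronological order.
    """
    latest: Dict[Tuple[str, str], str] = {}
    for key in keys:
        vk = parse_vector_key(key)
        group = (vk.algorithm, vk.item_type)
        if group not in latest or vk.version > latest[group]:
            latest[group] = vk.version
    return latest
```

The server could run `latest_versions` over an S3 listing to decide which vector files need an index built.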

2. Multi‑type Index Experimentation

Faiss offers many index types; the team experiments with different index‑vector combinations to find the best trade‑off among recommendation quality, memory usage, CPU load, and response time.

3. Configurable Indexes

Because index performance varies, the server must allow easy addition, removal, or modification of indexes with minimal code changes.
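One way to meet both the experimentation and the configurability requirements is to describe each index as data rather than code, so adding, removing, or changing an index means editing one entry. The registry layout and names below are hypothetical; strings such as "Flat", "IVF1024,Flat", and "HNSW32" are real Faiss `index_factory` descriptions. The sketch only validates and resolves the config; a comment marks where `faiss.index_factory` would be called.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class IndexSpec:
    description: str      # a Faiss index_factory string, e.g. "IVF1024,Flat"
    metric: str           # "ip" (inner product) or "l2"
    needs_training: bool


# Hypothetical registry: editing this dict is the only change needed
# to add, remove, or modify an index type.
INDEX_CONFIG: Dict[str, Mapping[str, object]] = {
    "flat_ip": {"description": "Flat", "metric": "ip"},
    "ivf_ip": {"description": "IVF1024,Flat", "metric": "ip"},
    "hnsw_l2": {"description": "HNSW32", "metric": "l2"},
}

_VALID_METRICS = {"ip", "l2"}


def load_index_specs(config: Mapping[str, Mapping[str, object]]) -> Dict[str, IndexSpec]:
    specs: Dict[str, IndexSpec] = {}
    for name, entry in config.items():
        description = str(entry["description"])
        metric = str(entry.get("metric", "l2"))
        if metric not in _VALID_METRICS:
            raise ValueError(f"{name}: unknown metric {metric!r}")
        # Rough heuristic: IVF- and PQ-based indexes must be trained before
        # vectors are added (a built Faiss index reports this via index.is_trained).
        needs_training = description.startswith("IVF") or "PQ" in description
        specs[name] = IndexSpec(description, metric, needs_training)
        # With Faiss installed, the index itself would be built roughly as:
        # faiss.index_factory(dim, description,
        #     faiss.METRIC_INNER_PRODUCT if metric == "ip" else faiss.METRIC_L2)
    return specs
```

Keeping the choice in configuration also makes A/B comparisons of index types (quality vs. memory, CPU, and latency) a deployment change rather than a code change.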

4. Index Version Control

Both item and user embeddings are versioned. The server must support loading multiple versions of the same vector type so that the recommendation system can select the appropriate version for nearest‑neighbor queries.

5. Hot Index Updates

New vectors are generated daily; the server must refresh indexes with the latest vectors without downtime while still handling incoming requests.
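Hot updates are commonly done copy-on-write: build the new index off to the side, then swap a single reference so in-flight searches keep using the old one. A minimal sketch, assuming indexes are held in a dict keyed by date-style version strings; the class and method names are hypothetical, and any object stands in for a Faiss index.

```python
import threading
from typing import Any, Dict, Optional


class HotSwappableIndexes:
    """Copy-on-write holder for versioned indexes (hypothetical helper)."""

    def __init__(self) -> None:
        self._indexes: Dict[str, Any] = {}
        self._write_lock = threading.Lock()  # serializes writers; readers never take it

    def get(self, version: str) -> Optional[Any]:
        # Published dicts are never mutated, so a reader always sees a
        # complete mapping: either the one before a swap or the one after.
        return self._indexes.get(version)

    def publish(self, version: str, index: Any, keep: int = 2) -> None:
        """Add a freshly built index and keep only the newest `keep` versions.

        Assumes version strings sort chronologically (e.g. "20200315").
        """
        if keep < 1:
            raise ValueError("keep must be at least 1")
        with self._write_lock:
            updated = dict(self._indexes)  # copy instead of mutating the live dict
            updated[version] = index
            for old in sorted(updated)[:-keep]:
                del updated[old]
            # Rebinding one attribute is atomic in CPython, so this is the swap.
            self._indexes = updated
```

A reload would build the new index first and only then call `publish()`; searches that already fetched the old index finish against it.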

Design and Implementation

1. Faiss Server

1.1 Organization

The server groups indexes into FaissHandler objects, each uniquely identified by the combination of algorithm_type, category, and index_type. Within a handler, index_dict maps each version key to its Faiss index, and latest_version records the newest version.
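The organization described above can be sketched as follows. Only the names FaissHandler, algorithm_type, category, index_type, index_dict, and latest_version come from the article; the methods and the registry keyed by the (algorithm_type, category, index_type) triple are assumptions, and any object can stand in for a Faiss index.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

HandlerKey = Tuple[str, str, str]  # (algorithm_type, category, index_type)


@dataclass
class FaissHandler:
    algorithm_type: str
    category: str
    index_type: str
    index_dict: Dict[str, Any] = field(default_factory=dict)  # version -> index
    latest_version: Optional[str] = None

    @property
    def key(self) -> HandlerKey:
        return (self.algorithm_type, self.category, self.index_type)

    def add_index(self, version: str, index: Any) -> None:
        # Hypothetical method: assumes versions sort chronologically (e.g. "20200315").
        self.index_dict[version] = index
        if self.latest_version is None or version > self.latest_version:
            self.latest_version = version

    def get_index(self, version: Optional[str] = None) -> Any:
        """Return the requested version, or the latest one if none is given."""
        wanted = version or self.latest_version
        if wanted is None or wanted not in self.index_dict:
            raise KeyError(f"no index for {self.key} version {wanted!r}")
        return self.index_dict[wanted]


# Hypothetical registry: one handler per (algorithm_type, category, index_type).
handlers: Dict[HandlerKey, FaissHandler] = {}


def register(handler: FaissHandler) -> None:
    handlers[handler.key] = handler
```

Defaulting `get_index()` to `latest_version` lets callers that do not care about versions get the freshest index, while version-aware callers can pin a specific one.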

1.2 Interface Design

The request message, SearchRequest, has four mandatory fields (algorithmType, category, num, indexType) and must also carry at least one itemId or vector. The response, SearchResponse, returns a simple map from nearest-neighbor item IDs to similarity scores.

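```protobuf
syntax = "proto3";

// Service and RPC names are illustrative; the article does not name them.
service FaissService {
  rpc Search (SearchRequest) returns (SearchResponse);
}

// algorithmType, category, num and indexType are mandatory, and at least
// one itemId or vector must be set. proto3 has no required fields, so the
// server enforces these rules.
message SearchRequest {
  string algorithmType = 1;
  string category = 2;
  int32 num = 3;                   // number of neighbors to return
  string indexType = 4;
  repeated string itemId = 5;      // query by item ID(s)
  repeated FloatArray vector = 6;  // or query by raw embedding(s)
}

// One embedding vector.
message FloatArray {
  repeated float val = 1;
}

// Maps neighbor item ID to similarity score.
message SearchResponse {
  map<string, float> similarItems = 1;
}
```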
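Because protobuf 3 has no required fields, the "mandatory" rules must be checked by the server. A minimal validation sketch, assuming a plain dataclass that mirrors SearchRequest (in a real service it would be the class generated by protoc); the function name and error messages are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchRequest:
    """Stand-in mirroring the proto message; field names match the .proto."""
    algorithmType: str = ""
    category: str = ""
    num: int = 0
    indexType: str = ""
    itemId: List[str] = field(default_factory=list)
    vector: List[List[float]] = field(default_factory=list)  # FloatArray.val lists


def validate_search_request(req: SearchRequest) -> List[str]:
    """Return a list of problems; an empty list means the request is valid."""
    errors = []
    # proto3 scalars default to "" / 0, so "unset" and "empty" look the same.
    for name in ("algorithmType", "category", "indexType"):
        if not getattr(req, name):
            errors.append(f"{name} is required")
    if req.num <= 0:
        errors.append("num must be a positive integer")
    if not req.itemId and not req.vector:
        errors.append("at least one itemId or vector is required")
    return errors
```

Inside a gRPC Python servicer, a non-empty error list would typically be reported with `context.abort(grpc.StatusCode.INVALID_ARGUMENT, "; ".join(errors))`.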

1.3 Search Flow
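The original diagram for this flow is not reproduced here. Based on the interface above, a plausible flow is: look up the handler by (algorithmType, category, indexType), turn each itemId into its stored vector (or use the supplied vectors), run a top-num search, and merge the hits into the response map. The sketch below covers the steps after the handler and index version are resolved. It substitutes a brute-force inner-product search over a dict for the Faiss index (in Faiss, `index.search(queries, k)` returns distance and ID arrays), and excluding the query items from their own results is also an assumption.

```python
import heapq
from typing import Dict, List, Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_similar(
    item_vectors: Dict[str, List[float]],  # stand-in for one versioned index
    num: int,
    item_ids: Sequence[str] = (),
    vectors: Sequence[Sequence[float]] = (),
) -> Dict[str, float]:
    """Brute-force top-`num` search; a Faiss index would replace the inner loop."""
    # Queries: stored vectors for known item IDs, plus any raw vectors supplied.
    queries: List[Sequence[float]] = [item_vectors[i] for i in item_ids if i in item_vectors]
    queries.extend(vectors)
    exclude = set(item_ids)  # assumption: don't return the query items themselves
    merged: Dict[str, float] = {}
    for query in queries:
        scored = (
            (inner_product(query, vec), cand)
            for cand, vec in item_vectors.items()
            if cand not in exclude
        )
        for score, cand in heapq.nlargest(num, scored):
            # Keep the best score when several queries return the same item.
            if score > merged.get(cand, float("-inf")):
                merged[cand] = score
    # Trim the merged hits back to `num` items for the similarItems map.
    best = heapq.nlargest(num, merged.items(), key=lambda kv: kv[1])
    return dict(best)
```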

2. Index Update

2.1 Architecture

Celery beat schedules an update task every two minutes. Celery workers download the vector files from S3, build the indexes, serialize them, and then notify the Faiss server via gRPC to reload the new indexes. Per-vector locks manage concurrency so that the same vectors are not downloaded by several workers at once.
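For reference, a two-minute cadence can be expressed with Celery's `beat_schedule` setting, whose entries take a task name and a `schedule` given in seconds or as a `timedelta`. The entry and task names below are hypothetical, and the dict is kept standalone so it can be inspected without a broker; in a project it would be assigned to `app.conf.beat_schedule`.

```python
from datetime import timedelta

# Hypothetical entry and task names; assign this dict to
# `app.conf.beat_schedule` on the Celery application.
BEAT_SCHEDULE = {
    "refresh-faiss-indexes": {
        "task": "faiss_updater.check_all_handlers",
        "schedule": timedelta(minutes=2),  # the two-minute update cadence
    },
}
```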

2.2 Update Process

Tasks run in parallel: each handler spawns a version-check task, which launches a download-vector task when it finds a new version. If a worker cannot acquire the lock for that vector, it skips the download instead of fetching the same files again, preventing wasted resources.

Deployment

The Faiss server is written in Python; due to the GIL, multiple service instances are run on a single machine to utilize all CPU cores.
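One way to run one server process per core is sketched below. It assumes each instance listens on its own port behind a load balancer; the article does not say how traffic is spread across instances. `grpc.server`, `add_insecure_port`, `start`, and `wait_for_termination` are real grpcio APIs, while the servicer registration line and the worker count are placeholders.

```python
import multiprocessing
import os
from concurrent import futures
from typing import List, Optional


def instance_ports(base_port: int, count: Optional[int] = None) -> List[int]:
    """One port per instance; defaults to one instance per CPU core."""
    n = count or os.cpu_count() or 1
    return [base_port + i for i in range(n)]


def serve(port: int) -> None:
    import grpc  # imported here so the planning helper works without grpcio

    server = grpc.server(futures.ThreadPoolExecutor(max_workers=8))  # placeholder size
    # Placeholder: register the protoc-generated servicer here, e.g.
    # faiss_pb2_grpc.add_FaissServiceServicer_to_server(FaissServicer(), server)
    server.add_insecure_port(f"[::]:{port}")
    server.start()
    server.wait_for_termination()


def launch(base_port: int = 50051, count: Optional[int] = None) -> List[multiprocessing.Process]:
    """Start one serve() process per port so each gets its own interpreter and GIL."""
    procs = []
    for port in instance_ports(base_port, count):
        proc = multiprocessing.Process(target=serve, args=(port,))
        proc.start()
        procs.append(proc)
    return procs
```

An entry point would call `launch()` and then `join()` each returned process; separate processes sidestep the GIL because each has its own interpreter.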

Online Performance

Stress testing showed that the gRPC‑based Faiss server achieved more than double the QPS of the previous implementation and reduced response time by roughly 67%.

Conclusion

Switching to a gRPC-based Faiss server resolved the earlier latency and scalability problems and produced an efficient, easily extensible similarity search service for recommendation workloads.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
