How SimSvr Achieves Billion‑Scale Real‑Time ANN Search for Recommendations
SimSvr is a high‑performance, distributed feature‑retrieval component for recommendation systems. It supports billion‑scale indexes, millisecond‑level query latency, real‑time and batch updates, multi‑model AB‑testing, and advanced filtering, and it serves production workloads at Tencent.
Background
Recommendation, image retrieval, and deduplication systems often need k‑nearest‑neighbor (k‑NN) search over feature vectors at massive scale, requiring billion‑level indexes, ultra‑low latency, real‑time updates, multi‑model AB‑testing, and flexible filtering.
Problems with Existing Solutions
Academic ANN libraries are single‑node only and cannot serve as high‑performance, reliable distributed components.
Industry wrappers lack scalability and high‑availability for online services.
Many components support only offline updates or only online updates, which falls short of WeChat’s requirement to absorb update rates ranging from thousands of keys per second to billions of keys per hour.
SimSvr Overview
Distributed, scalable architecture that can handle more than 10⁹ indexed vectors with query latency under 10 ms.
Uses hnswlib as the primary recall engine, answering most queries within 2 ms.
Cluster management with built‑in data scheduling and dynamic routing.
Supports both task‑based and automatic updates, covering updates from a few thousand keys per second to billions per hour.
Read‑write separation isolates heavy offline indexing from online query serving.
Rich feature set: lightweight embedding KV store, multi‑table multi‑index, versioned indexes, filters, and expiration deletion.
Engine Selection
Two engines were chosen based on performance and storage capacity:
hnswlib: best performance in ann‑benchmarks, delivering >90 % recall within 1 ms.
FAISS (IVF‑HNSW + PQ): compresses vectors 10‑30×, enabling billion‑scale indexes to fit into a 64 GB machine.
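To see why 10‑30× compression matters, the following back‑of‑the‑envelope sketch estimates the raw vector payload of a billion‑vector index. It assumes 128‑dimensional float32 vectors (the dimension quoted later for the “Search” index) and ignores graph, inverted‑list, and ID overhead; the function name is illustrative only.

```python
def index_size_gb(num_vectors: int, dim: int,
                  bytes_per_component: int = 4,
                  compression: float = 1.0) -> float:
    """Approximate vector payload in GB (10**9 bytes), ignoring index overhead."""
    raw_bytes = num_vectors * dim * bytes_per_component
    return raw_bytes / compression / 1e9


# Assumption: one billion 128-dim float32 vectors.
raw = index_size_gb(10**9, 128)                    # uncompressed
low = index_size_gb(10**9, 128, compression=10)    # 10x compression
high = index_size_gb(10**9, 128, compression=30)   # 30x compression

print(f"raw: {raw:.0f} GB, 10x: {low:.1f} GB, 30x: {high:.1f} GB")
```

Under these assumptions the uncompressed vectors alone need about 512 GB, while the compressed payload lands between roughly 17 GB and 51 GB, which is consistent with the 64 GB figure above.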
Resource Optimization
By integrating hnswlib into SimSvr and leveraging read‑write separation, a single machine can host many more model indexes, achieving up to a 50 % increase in data capacity under typical worker/thread configurations.
Distance Conversion for Inner‑Product Search
HNSW performs poorly with inner‑product (dot‑product) similarity because the inner product does not define a metric space. SimSvr adopts the “ip2cos” technique from the paper *Non‑metric Similarity Graphs for Maximum Inner Product Search*, which converts inner‑product search into cosine‑distance search; in real tests this raised recall from 62.6 % to 97.8 %.
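One well‑known way to turn an inner‑product search into a cosine search is to append one extra coordinate to every indexed vector so that all of them share the same norm, and to append a zero to the query. The sketch below illustrates that reduction in plain Python; whether SimSvr’s ip2cos is implemented exactly this way is an assumption, and all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def augment_items(items):
    """Append sqrt(M^2 - ||x||^2) so every item ends up with norm M (the max norm)."""
    max_norm_sq = max(dot(x, x) for x in items)
    return [list(x) + [math.sqrt(max(0.0, max_norm_sq - dot(x, x)))] for x in items]


def augment_query(q):
    """The query gets a zero, so the extra coordinate never changes the dot product."""
    return list(q) + [0.0]


items = [[3.0, 0.0], [1.0, 1.0], [0.0, 0.5]]
query = [1.0, 3.0]
ids = range(len(items))

# Ground truth: rank by inner product (larger is better).
by_ip = sorted(ids, key=lambda i: -dot(query, items[i]))

# Plain cosine on the raw vectors ranks differently because norms differ.
by_raw_cos = sorted(ids, key=lambda i: -cosine(query, items[i]))

# Cosine on the augmented vectors reproduces the inner-product ranking.
aug_items = augment_items(items)
aug_query = augment_query(query)
by_cos = sorted(ids, key=lambda i: -cosine(aug_query, aug_items[i]))

print(by_ip, by_raw_cos, by_cos)
```

After augmentation every item has the same norm, so the cosine between the augmented query and an augmented item is the original inner product divided by a constant, and the two rankings coincide.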
FAISS Batch k‑means Acceleration
FAISS was extended with a batch k‑means algorithm that dramatically speeds up training on 10 M‑scale datasets (128‑dim, IP distance) while preserving clustering quality, reducing training time by more than two hours and improving recall by ~30 %.
Overall Design
Data Structure
SimSvr treats each table as a collection of sharding‑sections (containers). A table is split into shards shard0, shard1, …, shardN, and each shard is replicated into several sections (the sect count) for read scaling and fault tolerance.
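The layout can be pictured with a small sketch. The class and function names, the table name, and the CRC32‑modulo key placement are all assumptions made for illustration; the source does not describe how keys are mapped to shards.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """One sharding-section: a replica (section) of one shard of one table."""
    table: str
    shard: int
    section: int


def build_containers(table: str, shard_count: int, sect_count: int):
    """Enumerate every container of a table: shard_count * sect_count in total."""
    return [Container(table, shard, section)
            for shard in range(shard_count)
            for section in range(sect_count)]


def shard_for_key(key: str, shard_count: int) -> int:
    """Hypothetical placement rule: a stable hash of the key modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


containers = build_containers("video_emb", shard_count=4, sect_count=2)
print(len(containers))                      # 4 shards x 2 sections
print(shard_for_key("item-42", 4))          # same shard on every call
```

Raising the shard count spreads data across more machines, while raising the sect count adds read replicas without changing how data is partitioned.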
System Architecture
Three external dependencies:
Chubby – stores metadata, routing, and worker information.
USER_FS – distributed file system (WFS/HDFS) holding raw data.
SimSvr_FS – stores generated index files and incremental data.
Key components:
Worker: polls Chubby, loads the latest index, and serves queries.
Master: schedules data, generates routing tables, and triggers index builds via distributed locks.
Trainer: builds or rebuilds sharding indexes; can run concurrently to accelerate index construction.
Data Update Mechanisms
Automatic Update: a directory with monotonically increasing numeric sub‑folders signals new data; the master detects the change, creates a task, and the trainer rebuilds the index.
Task‑Based Update: business services submit an index task via API, specifying FS paths; the trainer executes the task and workers reload the new index.
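The automatic‑update trigger can be sketched as a poll over the data directory that looks for a numeric sub‑folder newer than the last one built. This is a minimal illustration using a local temporary directory in place of the distributed file system; the function names and the polling approach are assumptions, not SimSvr’s actual master logic.

```python
import os
import tempfile


def latest_version(data_dir):
    """Return the largest numeric sub-folder name as an int, or None if there is none."""
    versions = [int(name) for name in os.listdir(data_dir)
                if name.isdecimal() and os.path.isdir(os.path.join(data_dir, name))]
    return max(versions, default=None)


def detect_new_version(data_dir, last_built):
    """Return the version to build if it is newer than last_built, else None."""
    latest = latest_version(data_dir)
    if latest is not None and (last_built is None or latest > last_built):
        return latest
    return None


with tempfile.TemporaryDirectory() as root:
    # Non-numeric folders such as "tmp" are ignored; "10" sorts after "2" numerically.
    for name in ("1", "2", "10", "tmp"):
        os.mkdir(os.path.join(root, name))
    found = detect_new_version(root, last_built=2)
    unchanged = detect_new_version(root, last_built=10)

print(found, unchanged)
```

Comparing the folder names as integers rather than strings matters here, because a lexicographic comparison would treat "2" as newer than "10".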
Data Scheduling
Master assigns containers to workers based on health and resource usage, generating versioned routing tables. Clients cache the routing table and send parallel requests to the appropriate workers, merging results before returning to the business layer.
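The client‑side fan‑out and merge can be sketched as follows. Each shard returns its own top‑k list sorted by distance, and the client merges those lists into a global top‑k. The in‑memory `FAKE_SHARDS` table stands in for RPCs to workers, and every name here is hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Stand-in for per-shard worker responses: (distance, key), sorted ascending.
FAKE_SHARDS = {
    0: [(0.12, "a"), (0.40, "d")],
    1: [(0.05, "b"), (0.33, "e")],
    2: [(0.20, "c")],
}


def query_shard(shard_id, k):
    """Pretend RPC: return up to k nearest neighbors from one shard."""
    return FAKE_SHARDS[shard_id][:k]


def search(shard_ids, k):
    """Query all shards in parallel, then merge the sorted partial results."""
    with ThreadPoolExecutor(max_workers=len(shard_ids)) as pool:
        partials = list(pool.map(lambda s: query_shard(s, k), shard_ids))
    # heapq.merge lazily merges already-sorted inputs; islice keeps the best k.
    return list(islice(heapq.merge(*partials), k))


top3 = search([0, 1, 2], k=3)
print(top3)
```

Because every shard must be asked for k results to guarantee a correct global top‑k, the merge cost grows with the number of shards, which is one reason to keep shard counts proportional to data size rather than arbitrarily large.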
System Expansion
Tables can be split into finer‑grained containers, allowing horizontal scaling of storage capacity and independent read‑replica scaling for hot tables.
Near‑Real‑Time Incremental Updates
To meet sub‑second update latency, SimSvr writes new data to the file system first, then workers pull and load it. Small batches are inserted directly by workers; large batches are merged and rebuilt by trainers before workers load the final index, keeping online read latency unaffected.
Rich Feature Set
Additional feature KV store for model‑specific metadata.
Atomic multi‑index updates on a single table for AB‑testing.
Multi‑version index support, allowing simultaneous serving of different model versions.
Integrated Bloom and threshold filters to prune irrelevant keys during recall.
Expiration‑based deletion to remove stale items from results in real time.
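A rough sketch of how the three filters could be combined on a list of recall candidates is shown below. It assumes the Bloom filter holds keys that should be excluded (for example, items a user has already seen), that the threshold filter is a maximum distance, and that each candidate carries an expiry timestamp; none of these details are confirmed by the source, and the class and function names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def filter_candidates(candidates, excluded, max_distance, now):
    """candidates: iterable of (key, distance, expires_at). Returns the kept keys."""
    kept = []
    for key, distance, expires_at in candidates:
        if key in excluded:          # Bloom filter: probably excluded
            continue
        if distance > max_distance:  # threshold filter: too far from the query
            continue
        if expires_at <= now:        # expiration: stale item
            continue
        kept.append(key)
    return kept


seen = BloomFilter()
seen.add("v1")
candidates = [
    ("v1", 0.1, 2000.0),  # dropped: in the Bloom filter
    ("v2", 0.2, 2000.0),  # kept
    ("v3", 0.9, 2000.0),  # dropped: beyond the distance threshold
    ("v4", 0.3, 500.0),   # dropped: already expired at now=1000
]
result = filter_candidates(candidates, seen, max_distance=0.5, now=1000.0)
print(result)
```

Since filtering removes candidates after retrieval, a caller would typically request more than k neighbors from the index so that enough results survive.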
Production Deployment
SimSvr runs over 160 model indexes, consumes >8 000 logical cores, and stores >2 billion feature vectors, powering WeChat Video, “Look”, and “Search” recommendation services. In “Search”, SimSvr’s vector index (1.7 × 10⁸ × 128‑dim) yields <8 ms average latency for 1.25 billion daily queries. The system also improves article search recall by 7 % and enables fast video deduplication with sub‑8 ms latency.
Conclusion
SimSvr demonstrates that a well‑designed distributed ANN service can satisfy the demanding scale, latency, and feature requirements of modern recommendation systems, providing a solid foundation for future AI‑driven product features.