How SimSvr Achieves Billion‑Scale Real‑Time ANN Search for Recommendations

SimSvr is a high‑performance, distributed feature‑retrieval component for recommendation systems. It supports billion‑scale indexes, millisecond‑level query latency, real‑time and batch updates, multi‑model AB‑testing, and advanced filtering, and it runs in production at Tencent.

ITPUB

Background

Recommendation, image retrieval, and deduplication systems often need k‑nearest‑neighbor (k‑NN) search over feature vectors at massive scale. This requires indexes holding billions of vectors, ultra‑low latency, real‑time updates, multi‑model AB‑testing, and flexible filtering.

Problems with Existing Solutions

Academic ANN libraries are single‑node only and cannot serve as high‑performance, reliable distributed components.

Industry wrappers around these libraries lack the scalability and high availability that online services need.

Many components support only offline updates or only online updates, which fails to meet WeChat’s requirement for update rates ranging from thousands of keys per second to billions of keys per hour.

SimSvr Overview

Distributed, scalable architecture that handles more than 10⁹ indexed vectors with query latency under 10 ms.

Uses hnswlib as the primary recall engine, answering most queries within 2 ms.

Cluster management with built‑in data scheduling and dynamic routing.

Supports both task‑based and automatic updates, covering updates from a few thousand keys per second to billions per hour.

Read‑write separation isolates heavy offline indexing from online query serving.

Rich feature set: lightweight embedding KV store, multi‑table multi‑index, versioned indexes, filters, and expiration deletion.

Engine Selection

Two engines were chosen based on performance and storage capacity:

hnswlib: the best performer in ann‑benchmarks, delivering >90 % recall within 1 ms.

FAISS (IVF‑HNSW + PQ): compresses vectors 10‑30×, so a billion‑scale index fits on a 64 GB machine.
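
A rough back-of-the-envelope check of the compression claim may help here. The sketch below assumes 128-dimensional float32 vectors (128 is the dimension quoted later for the “Search” index; the float32 storage type is an assumption) and ignores graph links, IDs, and other index overhead.

```python
# Rough memory estimate for a billion-vector index.
# Assumptions (not from the source): float32 storage, 128 dimensions,
# and no allowance for graph links, IDs, or other index overhead.
NUM_VECTORS = 10**9
DIM = 128
BYTES_PER_FLOAT32 = 4


def raw_size_gb(num_vectors: int, dim: int) -> float:
    """Size of the uncompressed vectors in decimal gigabytes."""
    return num_vectors * dim * BYTES_PER_FLOAT32 / 1e9


def compressed_size_gb(num_vectors: int, dim: int, ratio: float) -> float:
    """Size of the vectors after compressing them by the given ratio."""
    return raw_size_gb(num_vectors, dim) / ratio


raw = raw_size_gb(NUM_VECTORS, DIM)
low_ratio = compressed_size_gb(NUM_VECTORS, DIM, 10)
high_ratio = compressed_size_gb(NUM_VECTORS, DIM, 30)
print(f"raw={raw:.1f} GB, 10x={low_ratio:.1f} GB, 30x={high_ratio:.1f} GB")
```

Under these assumptions the raw vectors need about 512 GB, while 10‑30× compression brings them down to roughly 17 to 51 GB, which is consistent with fitting on a 64 GB machine.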

Resource Optimization

By integrating hnswlib into SimSvr and leveraging read‑write separation, a single machine can host many more model indexes, achieving up to a 50 % increase in data capacity under typical worker/thread configurations.

Distance Conversion for Inner‑Product Search

HNSW performs poorly when searching by inner product (dot product), because inner product is not a metric: it does not satisfy the triangle inequality, and a vector is not necessarily its own nearest neighbor. SimSvr adopts the “ip2cos” technique from the paper *Non‑metric Similarity Graphs for Maximum Inner Product Search*, which converts inner‑product search into cosine‑distance search and raises recall from 62.6 % to 97.8 % in real tests.
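
The article does not spell out how “ip2cos” works. A commonly used reduction from maximum inner product search to cosine similarity appends one extra coordinate to every indexed vector so that all of them share the same norm, and appends a zero to the query. The sketch below illustrates that reduction under the assumption that SimSvr does something similar; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(v: Sequence[float]) -> float:
    return math.sqrt(dot(v, v))


def augment_items(items: List[List[float]]) -> List[List[float]]:
    """Append sqrt(M^2 - ||x||^2) to each item, where M is the largest norm.

    Every augmented item then has norm M, so cosine similarity against an
    augmented query is proportional to the original inner product.
    """
    max_norm = max(norm(x) for x in items)
    augmented = []
    for x in items:
        # max(..., 0.0) guards against tiny negative values from rounding.
        extra = math.sqrt(max(max_norm ** 2 - dot(x, x), 0.0))
        augmented.append(list(x) + [extra])
    return augmented


def augment_query(query: List[float]) -> List[float]:
    """Append a zero so the extra coordinate never affects the dot product."""
    return list(query) + [0.0]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (norm(a) * norm(b))


items = [[3.0, 0.5], [1.0, 1.0], [0.0, 0.5], [2.0, 2.0]]
query = [1.0, 2.0]

# Ranking by the original inner product (largest first).
by_ip = sorted(range(len(items)), key=lambda i: -dot(query, items[i]))

# Ranking by cosine similarity in the augmented space.
aug_items = augment_items(items)
aug_query = augment_query(query)
by_cos = sorted(range(len(items)), key=lambda i: -cosine(aug_query, aug_items[i]))

print(by_ip, by_cos)
```

Both rankings agree, so an index built for cosine distance over the augmented vectors returns the same neighbors as an exact inner‑product search would.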

FAISS Batch k‑means Acceleration

FAISS was extended with a batch k‑means algorithm that dramatically speeds up training on 10 M‑scale datasets (128‑dim, IP distance) while preserving clustering quality, reducing training time by more than two hours and improving recall by ~30 %.

Overall Design

Data Structure

SimSvr organizes each table as a set of sharding‑sections (containers). A table is split into shard0, shard1, …, shardN, and each shard is replicated as several sections (the sect count) for read scaling and fault tolerance.
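
The layout can be pictured with a small data model. This is a minimal sketch, not SimSvr's actual schema: the class names, the CRC32 key‑to‑shard rule, and the round‑robin placement are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """One replica of one shard, hosted by a single worker (hypothetical)."""
    shard_id: int
    replica_id: int
    worker: str


@dataclass
class Table:
    name: str
    shard_count: int
    sect_count: int
    sections: List[Section] = field(default_factory=list)

    def shard_of(self, key: str) -> int:
        # Assumed placement rule: stable hash of the key modulo shard count.
        return zlib.crc32(key.encode("utf-8")) % self.shard_count

    def replicas_of(self, shard_id: int) -> List[Section]:
        return [s for s in self.sections if s.shard_id == shard_id]


def build_table(name: str, shard_count: int, sect_count: int,
                workers: List[str]) -> Table:
    """Create shard_count * sect_count sections and spread them over workers."""
    table = Table(name, shard_count, sect_count)
    for shard_id in range(shard_count):
        for replica_id in range(sect_count):
            # Offsetting by replica_id keeps replicas of the same shard on
            # different workers as long as len(workers) >= sect_count.
            worker = workers[(shard_id + replica_id) % len(workers)]
            table.sections.append(Section(shard_id, replica_id, worker))
    return table


table = build_table("video_embeddings", shard_count=4, sect_count=2,
                    workers=["w0", "w1", "w2"])
print(len(table.sections), table.replicas_of(table.shard_of("item-42")))
```

Adding shards grows storage capacity, while raising the sect count adds read replicas for a hot table without changing how keys map to shards.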

System Architecture

Three external dependencies:

Chubby – stores metadata, routing, and worker information.

USER_FS – distributed file system (WFS/HDFS) holding raw data.

SimSvr_FS – stores generated index files and incremental data.

Key components:

Worker: polls Chubby, loads the latest index, and serves queries.

Master: schedules data, generates routing tables, and triggers index builds via distributed locks.

Trainer: builds or rebuilds sharding indexes; can run concurrently to accelerate index construction.

Data Update Mechanisms

Automatic Update: a directory with monotonically increasing numeric sub‑folders signals new data; the master detects the change, creates a task, and the trainer rebuilds the index.

Task‑Based Update: business services submit an index task via API, specifying FS paths; the trainer executes the task and workers reload the new index.
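
The automatic‑update trigger can be sketched as a check for the newest numeric sub‑folder. This is an illustrative sketch only: the function names are hypothetical, and it uses the local file system in place of WFS/HDFS.

```python
import os
import tempfile
from typing import Optional


def latest_version(data_dir: str) -> Optional[int]:
    """Return the largest numeric sub-folder name in data_dir, or None."""
    versions = [
        int(entry)
        for entry in os.listdir(data_dir)
        if entry.isascii() and entry.isdigit()
        and os.path.isdir(os.path.join(data_dir, entry))
    ]
    return max(versions, default=None)


def needs_rebuild(data_dir: str, built_version: Optional[int]) -> bool:
    """True when a sub-folder newer than the last built version exists."""
    newest = latest_version(data_dir)
    if newest is None:
        return False
    return built_version is None or newest > built_version


# Usage: simulate a data directory that receives versions 1, 2, and 10.
with tempfile.TemporaryDirectory() as data_dir:
    for name in ("1", "2", "10", "tmp"):
        os.mkdir(os.path.join(data_dir, name))
    newest_seen = latest_version(data_dir)
    rebuild_from_2 = needs_rebuild(data_dir, built_version=2)
    rebuild_from_10 = needs_rebuild(data_dir, built_version=10)

print(newest_seen, rebuild_from_2, rebuild_from_10)
```

Comparing folder names as integers rather than strings matters here: a string comparison would rank "2" above "10".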

Data Scheduling

Master assigns containers to workers based on health and resource usage, generating versioned routing tables. Clients cache the routing table and send parallel requests to the appropriate workers, merging results before returning to the business layer.
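
The client‑side fan‑out and merge described above can be sketched as a scatter‑gather over shards. The routing table is modeled here as a plain dictionary from shard ID to a search callable, and the fake shards stand in for RPCs to workers; both are assumptions made for illustration, not SimSvr's client API.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, str]  # (distance, key); a smaller distance is better
ShardSearch = Callable[[List[float], int], List[Hit]]


def search_all_shards(routing: Dict[int, ShardSearch],
                      query: List[float], k: int) -> List[Hit]:
    """Query every shard in parallel and keep the k best hits overall."""
    with ThreadPoolExecutor(max_workers=max(len(routing), 1)) as pool:
        futures = [pool.submit(search, query, k) for search in routing.values()]
        partial_results = [future.result() for future in futures]
    all_hits = (hit for hits in partial_results for hit in hits)
    return heapq.nsmallest(k, all_hits)


def make_fake_shard(hits: List[Hit]) -> ShardSearch:
    """Stand-in for a worker RPC: returns its k closest precomputed hits."""
    return lambda query, k: sorted(hits)[:k]


routing = {
    0: make_fake_shard([(0.30, "a"), (0.10, "b")]),
    1: make_fake_shard([(0.20, "c"), (0.40, "d")]),
}
top = search_all_shards(routing, query=[0.0], k=3)
print(top)
```

Each shard returns up to k candidates, so the merged top‑k is exact with respect to what the shards report; the client only needs to re‑rank at most k times the number of shards.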

System Expansion

Tables can be split into finer‑grained containers, allowing horizontal scaling of storage capacity and independent read‑replica scaling for hot tables.

Near‑Real‑Time Incremental Updates

To meet sub‑second update latency, SimSvr writes new data to the file system first, then workers pull and load it. Small batches are inserted directly by workers; large batches are merged and rebuilt by trainers before workers load the final index, keeping online read latency unaffected.

Rich Feature Set

Additional feature KV store for model‑specific metadata.

Atomic multi‑index updates on a single table for AB‑testing.

Multi‑version index support, allowing simultaneous serving of different model versions.

Integrated Bloom and threshold filters to prune irrelevant keys during recall.

Expiration‑based deletion to remove stale items from results in real time.
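
The filtering and expiration features can be pictured as a post‑processing step over recalled candidates. The source does not say what the Bloom filter holds or where filtering runs, so the sketch below makes several assumptions: the Bloom filter marks keys to exclude (for example, items already shown), the threshold is a maximum distance, and expiration is a per‑key timestamp. All names are hypothetical.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple

Hit = Tuple[float, str]  # (distance, key)


class BloomFilter:
    """Tiny Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def filter_hits(hits: List[Hit], excluded: BloomFilter, max_distance: float,
                expires_at: Dict[str, float], now: float) -> List[Hit]:
    """Drop hits that are too far, excluded, or already expired."""
    kept = []
    for distance, key in hits:
        if distance > max_distance:
            continue  # threshold filter
        if key in excluded:
            continue  # Bloom filter (may rarely drop a key never added)
        if expires_at.get(key, float("inf")) <= now:
            continue  # expiration-based deletion
        kept.append((distance, key))
    return kept


excluded = BloomFilter()
excluded.add("b")
hits = [(0.1, "a"), (0.2, "b"), (0.9, "c"), (0.3, "d")]
kept = filter_hits(hits, excluded, max_distance=0.5,
                   expires_at={"d": 100.0}, now=200.0)
print(kept)
```

A Bloom filter suits this role because it is compact and never lets an added key through; the cost is that it can occasionally drop a key that was never added.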

Production Deployment

SimSvr serves more than 160 model indexes, uses more than 8,000 logical cores, and stores more than 2 billion feature vectors, powering the WeChat Video, “Look”, and “Search” recommendation services. In “Search”, a vector index of 1.7 × 10⁸ 128‑dimensional vectors answers 1.25 billion queries per day with an average latency under 8 ms. The system also improves article search recall by 7 % and enables fast video deduplication with latency under 8 ms.

Conclusion

SimSvr demonstrates that a well‑designed distributed ANN service can satisfy the demanding scale, latency, and feature requirements of modern recommendation systems, providing a solid foundation for future AI‑driven product features.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
