Tencent Kankan Information Feed: Architecture, Challenges, and Optimizations
Peng Mo's talk details the Tencent Kankan feed architecture: layered data, recall, ranking, and exposure control. It covers the core engineering challenges of real-time feature generation, massive concurrency, memory-intensive caching, and fast indexing, and explains the solutions adopted: multi-level caches, minute-level online model updates, Redis bloom-filter exposure filtering, a lock-free hash-plus-linked-list index, and distributed optimizations that halved latency to under 500 ms while supporting auto-scaling and cold-start handling.
In this article we summarize a talk by Peng Mo, Director of Tencent Kankan Independent End Recommendation R&D Center, on the architecture and engineering challenges of the Kankan information feed, which serves hundreds of millions of users with real-time recommendations.
The system handles massive traffic: the QQ browser home page displays three feed formats (short video, micro‑video, and article), with daily click and exposure logs reaching hundreds of billions and a user base exceeding 100 million. The architecture is divided into several layers: a low‑level data layer (inverted index, feature store, user model), a recall layer (implicit and explicit recall, UCF, ICF, RNN), a coarse‑ranking layer, a fine‑ranking layer, and a final exposure control layer that applies diversity and manual interventions.
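The layered flow described above can be sketched as a simple pipeline. This is a minimal illustration, not the production code: the source functions, item fields, and cutoff sizes are all assumptions, and the real coarse and fine rankers are learned models rather than a shared score field.

```python
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    score: float = 0.0

def recall(user_id, sources):
    # Union candidates from multiple recall sources (e.g. UCF, ICF, RNN),
    # deduplicating by item id.
    seen, candidates = set(), []
    for source in sources:
        for item in source(user_id):
            if item.item_id not in seen:
                seen.add(item.item_id)
                candidates.append(item)
    return candidates

def coarse_rank(candidates, top_k=500):
    # A cheap model trims thousands of candidates to a few hundred.
    return sorted(candidates, key=lambda i: i.score, reverse=True)[:top_k]

def fine_rank(candidates, top_k=50):
    # An expensive model rescores the survivors of coarse ranking.
    return sorted(candidates, key=lambda i: i.score, reverse=True)[:top_k]

def exposure_control(ranked, already_seen):
    # Drop items the user was already shown; diversity rules and manual
    # interventions would also be applied at this final layer.
    return [i for i in ranked if i.item_id not in already_seen]

def recommend(user_id, sources, already_seen):
    candidates = recall(user_id, sources)
    return exposure_control(fine_rank(coarse_rank(candidates)), already_seen)
```

Each layer narrows the candidate set so the most expensive model only ever sees a few hundred items.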
Key challenges include:
Real‑time feature generation: features such as CTR must be refreshed within seconds to reflect user feedback.
High concurrency and low latency: the system processes tens of thousands of queries per second, requiring careful parallelism and avoiding excessive fan‑out.
Memory pressure for feature storage: feature reads on the order of billions per second demand multi-level caching and sharding.
Scalable indexing: the index must support rapid insertion, long chain handling, and exposure‑history filtering.
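Refreshing a feature like CTR within seconds of user feedback is commonly done with time-decayed counters. The sketch below is one plausible implementation under assumed parameters (the half-life and prior values are illustrative, not from the talk); the talk does not specify the exact mechanism.

```python
import time

class DecayedCTR:
    """Time-decayed click-through-rate estimate that reflects new
    feedback within seconds. Half-life and priors are illustrative."""

    def __init__(self, half_life_s=300.0, prior_clicks=1.0, prior_views=20.0):
        self.half_life_s = half_life_s
        self.clicks = 0.0
        self.views = 0.0
        self.last_update = time.time()
        self.prior_clicks = prior_clicks
        self.prior_views = prior_views

    def _decay(self, now):
        # Exponentially down-weight old feedback so recent behavior dominates.
        factor = 0.5 ** ((now - self.last_update) / self.half_life_s)
        self.clicks *= factor
        self.views *= factor
        self.last_update = now

    def observe(self, clicked, now=None):
        self._decay(now if now is not None else time.time())
        self.views += 1.0
        if clicked:
            self.clicks += 1.0

    def ctr(self, now=None):
        self._decay(now if now is not None else time.time())
        # Smoothed with a prior so items with little data are not extreme.
        return (self.clicks + self.prior_clicks) / (self.views + self.prior_views)
```

Because every read and write applies the decay lazily, the estimate stays current without any batch recomputation.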
Solutions presented:
The feature system uses multi-level caches and delivers pre-computed features through client APIs/libraries, which reduces network calls; sharding keeps per-node memory usage manageable.
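The multi-level cache idea can be sketched as a process-local LRU sitting in front of sharded remote storage. The class, shard function, and capacity below are assumptions for illustration; the remote stores are modeled as plain dicts.

```python
from collections import OrderedDict

class TwoLevelFeatureCache:
    """Local LRU cache in front of a sharded remote feature store.
    The sharding scheme and store interface are illustrative."""

    def __init__(self, shards, capacity=10_000):
        self.shards = shards        # list of dict-like remote stores
        self.local = OrderedDict()  # process-local LRU (level 1)
        self.capacity = capacity

    def _shard_for(self, key):
        # Sharding spreads memory across machines.
        return self.shards[hash(key) % len(self.shards)]

    def get(self, key):
        if key in self.local:       # L1 hit: no network call needed
            self.local.move_to_end(key)
            return self.local[key]
        value = self._shard_for(key).get(key)  # L2: remote shard lookup
        if value is not None:
            self.local[key] = value
            if len(self.local) > self.capacity:
                self.local.popitem(last=False)  # evict least-recently-used
        return value
```

Hot features are served locally, so only cold keys pay the network round trip to their shard.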
Online learning pipeline (named “WuLiang”) stitches samples in real time, with incremental model updates every minute and hourly full refreshes, and a custom deployment system (RongDa) for model distribution.
Bloom filters stored in Redis are employed for user exposure history filtering, with dynamic sizing and consistent hashing to keep memory low and ensure cross‑region consistency.
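A bloom filter answers "has this user possibly seen this item?" in constant space, at the cost of rare false positives (an unseen item occasionally filtered out, which is acceptable for a feed). The in-memory sketch below shows the mechanics; in the deployment described, the bit array would live in Redis (e.g. as a bitmap manipulated with SETBIT/GETBIT, or via the RedisBloom module) so all regions share one exposure history. Sizes and hash counts are illustrative.

```python
import hashlib

class BloomFilter:
    """In-memory sketch of a per-user exposure filter. In production
    the bits would be a Redis bitmap shared across regions."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive k independent bit positions from salted hashes of the key.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False positives are possible; false negatives are not.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

Dynamic sizing, as mentioned in the talk, would pick `num_bits` from the user's activity level so heavy users get larger filters without wasting memory on light ones.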
Index service is built on a hash‑plus‑linked‑list structure, deployed on multiple machines without lock contention, and uses DCache for storage.
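The hash-plus-linked-list structure can be sketched as a table mapping each term to a singly linked posting chain with O(1) head insertion, so the freshest documents appear first and long chains can be truncated. Names and the chain cap are illustrative; the real service is a lock-free, multi-machine C++ system, which this single-threaded Python sketch only approximates.

```python
class Posting:
    __slots__ = ("doc_id", "next")
    def __init__(self, doc_id, nxt=None):
        self.doc_id = doc_id
        self.next = nxt

class InvertedIndex:
    """Hash table whose buckets are singly linked posting lists.
    New documents are prepended so fresh content is scanned first."""

    def __init__(self, max_chain=1000):
        self.table = {}
        self.max_chain = max_chain

    def insert(self, term, doc_id):
        # O(1) head insertion; a lock-free variant would swap the head
        # pointer with compare-and-swap instead of taking a lock.
        self.table[term] = Posting(doc_id, self.table.get(term))

    def lookup(self, term, limit=None):
        # Walk the chain newest-first, stopping at the chain cap so
        # pathologically long chains cannot blow up query latency.
        limit = limit or self.max_chain
        out, node = [], self.table.get(term)
        while node is not None and len(out) < limit:
            out.append(node.doc_id)
            node = node.next
        return out
```

Prepending means no existing node is ever modified on insert, which is what makes the lock-free single-pointer-swap variant possible.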
Performance optimizations reduced end‑to‑end latency by ~50%, keeping recommendation response time under 500 ms through distributed parallelism, caching of scoring results, and code‑level improvements (e.g., GCC upgrades, log reduction).
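One common form of the distributed parallelism mentioned above is fanning recall branches out concurrently under a strict latency budget, keeping whatever finishes in time rather than waiting on the slowest branch. The sketch below shows that pattern with Python's `concurrent.futures` (requires Python 3.9+ for `cancel_futures`); the budget and source functions are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def parallel_recall(user_id, sources, budget_s=0.1):
    """Run recall sources in parallel and keep only the branches that
    finish within the latency budget; slow branches are dropped."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = [pool.submit(src, user_id) for src in sources]
    done, _not_done = wait(futures, timeout=budget_s)
    # Do not block on stragglers; cancel anything not yet started.
    pool.shutdown(wait=False, cancel_futures=True)
    results = []
    for f in done:
        results.extend(f.result())
    return results
```

Capping each branch this way bounds tail latency at the cost of occasionally serving from fewer recall sources.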
The talk also covered operational aspects such as automatic scaling of RPC frameworks, multi‑region deployment, and handling of new‑user cold‑start via content freshness and auxiliary data.
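For new users with no interaction history, the content-freshness signal can stand in for personalized scores. The formula and half-life below are purely illustrative, not from the talk:

```python
import time

def cold_start_score(publish_ts, base_quality, now=None, half_life_s=6 * 3600):
    """Freshness-weighted score for cold-start users: newer content is
    boosted, decaying toward a quality-only floor as items age."""
    now = now if now is not None else time.time()
    age = max(0.0, now - publish_ts)
    freshness = 0.5 ** (age / half_life_s)   # 1.0 at publish, halves each half-life
    return base_quality * (0.5 + 0.5 * freshness)
```

As interaction history accumulates, the system can blend this score out in favor of personalized signals.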
A Q&A session addressed practical questions about bloom filter implementation, index storage choices, exposure correction, and the balance between recall breadth and development efficiency.