
Engineering Practices of the K‑Song Recommendation System at Tencent Music

This article presents a comprehensive technical overview of the K‑Song recommendation platform, covering its backend architecture, the evolution of recall strategies, feature management and ranking pipelines, large‑scale deduplication techniques, and the debugging and monitoring infrastructure that support high‑performance personalized music recommendations.

DataFunTalk

01 K‑Song Recommendation Backend Architecture

The system consists of an offline layer (a data processing platform and the VENUS algorithm platform) and an online layer (recall, ranking, and re-ranking services) built on top of a shared storage tier, plus middle-platform components such as A/B testing, content distribution, and quality monitoring.

02 Recall

The recall component evolved through three versions: V1 used a Redis key-value inverted index; V2 introduced dual MongoDB stores with a local KV cache; and V3 added a dual-buffer full-cache design with minute-level periodic updates, achieving higher cache hit rates, lower CPU load, and up to 16,000 QPS on an 8-core machine.
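The core of the V3 design is that readers always hit an in-memory "active" buffer while a background job rebuilds a standby buffer and swaps it in atomically. A minimal Python sketch of that dual-buffer pattern (class and method names are illustrative, not from the Tencent implementation):

```python
import threading


class DoubleBufferCache:
    """Dual-buffer full cache: reads always go to the active buffer,
    lock-free, while a background thread rebuilds the standby buffer
    from the source of truth and swaps it in (minute-level refresh)."""

    def __init__(self, loader, refresh_seconds=60):
        self._loader = loader            # callable returning a full snapshot dict
        self._buffers = [loader(), {}]   # [buffer 0, buffer 1]
        self._active = 0                 # index of the buffer serving reads
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._refresh_loop, args=(refresh_seconds,), daemon=True
        )
        self._thread.start()

    def get(self, key, default=None):
        # Reads never block: swapping the active index is atomic in CPython.
        return self._buffers[self._active].get(key, default)

    def _refresh_loop(self, interval):
        while not self._stop.wait(interval):
            standby = 1 - self._active
            self._buffers[standby] = self._loader()  # rebuild off the read path
            self._active = standby                   # atomic pointer swap

    def close(self):
        self._stop.set()
```

The trade-off versus a per-key KV cache is memory (the full index is held twice during a swap) in exchange for zero cache misses and no lock contention on the serving path.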

03 Ranking

The ranking pipeline focuses on three aspects: a feature platform for unified feature registration, storage, and retrieval; an efficient feature format that replaces TFRecord with a lightweight binary layout, reducing CPU and memory usage by up to tenfold; and a feature-aggregation and model-prediction framework that employs multi-level caching and separates user features from item features to cut network I/O by one-third.
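The savings of a lightweight binary layout over a TFRecord-style encoding come from dropping per-field keys, varints, and protobuf parsing: a dense feature vector can be a raw length header plus packed float32s. A minimal sketch under that assumption (the exact wire format used at Tencent is not described in the article):

```python
import struct


def encode_features(values):
    """Pack a dense float feature vector as a 4-byte little-endian count
    followed by raw float32s -- no field tags, no varints, no parsing."""
    return struct.pack("<I", len(values)) + struct.pack(f"<{len(values)}f", *values)


def decode_features(buf):
    """Decode a buffer produced by encode_features back into a list of floats."""
    (n,) = struct.unpack_from("<I", buf, 0)
    return list(struct.unpack_from(f"<{n}f", buf, 4))
```

Because decoding is a single `struct.unpack` over a contiguous region, it is effectively a memory copy, which is where the CPU savings over generic record formats come from; the cost is that the schema (field order, types) must be agreed on out of band.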

04 Deduplication

Two deduplication schemes are compared: a plain exposure list (simple but memory-heavy) and a Bloom filter (compact but with a false-positive risk). The team built a custom multi-shard, auto-evicting Bloom filter supporting both Cmongo and CKV+ storage, achieving over 5× storage savings, roughly 10× faster lookups, and roughly 7× lower latency than the list approach.
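A standard Bloom filter cannot delete entries, so sharding by time is a common way to get eviction: shards form a ring, and when the current shard fills up, the oldest one is cleared and reused, expiring the stalest exposures wholesale. A minimal Python sketch of that idea (parameters and names are illustrative; the production version persists shards to Cmongo/CKV+ rather than holding them in process memory):

```python
import hashlib


class ShardedBloomFilter:
    """Multi-shard Bloom filter with ring-based auto-eviction:
    inserts go to the current shard; membership checks probe all shards."""

    def __init__(self, num_shards=4, bits_per_shard=1 << 16,
                 num_hashes=3, items_per_shard=5000):
        self.num_shards = num_shards
        self.bits = bits_per_shard
        self.k = num_hashes
        self.capacity = items_per_shard
        self.shards = [bytearray(bits_per_shard // 8) for _ in range(num_shards)]
        self.counts = [0] * num_shards
        self.current = 0  # shard receiving new inserts

    def _positions(self, item):
        # Derive k bit positions from slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits

    def add(self, item):
        if self.counts[self.current] >= self.capacity:
            # Evict: advance the ring and clear the shard being reused.
            self.current = (self.current + 1) % self.num_shards
            self.shards[self.current] = bytearray(self.bits // 8)
            self.counts[self.current] = 0
        shard = self.shards[self.current]
        for pos in self._positions(item):
            shard[pos // 8] |= 1 << (pos % 8)
        self.counts[self.current] += 1

    def __contains__(self, item):
        positions = list(self._positions(item))
        return any(
            all(shard[p // 8] & (1 << (p % 8)) for p in positions)
            for shard in self.shards
        )
```

The false-positive rate is bounded per shard (it depends on bits per shard, hash count, and fill level), and eviction granularity is one shard's worth of items, which trades a small window of "forgotten" exposures for bounded memory.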

05 Debug & Monitoring

The debugging ecosystem includes a profiling platform for feature inspection, an in-app modular debug tool for real-time recommendation tracing, a comprehensive monitoring suite (real-time metrics, A/B-test significance, and drill-down analysis), and a log-replay system that visualizes the end-to-end recommendation path across user and item dimensions, integrated with the feature and portrait platforms.

Overall, the engineering practices described demonstrate how large‑scale, low‑latency recommendation services are built, optimized, and operated in a production environment serving hundreds of millions of monthly active users.

Tags: debugging, recommendation system, ranking, recall, deduplication, K-Song, Tencent Music
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
