Fine‑Grained Content Understanding and Operation in QQ Music: Optimizing the Recommendation System
This article presents QQ Music’s end‑to‑end solution for data‑driven content understanding, value evaluation, and fine‑grained operation, detailing offline and real‑time pipelines, neural‑network models, a content middle‑platform, parameter services, and a precise delivery system that boost user engagement while preserving experience.
Bill from QQ Music’s data science team introduces a comprehensive approach to improve content understanding and fine‑grained operation for the platform’s recommendation system.
Background and challenges: Traditional manual configuration of music‑hall tabs is labor‑intensive, fragmented, and lacks data feedback, while the recommendation system focuses on short‑term metrics and struggles with cold‑start and long‑tail content.
Solution overview: The strategy has three steps: (1) content understanding via a scientific value‑assessment framework, (2) content support, which uses the evaluation results to balance traffic allocation, and (3) intelligent delivery, which closes the loop with real‑time feedback.
Content understanding: The pipeline separates head and tail content. Head items are evaluated with user‑feedback‑driven metrics, while tail and cold‑start items rely on sparse‑feedback or pure‑content features. Interaction quality is computed from impression‑click ratios with Bayesian smoothing and exponential decay, and the resulting signals are combined with linear weights. Real‑time scoring merges multiple data streams with a union operation, joins them with content and user attributes using async I/O and caching, and writes the results to both Elasticsearch and TDW.
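The interaction-quality computation described above can be sketched as follows. This is a minimal illustration, not QQ Music's implementation: the prior strength (`prior_actions`, `prior_impressions`), the half-life, the action names, and the weights are all assumed values chosen for the example.

```python
import math

def smoothed_rate(actions, impressions, prior_actions=1.0, prior_impressions=20.0):
    """Bayesian-smoothed ratio: a Beta prior pulls low-volume items toward
    the prior mean (here 1/20) instead of letting them swing to 0 or 1.
    The prior values are illustrative assumptions."""
    return (actions + prior_actions) / (impressions + prior_impressions)

def decayed_sum(daily_values, half_life_days=7.0):
    """Exponentially decayed sum. daily_values[0] is today, [1] is yesterday,
    and so on; a value loses half its weight every half_life_days."""
    lam = math.log(2) / half_life_days
    return sum(v * math.exp(-lam * age) for age, v in enumerate(daily_values))

def interaction_quality(daily_stats, weights, half_life_days=7.0):
    """Linearly weighted combination of smoothed, decayed ratios.

    daily_stats maps an action name to (daily action counts, daily impressions).
    weights maps the same action names to hypothetical linear weights.
    """
    score = 0.0
    for action, weight in weights.items():
        counts, impressions = daily_stats[action]
        decayed_actions = decayed_sum(counts, half_life_days)
        decayed_impressions = decayed_sum(impressions, half_life_days)
        score += weight * smoothed_rate(decayed_actions, decayed_impressions)
    return score

if __name__ == "__main__":
    stats = {
        "click": ([30, 20, 10], [400, 300, 200]),
        "favorite": ([3, 1, 0], [400, 300, 200]),
    }
    print(interaction_quality(stats, {"click": 0.7, "favorite": 0.3}))
```

Smoothing matters most for tail items: a song with one click out of one impression gets a rate of 2/21 rather than 1.0, so it cannot outrank a well-established song on a single lucky interaction.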
Predictive modeling: A Temporal Convolutional Network (TCN) built from causal convolutions, dilated convolutions, and residual connections forecasts future hotness. A Predictive Model (PDM) decomposes songs into multi‑dimensional embeddings (audio, lyrics, melody, rhythm) and combines them with user‑based and side‑info embeddings (graph‑based node2vec/EGES) in a deep neural network (MetaPDM) for robust mining of high‑potential content.
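To make the three TCN ingredients concrete, here is a dependency-free sketch of a causal dilated convolution and a simplified residual block. It is an illustration of the general technique, not the model from the talk: real TCN blocks typically stack two convolutions with normalization and dropout, and the kernel values below are arbitrary.

```python
def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros to the left.
    The output at time t never reads inputs later than t (causality)."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def residual_block(x, kernel, dilation):
    """Simplified residual block: input plus ReLU(conv(input))."""
    conv = causal_dilated_conv(x, kernel, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]

def receptive_field(kernel_size, dilations):
    """History length seen by a stack with one convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)

if __name__ == "__main__":
    daily_plays = [1.0, 2.0, 3.0, 4.0]
    print(causal_dilated_conv(daily_plays, [1.0, 1.0], dilation=2))
    print(receptive_field(kernel_size=2, dilations=[1, 2, 4]))
```

Doubling the dilation at each layer is what lets a shallow stack cover a long play-count history: with kernel size 2 and dilations 1, 2, 4, three layers already see 8 time steps.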
Content middle‑platform: Provides offline/online evaluation results, stores them in a two‑level cache (Elasticsearch + CKV), and exposes flexible content‑pool and parameter services with AB testing capabilities. The platform supports vertical pools (genre, language, artist) and dynamic strategy configuration.
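A parameter service with AB testing is commonly built on deterministic hash bucketing. The sketch below shows that general pattern under stated assumptions; the function names, the 100-bucket split, and the parameter keys are hypothetical and are not taken from the QQ Music platform.

```python
import hashlib

NUM_BUCKETS = 100  # assumed bucket count for this sketch

def ab_bucket(user_id, experiment):
    """Map a user to a stable bucket in [0, NUM_BUCKETS) for one experiment.
    Salting the hash with the experiment name keeps experiments independent."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def resolve_params(user_id, experiment, default_params, treatment_params,
                   treatment_buckets=10):
    """Return strategy parameters for a user: buckets below treatment_buckets
    get the treatment overrides merged on top of the defaults."""
    if ab_bucket(user_id, experiment) < treatment_buckets:
        return {**default_params, **treatment_params}
    return dict(default_params)

if __name__ == "__main__":
    defaults = {"pool": "genre_rock", "max_boost": 1.5}
    treatment = {"max_boost": 2.0}
    print(resolve_params("user_42", "boost_test", defaults, treatment))
```

Because the bucket depends only on the user and experiment name, a user sees the same strategy on every request without any stored assignment.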
Precise delivery system: Differentiates recommendation (user‑centric) from delivery (content‑centric). It includes a management console, a core backend (ranking, re‑ranking, and user‑experience modules), and user‑facing slots. Ranking uses XGBoost and AE + DeepFM (audio embedding + feature crossing). Re‑ranking adjusts scores based on task progress, genre bias, and real‑time performance. User experience is protected by profiling exploratory vs. conservative users and applying task‑exit thresholds.
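One way to picture the re-ranking step is as a multiplicative adjustment on top of the ranking score. The sketch below is an assumption-laden illustration: the field names, the pacing formula, the boost cap, and the completion-rate exit threshold are all hypothetical, since the source does not give the actual formulas.

```python
def pacing_boost(delivered, target, elapsed_fraction, max_boost=2.0):
    """Boost tasks that are behind schedule and damp those ahead of it.
    Returns 0.0 once the delivery target has been met."""
    if target <= 0 or delivered >= target:
        return 0.0
    if delivered <= 0:
        return max_boost
    ratio = (target * elapsed_fraction) / delivered
    return max(1.0 / max_boost, min(max_boost, ratio))

def rerank(candidates, exit_completion_rate=0.2):
    """Adjust ranking scores by task progress and genre affinity, and drop
    tasks whose real-time completion rate falls below the exit threshold."""
    scored = []
    for c in candidates:
        if c["realtime_completion_rate"] < exit_completion_rate:
            continue  # task exit: performing too poorly to keep delivering
        boost = pacing_boost(c["delivered"], c["target"], c["elapsed_fraction"])
        if boost == 0.0:
            continue  # task finished
        final = c["rank_score"] * boost * c.get("genre_affinity", 1.0)
        scored.append((final, c["song_id"]))
    return [song_id for _, song_id in sorted(scored, reverse=True)]

if __name__ == "__main__":
    tasks = [
        {"song_id": "a", "rank_score": 0.5, "delivered": 10, "target": 100,
         "elapsed_fraction": 0.5, "realtime_completion_rate": 0.6},
        {"song_id": "b", "rank_score": 0.8, "delivered": 50, "target": 100,
         "elapsed_fraction": 0.5, "realtime_completion_rate": 0.5},
    ]
    print(rerank(tasks))
```

In this sketch a lagging task can overtake a higher-scored one, which is the content-centric behavior that distinguishes delivery from pure user-centric recommendation.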
Results: Offline and online experiments show a >10% increase in qualified songs entering the admission pool, 10‑20% uplift in average listening time and completion rate, >47% boost in playback share for prioritized artists, and higher completion rates for delivered content.
Future work: Plans include extending content evaluation algorithms, online learning with Flink, multi‑objective distribution, richer audio‑behavior models, real‑time look‑alike, and broader traffic‑controlled scenarios.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.