Unlocking Live-Streaming Recommendations: Strategies from Tencent Music’s Interactive Systems

This article explores the evolution of recommendation systems for interactive live‑streaming scenarios, covering common system traits, user cold‑start solutions, prior knowledge modeling, scene‑specific modeling, and practical Q&A insights drawn from Tencent Music’s real‑world deployments.

21CTO

Guest: Wu Zhe, Senior Researcher, Tencent Music
Editor: Wu Qiyao, UC San Diego
Platform: DataFunTalk

Recommendation technology has settled into a mature iterative paradigm: by deconstructing and recombining classic algorithms, teams can produce effective scenario‑specific models. However, further iteration along these common paths yields diminishing returns, so additional gains require closer attention to scene characteristics. This talk uses a live‑karaoke recommendation system as a case study.

1. Commonality and Characteristics of Recommendation Systems

We first identify the shared iterative path of recommendation systems. The goal is broad, encompassing multiple objectives within a scene (e.g., interaction, click) and across scenes (e.g., total duration, revenue), as well as ecosystem goals such as content freshness, producer incentives, and traffic alignment. A recommendation system must consider consumer, producer, and platform perspectives to build long‑term value.

Next, we feed the model both in‑scene information (as features) and out‑of‑scene information (via embeddings, model distillation, or auxiliary learning). Cross‑domain information is often biased relative to the target scene; mitigation can involve joint learning, fine‑tuning, or multimodal fusion.

Performance considerations: a cascade architecture balances latency and accuracy. For high‑throughput stages (e.g., coarse ranking), multi‑tower structures, model compression, or feature selection reduce computation. Downstream stages (e.g., re‑ranking) tolerate higher latency for precision.
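The latency and accuracy trade‑off of a cascade can be sketched as follows. This is a minimal illustration, not the system described in the talk: the function names (`dot`, `cascade_rank`), the dictionary of precomputed item vectors, and the `fine_score` callback are all assumptions introduced here.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cascade_rank(user_vec, item_vecs, fine_score, coarse_k, final_k):
    """Cheap dot-product filter first, costly scorer on the survivors only.

    item_vecs:  hypothetical dict of item_id -> precomputed item-tower vector
    fine_score: hypothetical callable item_id -> score (may be expensive)
    """
    # Coarse stage: one dot product per candidate, so it scales to large pools.
    coarse = sorted(
        item_vecs,
        key=lambda item_id: dot(user_vec, item_vecs[item_id]),
        reverse=True,
    )
    shortlist = coarse[:coarse_k]
    # Fine stage: the expensive scorer only ever sees coarse_k items.
    reranked = sorted(shortlist, key=fine_score, reverse=True)
    return reranked[:final_k]
```

Because the item tower does not depend on the user, its outputs can be computed offline, which is what keeps the coarse stage cheap.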

Model details: deconstruct large models into reusable units (gate, attention, FM, DCN) and recombine them with varied feature usage, exploring cross‑fusion, connection, and update mechanisms to craft scene‑adapted models.
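As one example of such a reusable unit, the second‑order interaction term of a factorization machine (FM) can be written as a small standalone function. The sketch below uses the standard FM identity to sum all pairwise dot products in linear time; the function name and the list‑of‑lists input format are assumptions made for illustration.

```python
def fm_second_order(embeddings):
    """Sum of dot products over all pairs of feature embeddings.

    Uses the identity 0.5 * ((sum_i v_i)^2 - sum_i v_i^2) per dimension,
    which costs O(n * d) instead of O(n^2 * d).
    """
    dim = len(embeddings[0])
    total = 0.0
    for f in range(dim):
        column = [emb[f] for emb in embeddings]
        s = sum(column)
        total += 0.5 * (s * s - sum(x * x for x in column))
    return total
```

A unit like this takes a list of feature embeddings and returns a scalar, so it can be dropped next to gate, attention, or cross layers that consume the same embeddings.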

Link‑level issues: full‑chain consistency modeling and bias correction prevent misaligned optimization across modules and mitigate accumulated data drift.

2. Better User Cold‑Start

Interactive scenes have a high entry barrier and are strongly felt by users, because recommendations involve real people (hosts) rather than static media. This leads to severe polarization in user activity, with most users remaining low‑activity.

Cold‑start strategies (transfer learning, per‑user models, independent modeling) often rely on strong assumptions, are hard to control, or cannot systematically surface low‑activity users.

We propose clustering users using unbiased features (e.g., gender, age). Unbiased clustering groups users of varying activity levels, creating information gaps that enable high‑activity users to transfer knowledge to low‑activity users.

Implementation: compute distances from a user to each cluster, fuse corresponding cluster embeddings into a user‑group representation, and feed this as a new feature to the model. The cluster information, derived from active users, enriches sparse low‑activity user profiles.
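The distance and fusion steps above might look roughly like this. The talk does not specify the distance metric or the weighting function, so squared Euclidean distance and a softmax over negative distances are assumptions here, as are the names `user_group_embedding`, `centroids`, `cluster_embs`, and `temperature`.

```python
import math

def user_group_embedding(user_vec, centroids, cluster_embs, temperature=1.0):
    """Fuse cluster embeddings into one user-group vector.

    user_vec:     vector built from unbiased features (e.g. age, gender)
    centroids:    one centre per cluster, same space as user_vec
    cluster_embs: one learned embedding per cluster
    """
    # Assumed metric: squared Euclidean distance to each cluster centre.
    dists = [sum((u - c) ** 2 for u, c in zip(user_vec, cen)) for cen in centroids]
    # Assumed weighting: softmax over negative distance (closer = heavier).
    logits = [-d / temperature for d in dists]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    dim = len(cluster_embs[0])
    fused = [sum(w * emb[j] for w, emb in zip(weights, cluster_embs)) for j in range(dim)]
    return weights, fused
```

The fused vector is then appended to the model input as an extra feature, so a sparse low‑activity user inherits signal from the clusters it sits closest to.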

Training uses an NTM‑style update with attention‑weighted erase/add operations, keeping the clustering module separate from the main model to avoid interference.
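For readers unfamiliar with the Neural Turing Machine (NTM) write rule, the erase/add step can be sketched as below. This follows the generic NTM formulation; how the talk's system produces the attention weights, erase vector, and add vector is not stated, so those inputs are treated as given, and the function name is hypothetical.

```python
def ntm_write(memory, weights, erase, add):
    """NTM-style write: M_i <- M_i * (1 - w_i * e) + w_i * a.

    memory:  list of cluster embedding rows
    weights: attention weight per row, each in [0, 1]
    erase:   erase vector, each entry in [0, 1]
    add:     add vector, same width as a memory row
    """
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(row, erase, add)]
        for row, w in zip(memory, weights)
    ]
```

Rows with zero attention weight are left untouched, which is one way the clustering memory can be updated without disturbing clusters unrelated to the current sample.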

We penalize the norm of shift embeddings to enforce compact clusters and minimize inter‑cluster covariance to improve separation.
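One possible reading of these two regularizers is sketched below. The talk does not define "shift embedding" or the exact covariance term, so this assumes a shift is a user's offset vector from its cluster, and that inter‑cluster covariance is measured between pairs of cluster embeddings across their dimensions. Both function names are hypothetical.

```python
def shift_norm_penalty(shifts):
    """Mean squared L2 norm of the (assumed) per-user shift vectors."""
    return sum(sum(x * x for x in s) for s in shifts) / len(shifts)

def inter_cluster_covariance_penalty(cluster_embs):
    """Sum of squared covariances between every pair of cluster embeddings."""
    dim = len(cluster_embs[0])
    centred = []
    for emb in cluster_embs:
        mean = sum(emb) / dim
        centred.append([x - mean for x in emb])
    penalty = 0.0
    for i in range(len(centred)):
        for j in range(i + 1, len(centred)):
            cov = sum(a * b for a, b in zip(centred[i], centred[j])) / dim
            penalty += cov * cov
    return penalty
```

Under this reading, the first term pulls users toward their cluster and the second pushes cluster embeddings toward being uncorrelated with one another; both would be added to the training loss with tunable weights.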

3. Better Prior Modeling

In interactive scenes, we design re‑ranking rules based on prior knowledge (e.g., prioritize same‑age users) to enhance perceived relevance.

Analysis of click‑through vs. re‑ranking curves shows re‑ranking improves the correlation between model scores and user dwell time, indicating that certain features are under‑estimated by the base model.

To amplify important features, we inject feature‑level weights (0–2) into the input layer, akin to a feature‑level Mixture‑of‑Experts or target‑attention mechanism, allowing selective strengthening or weakening of features.
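A simple way to obtain weights in the 0 to 2 range is to scale a sigmoid by two, so that an untrained gate (logit 0) leaves the feature unchanged. The talk gives only the range, so the scaled sigmoid, the per‑feature gate logits, and the function names below are assumptions.

```python
import math

def feature_gates(gate_logits):
    """Map unbounded logits to weights in (0, 2); logit 0 gives weight 1."""
    return [2.0 / (1.0 + math.exp(-g)) for g in gate_logits]

def apply_gates(feature_embs, gate_logits):
    """Scale each feature embedding by its own gate before the first layer."""
    gates = feature_gates(gate_logits)
    return [[g * x for x in emb] for g, emb in zip(gates, feature_embs)]
```

In practice the logits would come from a small network conditioned on the input, so that a feature such as age can be boosted above 1 for some requests and damped below 1 for others.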

We also add auxiliary tasks that predict feature values from inputs, forcing the model to be sensitive to those features; the auxiliary loss weight can be tuned.
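The combined objective can be sketched as a main loss plus a weighted auxiliary loss. The choice of binary cross‑entropy for the main task, squared error for the feature reconstruction, and the default `aux_weight` of 0.1 are illustrative assumptions, not values from the talk.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one sample, with clipping for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def total_loss(click_prob, click_label, feat_pred, feat_value, aux_weight=0.1):
    """Main task loss plus a weighted auxiliary feature-prediction loss."""
    main = bce(click_prob, click_label)
    # Auxiliary head: predict a prior-relevant feature value (e.g. normalised age).
    aux = (feat_pred - feat_value) ** 2
    return main + aux_weight * aux
```

Raising `aux_weight` makes the shared layers more sensitive to the chosen feature at some cost to the main objective, which is why it is treated as a tunable knob.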

For output‑side priors, we treat multi‑task relationships as probabilistic transitions (e.g., “>10 s” predicts “>30 s”), using Bayesian modeling to capture conditional dependencies without over‑constraining the cascade.
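The dependency between the two watch‑time targets can be illustrated with the chain rule of probability: a view longer than 30 s is necessarily longer than 10 s, so P(>30 s) = P(>10 s) × P(>30 s | >10 s). The sketch below shows this hard product form; the talk's own formulation may be softer, and the function and argument names are hypothetical.

```python
def chained_watch_probs(p_over_10, p_over_30_given_10):
    """Derive P(>30 s) from P(>10 s) and the conditional P(>30 s | >10 s).

    Both inputs would be outputs of separate model heads, each in [0, 1].
    """
    p_over_30 = p_over_10 * p_over_30_given_10
    return p_over_10, p_over_30
```

Modeling the conditional head separately guarantees that the predicted P(>30 s) never exceeds P(>10 s), which independent heads cannot ensure.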

4. Better Scene Modeling

Scenes differ in UI and user intent (click‑based, immersive browsing, breathing‑type). We route samples to shared and private network components: a common module learns cross‑scene patterns, while private modules capture scene‑specific nuances, reducing interference.
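The routing idea can be sketched with single linear layers standing in for the shared and private modules. The layer shapes, the scene names used as dictionary keys, and the choice to concatenate the two outputs are assumptions for illustration; the real components would be deeper networks.

```python
def linear(weights, x):
    """Apply one weight matrix (list of rows) to an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def forward(x, scene, shared_w, private_w):
    """Run the shared module plus the private module for this sample's scene."""
    shared_out = linear(shared_w, x)
    private_out = linear(private_w[scene], x)  # only this scene's weights are used
    return shared_out + private_out  # list concatenation of the two outputs

def forward_batch(batch, shared_w, private_w):
    """A batch may mix scenes; each sample is routed by its own scene tag."""
    return [forward(x, scene, shared_w, private_w) for x, scene in batch]
```

Since only the matching private module is evaluated for a sample, gradients from one scene never touch another scene's private weights, while every sample still updates the shared module.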

Multi‑scene modeling is distinct from multi‑objective modeling; the former addresses heterogeneous system contexts, the latter handles multiple targets within a single context.

5. Q&A

Q: Does a model batch contain only one sample type? A: No, batches mix sample types, which are then routed to appropriate model branches.

Q: Can you detail the MemoryNet structure? A: It clusters users, derives cluster embeddings, computes user‑cluster relevance, fuses embeddings into a user‑group representation, and feeds it back to the model.

Q: Is the number of clusters a hyper‑parameter? A: Yes, it is set beforehand.

Q: Is clustering end‑to‑end learnable? A: Yes, we use adaptive end‑to‑end learning for clustering.

Q: How do live‑stream and short‑video recommendation differ? A: Beyond technical differences, human factors and commonsense play a larger role in live‑stream scenarios.

Q: Are embeddings shared across tasks? A: Often, but not always; sharing depends on the similarity of user behavior across tasks, and loss balancing can use methods like GradNorm.

Thank you for listening.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
