
Live‑Streaming Recommendation System: Interaction Scenarios, User Cold‑Start, Prior Modeling, and Scene Modeling

The article presents a comprehensive technical overview of a live‑streaming recommendation system, covering common and specific characteristics, user cold‑start strategies using unbiased clustering, prior knowledge integration, multi‑task modeling, and scene‑aware routing to improve relevance and engagement in interactive environments.

DataFunTalk

Introduction – The talk uses the live‑streaming recommendation systems of QQ Music and 全民K歌 (WeSing) as a case study to illustrate how mature recommendation iteration paradigms can be restructured for interactive scenarios.

01. Commonality and Characteristics of Recommendation Systems

Recommendation systems aim to optimize multiple goals within a scene (e.g., interaction, click) and across scenes (e.g., total watch time, revenue). They must weigh consumer, producer, and platform perspectives; balance performance against accuracy through cascade architectures; and attend to model‑level details such as gating, attention, factorization machine (FM), and Deep & Cross Network (DCN) components.

Define broad objectives (behavioral, cross‑scene, ecological).

Input both in‑scene features (explicit) and out‑of‑scene information (embedding, model distillation, auxiliary learning) with bias‑mitigation steps.

Performance considerations: multi‑tower structures for coarse ranking, model compression, and feature selection for efficiency; re‑ranking handles far fewer candidates, so it can tolerate heavier, precision‑focused models.
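
To illustrate why multi‑tower structures suit coarse ranking, the sketch below (a minimal NumPy example with made‑up dimensions; in a real system each tower is a learned MLP, with random vectors standing in here) scores every candidate with a single dot product per item, which is what makes precomputing and indexing item vectors offline feasible:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64-d tower outputs, 1000 candidate streams, keep top 50.
DIM, N_ITEMS, TOP_K = 64, 1000, 50

def l2_normalize(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

# Stand-ins for the user tower and item tower outputs.
user_vec = l2_normalize(rng.normal(size=DIM))
item_vecs = l2_normalize(rng.normal(size=(N_ITEMS, DIM)))

# Coarse ranking: one dot product per candidate, then take the top K.
scores = item_vecs @ user_vec
top_k = np.argsort(-scores)[:TOP_K]
```

Because the two towers only interact through the final dot product, the item side can be computed once offline and served from an index, which is exactly the efficiency trade‑off coarse ranking makes.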

Model detail construction: decompose large models into reusable units (gate, attention, FM, DCN) and recombine them for scene‑specific adaptation.
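
The FM interaction mentioned above is one such reusable unit. A minimal sketch of its second‑order term, using the standard identity that reduces the pairwise sum to linear cost:

```python
import numpy as np

def fm_pairwise(emb):
    """Second-order factorization-machine interaction over field embeddings.

    emb: (num_fields, dim) array of feature embeddings.
    Returns the scalar sum over all field pairs of <e_i, e_j>, computed via
    the identity sum_{i<j} <e_i, e_j> = 0.5 * (||sum_i e_i||^2 - sum_i ||e_i||^2).
    """
    s = emb.sum(axis=0)
    return 0.5 * float(s @ s - (emb * emb).sum())
```

Packaged as a unit like this, the same interaction block can be dropped into different scene‑specific models and recombined with gates or attention as needed.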

Full‑link consistency modeling and bias correction to align upstream and downstream objectives.

02. Improving User Cold‑Start

Interactive live‑streaming exhibits a large disparity in user activity, leaving most users low‑activity. Traditional cold‑start methods (transfer learning, per‑user models) rely on strong assumptions or offer limited control.

Solution: cluster users using unbiased features (e.g., gender, age) to create clusters that contain both high‑ and low‑activity users, enabling information transfer from active to inactive users.

Implementation steps:

Cluster module input: unbiased features.

Compute distances from a user to each cluster, fuse corresponding cluster embeddings into a "group representation".

Return this representation as a new feature to the main model.

Use Neural Turing Machine (NTM)‑style update rules (attention, erase/add) to update cluster embeddings without interfering with the main model.

Regularize shift embeddings (penalize norm) to keep intra‑cluster compactness and add covariance‑based loss to increase inter‑cluster separation.
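
The steps above can be sketched roughly as follows. This is a simplified NumPy illustration, not the production model: the erase/add vectors of a full NTM write are collapsed into a single scalar gate, and the cluster keys are fixed rather than learned.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

class ClusterMemory:
    """Cluster module sketch: K cluster embeddings queried by distance over
    unbiased user features and updated with NTM-style erase/add writes."""

    def __init__(self, num_clusters, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.keys = rng.normal(size=(num_clusters, dim))    # cluster centers
        self.values = rng.normal(size=(num_clusters, dim))  # cluster embeddings

    def group_representation(self, user_feat):
        # Soft assignment: closer clusters receive larger attention weights.
        dists = np.linalg.norm(self.keys - user_feat, axis=1)
        attn = softmax(-dists)
        # Fuse cluster embeddings into one "group representation" feature
        # that is fed back to the main model.
        return attn @ self.values, attn

    def update(self, user_feat, user_state, lr=0.1):
        # NTM-style write: erase a fraction of each slot, then add the new
        # content, both weighted by the same attention distribution.
        _, attn = self.group_representation(user_feat)
        gate = lr * attn[:, None]
        self.values = self.values * (1.0 - gate) + gate * user_state
```

Because active users write into the same clusters that inactive users read from, the fused representation is how information flows from high‑activity to low‑activity users.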

Discussion on bias of features: unbiased features provide balanced information transfer; biased features can also be used but should be limited to high‑coverage, high‑importance ones.

03. Prior Modeling

In interactive scenes, users perceive the content strongly because it is driven by real‑time human hosts; this demands integrating prior knowledge, such as re‑ranking rules that prioritize hosts of a similar age to the user.

Observations:

Lift curves for the re‑ranking rules show a steeper slope than the precision‑ranking model alone, indicating that the rules capture useful signals the base model misses.

To amplify important features, introduce feature‑level weighting (similar to MoE or target‑attention) where prior features generate weights in (0,2) applied to original features.

Use auxiliary tasks (predicting feature values) to enforce sensitivity to prior features.

Model multi‑task dependencies as probabilistic transitions (e.g., ">10s" predicts ">30s"), adding consistency penalties to loss.
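
The feature‑level weighting and the transition consistency described above might look roughly like this (a hypothetical NumPy sketch; `W` stands in for an assumed learned projection, and the penalty is a simplified hinge on the probability ordering):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prior_feature_weights(prior_feats, W):
    """Map prior features to per-feature weights in (0, 2).

    Scaling the sigmoid by 2 makes weight 1.0 the neutral point: the gate
    can amplify a feature (toward 2) or suppress it (toward 0).
    """
    return 2.0 * sigmoid(prior_feats @ W)

def transition_consistency_penalty(p_gt_10s, p_gt_30s):
    """P(watch > 30s) can never exceed P(watch > 10s); penalize violations
    so the multi-task heads behave like a probabilistic transition chain."""
    return float(np.maximum(p_gt_30s - p_gt_10s, 0.0).sum())
```

Applying the weights is element‑wise (`x * prior_feature_weights(prior, W)`), and the penalty is simply added to the multi‑task loss.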

04. Scene Modeling

Different UI types (click‑based, immersive, "breathing" feed) present heterogeneous information; thus, a routing‑style network is employed:

Shared common module learns universal user/content patterns.

Private modules specialize for each scene, reducing interference.

Samples are split by type; each activates its corresponding private component and task tower.

Multi‑scene vs. multi‑goal distinction explained with analogies to questionnaires.
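
A minimal routing sketch under these assumptions (NumPy, with made‑up layer sizes; real shared and private modules would be full MLPs) could look like:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, HID, N_SCENES = 16, 8, 3   # hypothetical sizes

def relu(x):
    return np.maximum(x, 0.0)

# One shared module for patterns common to all scenes, plus a private module
# and task tower per scene so scene-specific signals do not interfere.
W_shared = rng.normal(size=(DIM, HID))
W_private = rng.normal(size=(N_SCENES, DIM, HID))
W_tower = rng.normal(size=(N_SCENES, 2 * HID))

def forward(batch, scene_ids):
    """Route each sample through the shared module plus its own scene's
    private module and tower; mixed batches work by indexing on scene id."""
    shared = relu(batch @ W_shared)                                # (B, HID)
    private = relu(np.einsum('bd,bdh->bh', batch, W_private[scene_ids]))
    feats = np.concatenate([shared, private], axis=1)              # (B, 2*HID)
    logits = (feats * W_tower[scene_ids]).sum(axis=1)              # per-sample tower
    return 1.0 / (1.0 + np.exp(-logits))                           # e.g. click prob.
```

Note that a batch can freely mix sample types: indexing the private weights by `scene_ids` is the routing step, so no per‑scene batching is required.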

05. Q&A

Batch mixing: batches contain mixed sample types; routing directs each type to its dedicated sub‑model.

MemoryNet structure: clustering‑mediated information transfer from high‑information users to low‑information users.

Cluster count is a hyper‑parameter; clustering is learned end‑to‑end.

Embedding sharing is common but not mandatory; loss balancing can use methods like GradNorm.

Conclusion – The speaker thanks the audience and invites sharing, likes, and follows.

Tags: artificial intelligence, clustering, live streaming, recommendation systems, multitask learning, feature modeling, user cold‑start
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.