
How NetEase Cloud Music Built a Real‑Time Live‑Stream Recommendation System

This article details the architecture, incremental model training, feature engineering, and deployment strategies that enabled NetEase Cloud Music to achieve real‑time live‑stream recommendation, covering business background, multi‑objective modeling, real‑time feature pipelines, sample attribution, feature admission, and online performance results.

NetEase Cloud Music Tech Team

Business Background

Live‑stream recommendation is embedded in various sections of the Cloud Music app, including the live module on the song playback page, the live section mixed into comment pages, and the six‑card live area on the home page. Different placements serve different purposes: the home page introduces new users to live streams and quickly brings existing users into live rooms; the playback page recommends streams related to the currently playing song; the comment page treats live streams as a complementary social content type.

Live‑stream recommendation differs from music recommendation because user intent is ambiguous, interaction data is sparse, real‑time responsiveness is critical, and debiasing is required.

Real‑Time Recommendation Requirements

Real‑time recommendation works at three layers: real‑time features, real‑time models, and a real‑time serving system. Real‑time features continuously capture the latest user actions (e.g., play, watch, tip) and feed them into the training data. Real‑time models are updated frequently so they capture emerging global data patterns. The real‑time system ensures that the latest model and features are served immediately.

Multi‑Objective Modeling

Live‑stream recommendation is a correlated multi‑objective problem: the system must predict click‑through rate, watch duration, conversion rate, and fan intimacy simultaneously. User behavior follows a sequence—click → watch → effective watch (≥30 s) → interaction → follow → gift—each step becoming a separate objective in a dependency graph.
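Because each step can only happen after the one before it, the probability that an impression reaches a deep step factors into a product of per‑step conditional probabilities. The sketch below shows that chaining, assuming the funnel is strictly sequential as described above and that each task head outputs P(step | previous step); the stage names and numbers are hypothetical.

```python
from itertools import accumulate
from operator import mul

# Hypothetical stage names, in the funnel order described in the text.
STAGES = ["click", "watch", "effective_watch", "interact", "follow", "gift"]


def funnel_probabilities(conditionals: dict[str, float]) -> dict[str, float]:
    """Turn per-stage conditionals P(stage | previous stage) into
    joint probabilities P(stage reached | impression)."""
    per_stage = [conditionals[s] for s in STAGES]
    joint = accumulate(per_stage, mul)  # running product along the chain
    return dict(zip(STAGES, joint))
```

For example, conditionals of 0.1 (click), 0.8 (watch) and 0.5 (effective watch) give a 4% probability that an impression leads to an effective watch. Each deeper objective is therefore supervised on every impression, not only on users who already took the previous step.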

Item Definition

In live recommendation, the item is the live host, whose state changes constantly, rather than a static song or video. What the host is doing right now (e.g., a PK, a performance, a chat) determines whether the stream should be recommended.

Data Metrics and Business Environment

Host performance varies dramatically throughout the day, creating multiple peaks and valleys. Without real‑time data, a model may mis‑predict future trends. Moreover, live modules are influenced by other business modules (e.g., music style recommendations), so any change elsewhere can shift live‑stream distribution.

Real‑Time Feature Pipeline

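The real‑time feature framework covers log collection, offline profiling, and real‑time profiling built on Storm, Flink, and Kafka. Processed features are stored in HBase and Redis and then persisted to HDFS. A snapshot system addresses "sample crossing" (feature values computed after an impression leaking into that impression's training sample) and keeps features consistent between serving and training.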
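The snapshot idea can be sketched as follows: the features used at serving time are recorded with the request, and the training sample is later built from that snapshot rather than from the feature store's current values, which may already reflect events after the impression. This is a minimal in‑memory sketch; the class and field names are hypothetical, and a production system would write snapshots to Kafka or HDFS rather than a dict.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SnapshotStore:
    """Hypothetical store mapping request_id -> features seen at serving time."""
    snapshots: dict = field(default_factory=dict)

    def record(self, request_id: str, features: dict) -> None:
        # Copy, so later updates to the live feature dict cannot alter the snapshot.
        self.snapshots[request_id] = dict(features)

    def build_sample(self, request_id: str, label: int) -> Optional[dict]:
        """Join a (possibly late) label with the serving-time features."""
        features = self.snapshots.get(request_id)
        if features is None:
            return None  # no snapshot: drop rather than fall back to current values
        return {**features, "label": label}
```

If a host's viewer count is 120 when the stream is shown and 900 when the click label arrives, the sample still carries 120, which is the value the model actually saw.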

Model Real‑Time Strategies

Real‑time model updates aim to capture shifts in global data patterns quickly. The training strategies below are listed in increasing order of freshness:

Full‑batch retraining (all samples each cycle) – highest accuracy but longest latency.

Incremental updates – moderate latency, balances freshness and stability.

Online learning – updates the model with each new sample, offering the fastest adaptation but risking sparsity.

The current production stack uses an incremental approach.
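The strategies differ mainly in what the optimizer starts from and which data it sees. A minimal sketch of an incremental update, assuming a sparse logistic‑regression model stored as a dict of weights and plain SGD: the previous model's weights are loaded (warm start) and only the newest batch of samples is used. The function names and learning rate are illustrative, not the production trainer.

```python
import math


def predict(weights: dict[str, float], features: list[str]) -> float:
    """Logistic regression over binary features (each active feature adds its weight)."""
    z = sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))


def incremental_update(prev_weights: dict[str, float],
                       batch: list[tuple[list[str], int]],
                       lr: float = 0.1) -> dict[str, float]:
    """Warm-start from prev_weights and take one SGD pass over the new batch."""
    weights = dict(prev_weights)  # leave the previous model untouched
    for features, label in batch:
        grad = predict(weights, features) - label  # d(log loss)/dz
        for f in features:
            weights[f] = weights.get(f, 0.0) - lr * grad
    return weights
```

Full‑batch retraining corresponds to running the same loop over all samples starting from an empty dict, and online learning to calling it with one‑sample batches.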

Evolution of Live Ranking Models

Initially, a simple logistic‑regression model with real‑time cross‑features was used. The system later adopted an ESMM + DFM + DMR architecture: ESMM for joint training to address sample selection bias, DFM to enhance multi‑feature interactions, and DMR to model long‑ and short‑term user interests. Despite improvements in feature freshness and model complexity, model update latency remained a bottleneck.
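ESMM's treatment of sample selection bias is easiest to see in its loss. The sketch follows the formulation in the original ESMM paper for a single click → conversion pair: both terms are computed over all impressions, and the conversion tower is trained only through pCTCVR = pCTR × pCVR. It assumes the per‑impression probabilities are already produced; the DFM and DMR components that would produce them in the Cloud Music model are omitted, and the exact production wiring is not described in the article.

```python
import math


def bce(y: float, p: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy with clipping to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def esmm_loss(p_ctr, p_cvr, clicks, conversions) -> float:
    """Mean ESMM loss over impressions.

    p_ctr, p_cvr: model outputs per impression; clicks, conversions: 0/1 labels.
    pCVR is never fitted on clicked-only samples; it is learned through
    pCTCVR = pCTR * pCVR, whose label (click AND conversion) exists for every impression.
    """
    total = 0.0
    for pc, pv, c, v in zip(p_ctr, p_cvr, clicks, conversions):
        total += bce(c, pc) + bce(c * v, pc * pv)
    return total / len(clicks)
```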

Real‑Time Incremental Model Framework

The framework consists of:

Left side – real‑time incremental learning: consumes Kafka streams, performs ETL, and trains the model on the day’s accumulated samples.

Right side – offline learning: retains a 7‑day historical model for hot‑starting incremental training.

Sample accumulation and attribution: real‑time samples are stored in HDFS; for high‑exposure scenarios (e.g., home page), daily historical samples are aggregated to ensure label accuracy.

Model synchronization: updated model files are exported every 15 minutes to the serving engine.
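The interplay of hot start, continuous training and the 15‑minute export can be sketched as a small driver loop. This is a hypothetical illustration: `train_fn` and `export_fn` stand in for the real trainer and for pushing model files to the serving engine, and the model is modeled as a plain dict.

```python
from datetime import datetime, timedelta
from typing import Callable


class IncrementalTrainer:
    """Hypothetical driver: hot-start from the offline model, train on each
    micro-batch from the stream, and export to serving every `interval`."""

    def __init__(self, base_model: dict,
                 export_fn: Callable[[dict, datetime], None],
                 start: datetime,
                 interval: timedelta = timedelta(minutes=15)):
        self.model = dict(base_model)  # hot start: copy of the daily offline model
        self.export_fn = export_fn
        self.interval = interval
        self.last_export = start

    def on_batch(self, batch: list, now: datetime,
                 train_fn: Callable[[dict, list], dict]) -> None:
        self.model = train_fn(self.model, batch)
        if now - self.last_export >= self.interval:
            self.export_fn(dict(self.model), now)  # export a copy, keep training
            self.last_export = now
```

Training never pauses for an export; serving simply picks up the most recent exported copy.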

Offline Model

The offline backbone is a deep‑interest ESMM‑DFM model that incorporates Wide & Deep ideas, cross‑feature modules, user‑interest modules, and a ResNet‑DNN output layer for faster convergence.
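The article does not give the layout of the ResNet‑DNN output layer. As a minimal sketch of the general idea, a residual dense block adds its input back to the transformed output, giving gradients an identity path through the layer, which is the usual reason residual connections speed up convergence. Square weights and pure‑Python lists are assumptions made to keep the example self‑contained.

```python
def residual_dense(x: list[float], W: list[list[float]], b: list[float]) -> list[float]:
    """y = relu(W x + b) + x, with an identity shortcut (W must be square)."""
    h = [max(0.0, sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
         for row, b_i in zip(W, b)]
    return [h_i + x_i for h_i, x_i in zip(h, x)]
```

With all‑zero weights the block passes its input through unchanged, so stacking such blocks starts from an identity mapping instead of a random one.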

Sample Attribution

Positive‑sample delay (label lag) is mitigated using two strategies:

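Negative‑sample cache (Facebook style): hold each impression as a pending negative; if a positive label arrives within the waiting window, emit the sample as a positive instead of the negative.

Sample correction (Twitter style): emit every impression immediately as a negative and, when a positive arrives later, emit an additional positive sample; the model is updated with both.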
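The two strategies can be contrasted in a short sketch. The class and function names, the time unit and the window length are hypothetical; the point is that the cache trades label latency for correctness (a positive that arrives after the window is lost), while sample correction never waits but feeds the model a negative that is later contradicted.

```python
class NegativeCache:
    """Hold each impression as a pending negative for up to `window` time units."""

    def __init__(self, window: float):
        self.window = window
        self.pending = {}  # impression_id -> (features, impression_time)

    def on_impression(self, imp_id: str, features: dict, t: float) -> None:
        self.pending[imp_id] = (features, t)

    def on_positive(self, imp_id: str):
        """Positive arrived in time: replace the pending negative with a positive."""
        entry = self.pending.pop(imp_id, None)
        return None if entry is None else (entry[0], 1)  # None: too late, label lost

    def flush_expired(self, now: float) -> list:
        """Emit impressions whose window passed without a positive as negatives."""
        expired = [k for k, (_, t) in self.pending.items() if now - t >= self.window]
        return [(self.pending.pop(k)[0], 0) for k in expired]


def sample_correction(events: list) -> list:
    """Emit each impression as a negative at once; a later click adds a positive copy."""
    return [(features, 1 if kind == "click" else 0) for kind, features in events]
```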

In the home‑page live module, a pure cache approach achieved a label join rate of only 70 % because of screen‑off behavior. Adding a sample‑accumulation step raised the label join rate to 97 %.
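One way to read the accumulation step is as a day‑level re‑join: instead of relying on a short cache window, all of the day's impressions are relabeled against every positive event seen so far, so late feedback still finds its impression. The sketch below assumes unique impression IDs and defines the join rate as the share of positive events attached to an impression sample; the function name and data layout are hypothetical.

```python
def accumulate_and_join(impressions: list[tuple[str, dict]],
                        positives: list[str]) -> tuple[list, float]:
    """Relabel all accumulated impressions using every positive seen so far.

    Returns (samples, join_rate), where samples are (impression_id, features, label).
    """
    positive_ids = set(positives)
    samples = [(imp_id, feats, int(imp_id in positive_ids)) for imp_id, feats in impressions]
    joined = sum(label for _, _, label in samples)  # positives that found their impression
    join_rate = joined / len(positive_ids) if positive_ids else 1.0
    return samples, join_rate
```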

Model Hot‑Start and Restart

Incremental learning relies on a hot‑started offline model to:

Correct local pattern drift.

Refresh the vocabulary to avoid out‑of‑vocabulary (OOV) issues.

Support sample‑accumulation workflows that require a stable base model.

Both daily offline model refreshes and 15‑minute incremental restarts are employed.
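Refreshing the vocabulary without discarding learned parameters can be sketched as remapping the embedding table: rows for IDs that survive are copied, new IDs (for example, hosts who started streaming today) get a fresh initialization, and IDs no longer in the vocabulary are dropped. This is a minimal sketch under those assumptions; the Gaussian initializer and its scale are illustrative.

```python
import random


def warm_start_embeddings(old_vocab: list[str], old_table: list[list[float]],
                          new_vocab: list[str], dim: int, seed: int = 0) -> list[list[float]]:
    """Build an embedding table for new_vocab, reusing rows the old model learned."""
    rng = random.Random(seed)
    old_index = {tok: i for i, tok in enumerate(old_vocab)}
    table = []
    for tok in new_vocab:
        if tok in old_index:
            table.append(list(old_table[old_index[tok]]))  # reuse the learned vector
        else:
            table.append([rng.gauss(0.0, 0.01) for _ in range(dim)])  # new ID: small random init
    return table
```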

Feature Admission Strategies

To ensure high‑quality features:

Feature Freezing: time‑biased features (e.g., week, hour) are excluded from parameter updates during incremental training, which prevents over‑fitting to the single time slot the incremental samples come from.

Hard Admission: offline training uses a large frequency threshold and incremental training a smaller one; a low‑frequency feature is admitted only after its count reaches the incremental threshold. This improved click‑through rate by 0.94 % and effective watch rate by 0.23 %.

Soft Admission: combines Poisson‑based frequency estimation and dynamic L1 regularization to smoothly adjust feature inclusion probability.

Case 1: Time‑biased features, e.g., week and hour. Incremental samples concentrate on one or two values, so their distribution differs from that of the offline samples.
Case 2: Low‑frequency, low‑confidence features. For example, a host ID that appears in only 2 exposures, with 1 click and 1 conversion, implies a 50% click rate and a 100% conversion rate if fed to the model directly.
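Feature freezing and hard admission can both be expressed as filters on which weights an incremental step may touch. In this minimal sketch the `week=`/`hour=` prefixes, the threshold value and the function names are hypothetical; offline and incremental training would each use their own `HardAdmission` instance with a different threshold.

```python
from collections import Counter

FROZEN_PREFIXES = ("week=", "hour=")  # hypothetical naming for time-biased features


class HardAdmission:
    """Admit a feature only after it has been seen `threshold` times."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counts = Counter()

    def observe(self, features: list[str]) -> None:
        self.counts.update(features)

    def admitted(self, feature: str) -> bool:
        return self.counts[feature] >= self.threshold


def incremental_step(weights: dict, grads: dict, admission: HardAdmission,
                     lr: float = 0.1) -> dict:
    """Apply SGD only to features that are neither frozen nor below the threshold."""
    for f, g in grads.items():
        if f.startswith(FROZEN_PREFIXES):
            continue  # Case 1: frozen, keeps its offline-trained value
        if not admission.admitted(f):
            continue  # Case 2: too rare to trust yet
        weights[f] = weights.get(f, 0.0) - lr * g
    return weights
```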

Dynamic L1 regularization scales the L1 penalty based on feature frequency, yielding a 1.2 % lift in click‑through rate and a 1.1 % lift in effective watch rate in online A/B tests.
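The article does not give its exact formulas for soft admission, so the sketch below swaps in two well‑known building blocks: probabilistic ("Poisson") inclusion as described in Google's ad‑click‑prediction work, where an unseen feature is admitted with probability p on each occurrence, and a proximal L1 step whose penalty shrinks as a feature's count grows. The `1/sqrt(1 + count)` schedule is an assumption chosen only to show the direction: rare features are pushed harder toward zero.

```python
import math
import random


def poisson_admit(rng: random.Random, p: float) -> bool:
    """Admit an unseen feature with probability p per occurrence
    (on average after 1/p occurrences)."""
    return rng.random() < p


def dynamic_l1(base_l1: float, count: int) -> float:
    """Assumed schedule: the L1 penalty decays as the feature is seen more often."""
    return base_l1 / math.sqrt(1 + count)


def l1_prox_step(w: float, grad: float, lr: float, l1: float) -> float:
    """SGD step followed by soft-thresholding (the proximal operator of L1)."""
    w = w - lr * grad
    return math.copysign(max(abs(w) - lr * l1, 0.0), w)
```

A host ID seen twice gets a larger penalty than one seen thousands of times, so its small, noisy weight is more likely to be snapped to exactly zero.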

Online Results

Combining sample attribution, hot‑start, and feature admission, the home‑page live recommendation achieved a 5.219 % increase in conversion rate and a 6.575 % increase in click‑through rate over a 24‑day period. Comparing update intervals of 15 minutes, 2 hours, and one day, more frequent updates consistently delivered better and more stable performance.

Conclusion and Outlook

Live‑stream recommendation is a unique scenario where the item is a dynamic status, demanding ultra‑fast, high‑precision algorithms. This article presents the first end‑to‑end deployment of real‑time incremental learning for Cloud Music live streams, covering system architecture, model evolution, sample handling, and feature admission. Future work will focus on even faster, higher‑quality algorithms to improve user growth, monetization, and host development.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
