How Huajiao Live Built a From‑Scratch Personalized Recommendation System

This article analyzes Huajiao Live's end‑to‑end recommendation pipeline, covering basic concepts, recall and ranking algorithms—including collaborative filtering, matrix factorization, deep learning models—and multi‑objective optimization, while detailing the engineering workflow for training, deployment, and real‑time serving in a live‑streaming environment.

Huajiao Technology

Introduction

Live‑streaming platforms need to recommend relevant streams from massive content pools. A typical recommendation pipeline consists of three stages: Recall (filter millions of items to a few thousand using low‑cost models), Feature‑based Ranking (refine to hundreds with richer features), and Final Ranking (select the top items for display).

Recommendation pipeline
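The funnel can be sketched as three successive top‑k cuts. This is a minimal illustration, not Huajiao's implementation: the function names `top_k` and `run_funnel`, the scorer callables and the default stage sizes (1000, 100, 10) are hypothetical. For simplicity even the recall stage scores the whole pool here, whereas a real recall stage relies on cheap lookups such as precomputed similarity lists or embedding indexes.

```python
from typing import Callable, List, Sequence

# A scorer maps (user, item) to a relevance score. The three stages would use
# progressively more expensive models; here they are just callables.
Scorer = Callable[[str, str], float]


def top_k(user: str, items: List[str], score: Scorer, k: int) -> List[str]:
    """Keep the k highest-scoring items (Python's sort is stable, so ties keep input order)."""
    return sorted(items, key=lambda item: score(user, item), reverse=True)[:k]


def run_funnel(user: str, pool: List[str], recall: Scorer, rank: Scorer,
               final: Scorer, sizes: Sequence[int] = (1000, 100, 10)) -> List[str]:
    """Recall -> ranking -> final ranking; each stage only sees the previous stage's survivors."""
    recalled = top_k(user, pool, recall, sizes[0])
    ranked = top_k(user, recalled, rank, sizes[1])
    return top_k(user, ranked, final, sizes[2])
```

The point of the structure is cost control: the expensive `final` scorer runs on only `sizes[1]` candidates, never on the full pool.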

Recall Algorithms

Neighborhood‑based Collaborative Filtering

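Item‑based collaborative filtering builds a similarity matrix between streamers (items), typically from the overlap between the audiences that watched them. Because there are far fewer streamers than users, this matrix, with O(n²) entries for n streamers, stays tractable. The approach gives interpretable recommendations ("because you watched X") and copes quickly with new users, since a single interaction is enough to look up similar streamers. Its drawbacks are that it does not learn from an optimization objective and that the similarity matrix can be memory‑intensive.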

Item‑based CF
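A minimal sketch of item‑based CF on binary watch records follows. Cosine similarity over co‑viewers is one common choice and is an assumption here (the article does not name Huajiao's similarity measure); `item_similarity` and `recommend_streamers` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def item_similarity(user_items: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Cosine similarity between streamers computed from binary watch records.

    sim(i, j) = |viewers(i) & viewers(j)| / sqrt(|viewers(i)| * |viewers(j)|)
    """
    viewers: Dict[str, int] = defaultdict(int)
    co: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for i in items:
            viewers[i] += 1
            for j in items:
                if i != j:
                    co[i][j] += 1  # both i and j were watched by this user
    return {i: {j: n / math.sqrt(viewers[i] * viewers[j]) for j, n in row.items()}
            for i, row in co.items()}


def recommend_streamers(user: str, user_items: Dict[str, Set[str]],
                        sim: Dict[str, Dict[str, float]], k: int = 3) -> List[str]:
    """Score unseen streamers by summed similarity to the streamers the user watched."""
    seen = user_items[user]
    scores: Dict[str, float] = defaultdict(float)
    for i in seen:
        for j, s in sim.get(i, {}).items():
            if j not in seen:
                scores[j] += s
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With `user_items = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"b", "c"}, "u4": {"a"}}`, user `u4` has watched only streamer `a`, yet immediately receives `["b", "c"]`, ordered by their similarity to `a`. This is the quick new‑user behavior described above.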

Latent‑Factor Collaborative Filtering

Matrix Factorization (MF) decomposes the user‑item interaction matrix into a low‑dimensional user matrix X and item matrix Y. Implicit feedback (e.g., watch time, click count) is binarized into a preference (1 if above a threshold, else 0), and each entry is weighted in the loss by a confidence that grows with the strength of the raw signal. Offline training yields X and Y; online inference scores any user‑item pair as the dot product X_i·Y_j of user i's and item j's latent vectors.

Advantages: simple, fast online, low storage. Limitations: limited expressiveness and difficulty handling very sparse data.

MF training
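The weighted loss can be sketched as follows. The confidence form c = 1 + alpha · r follows the widely used implicit‑feedback formulation of Hu, Koren and Volinsky; the article does not state Huajiao's exact weighting. The hyperparameters (`alpha`, `lam`, `lr`, `k`, `epochs`) are hypothetical, and plain gradient descent stands in for alternating least squares (ALS), the usual solver at scale.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_implicit_mf(r: Matrix, threshold: float = 0.0, alpha: float = 2.0, k: int = 4,
                      lam: float = 0.01, lr: float = 0.02, epochs: int = 200,
                      seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Weighted MF on implicit feedback.

    r[u][i] is raw feedback such as watch minutes. Preference p = 1 if r > threshold
    else 0; confidence c = 1 + alpha * r. Minimises
        sum_{u,i} c * (p - x_u . y_i)^2 + lam * (|x_u|^2 + |y_i|^2)
    with per-cell gradient steps over every (u, i), including unobserved cells.
    """
    rng = random.Random(seed)
    n_users, n_items = len(r), len(r[0])
    X = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Y = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u in range(n_users):
            for i in range(n_items):
                p = 1.0 if r[u][i] > threshold else 0.0
                c = 1.0 + alpha * r[u][i]
                err = p - dot(X[u], Y[i])
                for f in range(k):
                    xu, yi = X[u][f], Y[i][f]  # use pre-update values for both steps
                    X[u][f] += lr * (c * err * yi - lam * xu)
                    Y[i][f] += lr * (c * err * xu - lam * yi)
    return X, Y
```

Online serving then needs only the learned vectors and one `dot(X[u], Y[i])` per candidate, which is why MF is cheap to serve and store.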

Neural Collaborative Filtering (NCF)

NCF replaces the inner product with a deep neural network that learns a non‑linear interaction function from one‑hot encoded user and item IDs. The model consists of an embedding layer followed by multiple fully‑connected layers.

NCF architecture
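The forward pass of the MLP form of NCF can be sketched as follows. Weights are random and training is omitted; the embedding size and layer widths (`dim=8`, `hidden=(16, 8)`) are hypothetical. Looking up row u of an embedding table is equivalent to multiplying the one‑hot ID vector by that table, so the IDs are used directly as indices.

```python
import math
import random
from typing import List

Vector = List[float]


def dense(x: Vector, w: List[Vector], b: Vector) -> Vector:
    """Fully connected layer: w has one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class NCF:
    """Minimal NCF (MLP variant): embeddings -> concatenation -> ReLU layers -> sigmoid."""

    def __init__(self, n_users: int, n_items: int, dim: int = 8,
                 hidden=(16, 8), seed: int = 0):
        rng = random.Random(seed)

        def rand(n: int) -> Vector:
            return [rng.gauss(0, 0.1) for _ in range(n)]

        self.user_emb = [rand(dim) for _ in range(n_users)]
        self.item_emb = [rand(dim) for _ in range(n_items)]
        sizes = [2 * dim, *hidden]
        # Layer mapping i inputs to o outputs: o weight rows of length i, o biases.
        self.layers = [([rand(i) for _ in range(o)], [0.0] * o)
                       for i, o in zip(sizes, sizes[1:])]
        self.out_w, self.out_b = rand(sizes[-1]), 0.0

    def predict(self, user: int, item: int) -> float:
        h = self.user_emb[user] + self.item_emb[item]  # concatenate the two embeddings
        for w, b in self.layers:
            h = relu(dense(h, w, b))
        logit = sum(wi * hi for wi, hi in zip(self.out_w, h)) + self.out_b
        return 1.0 / (1.0 + math.exp(-logit))  # predicted interaction probability
```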

NeuMF

NeuMF combines a Generalized Matrix Factorization (GMF) branch with a Multi‑Layer Perceptron (MLP) branch, capturing both linear and non‑linear feature interactions.

NeuMF
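The fusion step can be sketched as follows. As in the NeuMF paper, the GMF and MLP branches use separate embeddings. The GMF branch takes their element‑wise product, the MLP branch passes the concatenation through ReLU layers, and one output layer scores both results concatenated. The function name and argument layout are illustrative, and all weights are assumed to be already trained.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, biases)


def neumf_score(p_gmf: Vector, q_gmf: Vector, p_mlp: Vector, q_mlp: Vector,
                mlp_layers: Sequence[Layer], h_out: Vector, b_out: float = 0.0) -> float:
    # GMF branch: element-wise product of user and item vectors (linear interaction).
    gmf = [a * b for a, b in zip(p_gmf, q_gmf)]
    # MLP branch: concatenated embeddings through ReLU layers (non-linear interaction).
    z = p_mlp + q_mlp
    for w, b in mlp_layers:
        z = [max(0.0, sum(wi * zi for wi, zi in zip(row, z)) + bi) for row, bi in zip(w, b)]
    # Output layer over both branches, then sigmoid.
    fused = gmf + z
    logit = sum(h * f for h, f in zip(h_out, fused)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))
```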

Ranking Algorithms

Feature Engineering

Effective ranking relies on diverse features, including:

User demographics and profile

Contextual signals (time of day, device)

Historical behavior (clicks, watch time, gifts, comments)

Real‑time metrics (current view count, live gifts, chat activity)

Feature categories
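Features from these categories typically have to be turned into one sparse vector before any ranking model can use them. The sketch below uses the hashing trick for this; it is an assumption, not a method the article describes. The bucket count `2 ** 20` and all feature names are hypothetical.

```python
import hashlib
from typing import Dict, Union


def hash_index(key: str, dim: int) -> int:
    """Stable bucket index; Python's built-in hash() is salted per process."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % dim


def build_features(raw: Dict[str, Union[str, float]], dim: int = 2 ** 20) -> Dict[int, float]:
    """Turn raw profile/context/behavior/real-time features into a sparse vector.

    Categorical values become one-hot indicators keyed by "name=value";
    numeric values keep their magnitude in a slot keyed by the feature name.
    Colliding keys simply add up (the usual hashing-trick trade-off).
    """
    vec: Dict[int, float] = {}
    for name, value in raw.items():
        if isinstance(value, str):
            idx, val = hash_index(f"{name}={value}", dim), 1.0
        else:
            idx, val = hash_index(name, dim), float(value)
        vec[idx] = vec.get(idx, 0.0) + val
    return vec
```

For example, `build_features({"device": "ios", "hour_of_day": "21", "stream_current_viewers": 3400.0})` yields three non‑zero entries unless two keys collide.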

Model Choices

Logistic Regression (LR): linear model with fast training, but limited to linear interactions unless feature crosses are engineered by hand.

Factorization Machines (FM): automatically add second‑order feature cross terms through learned latent vectors, which lets them handle sparse, high‑dimensional data.

GBDT + LR: Gradient Boosted Decision Trees generate high‑order feature combinations; the index of the leaf each sample lands in (one per tree) is one‑hot encoded and fed to a logistic regression model.

GBDT+LR architecture
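FM copes with sparse data because each feature i has a latent vector v_i, and the weight of the cross x_i·x_j is the inner product ⟨v_i, v_j⟩. As a result, a feature pair never seen together in training still gets an estimate. The sketch computes the second‑order term with the standard O(k·n) reformulation from Rendle's FM paper rather than looping over all pairs. Parameters are assumed already trained, and the names are illustrative.

```python
from typing import Dict, List


def fm_score(x: Dict[int, float], w0: float, w: List[float], V: List[List[float]]) -> float:
    """FM prediction on a sparse input {feature index: value}.

    y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, where the pairwise sum equals
    0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    linear = w0 + sum(w[i] * xi for i, xi in x.items())
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s2 = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise
```

Only non‑zero features are visited, so the cost scales with the number of active features, not with the full feature space.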

Deep Models

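Wide & Deep, DeepFM and DIN build on deep neural networks in different ways. Wide & Deep pairs a linear "wide" part over hand‑crafted cross features with a deep network. DeepFM replaces the wide part with an FM component, so second‑order crosses are learned automatically. DIN introduces an attention mechanism that weights the embeddings in a user's behavior sequence by their relevance to the candidate item.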

DeepFM
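DIN's candidate‑aware pooling can be sketched in simplified form. DIN computes each attention weight with a small feed‑forward "activation unit" over the behavior embedding and the candidate embedding. Here that unit is replaced by a sigmoid of their dot product, a simplification made only for illustration. As in DIN, the weights are not softmax‑normalized. The function name is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def din_pool(history: List[Vector], candidate: Vector) -> Vector:
    """Candidate-aware pooling of a user's behavior sequence (simplified DIN).

    Each historical embedding e_j gets weight a_j = sigmoid(e_j . candidate).
    The weights are not normalised to sum to 1, so the pooled vector also
    reflects how much relevant history the user has.
    """
    pooled = [0.0] * len(candidate)
    for e in history:
        a = 1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(e, candidate))))
        for f, v in enumerate(e):
            pooled[f] += a * v
    return pooled
```

With history `[[1, 0], [0, 1]]` and candidate `[2, 0]`, the first (relevant) behavior gets weight sigmoid(2) ≈ 0.88 and the second gets sigmoid(0) = 0.5. The same user history therefore pools differently for different candidate streams.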

Multi‑Objective Optimization

Live‑streaming recommendation often optimizes several metrics simultaneously (click‑through, watch time, gifts, comments, follows, shares). Multi‑task models share a common embedding layer and add task‑specific towers.

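ESMM (Entire Space Multi‑Task Model): predicts click‑through rate (CTR) and conversion rate (CVR) jointly. It models pCTCVR = pCTR × pCVR, the probability that an impression is both clicked and converted, and trains on all impressions rather than only clicked ones, which mitigates sample‑selection bias.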
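ESMM's training objective can be sketched as follows. Only CTR and CTCVR receive direct labels, both over every impression. The CVR tower is trained indirectly through the product. In a live‑streaming setting, "conversion" might mean, for example, sending a gift after entering a stream; that mapping, the function names and the unweighted sum of the two losses are assumptions for illustration.

```python
import math
from typing import Iterable, Tuple


def bce(label: float, p: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy with clipping to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def esmm_loss(samples: Iterable[Tuple[float, float, int, int]]) -> float:
    """Average ESMM loss; each sample is (pCTR, pCVR, clicked, converted) for one impression."""
    samples = list(samples)
    total = 0.0
    for pctr, pcvr, clicked, converted in samples:
        pctcvr = pctr * pcvr  # P(click and convert | impression)
        total += bce(clicked, pctr) + bce(clicked * converted, pctcvr)
    return total / len(samples)
```

Note that a non‑clicked impression still contributes a CTCVR term (label 0), which is how the CVR tower receives signal from the entire impression space.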

MMoE (Multi‑gate Mixture‑of‑Experts): splits the shared bottom layer into multiple expert sub‑networks and learns a separate gating network for each task, so each task can weight the experts differently.

MMoE architecture
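The MMoE forward pass can be sketched as follows. All tasks share the same experts, while each task's softmax gate decides how much of each expert's output it uses. To keep the example short, each expert is a single ReLU layer and each tower a single linear layer plus sigmoid. Real towers are usually small MLPs, and all shapes and names here are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mmoe_forward(x: Vector, experts: List[Matrix], gates: List[Matrix],
                 towers: List[Vector]) -> List[float]:
    """One prediction per task from shared experts mixed by task-specific gates.

    experts[e]: weight matrix of expert e (single ReLU layer here).
    gates[t]:   n_experts x len(x) matrix; its softmax gives task t's expert weights.
    towers[t]:  weight vector of task t's single-layer tower.
    """
    expert_outs = [[max(0.0, v) for v in matvec(w, x)] for w in experts]
    dim = len(expert_outs[0])
    preds = []
    for gate_w, tower_w in zip(gates, towers):
        g = softmax(matvec(gate_w, x))
        mixed = [sum(g[e] * expert_outs[e][d] for e in range(len(experts)))
                 for d in range(dim)]
        logit = sum(t * m for t, m in zip(tower_w, mixed))
        preds.append(1.0 / (1.0 + math.exp(-logit)))
    return preds
```

Because the gates are per task, objectives such as click and gift can lean on different experts while still sharing what they have in common.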

Model Training and Deployment

Data collection gathers user‑streamer interactions and stores full‑snapshot samples in HDFS. Offline training runs daily or weekly on the entire dataset; incremental updates are processed via Flink streams. Trained models are served with TensorFlow Serving. A Go‑based recommendation service transforms incoming requests into feature vectors and calls the TensorFlow Serving endpoint for inference.

Training and serving architecture
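Huajiao's serving layer is written in Go; the sketch below uses Python only to keep all examples in one language. It shows the shape of a call to TensorFlow Serving's REST predict endpoint: a `POST /v1/models/<name>:predict` with an `instances` list, answered with a `predictions` list. Port 8501 is TensorFlow Serving's default REST port. The host, the model name `live_ranker`, the feature keys and the 50 ms timeout are hypothetical, and the article does not say whether the Go service uses REST or gRPC. The instance keys must match the input names of the exported model's serving signature.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_predict_request(host: str, model: str, instances: List[Dict[str, Any]],
                          port: int = 8501) -> urllib.request.Request:
    """Build a TensorFlow Serving REST predict call for a batch of feature dicts."""
    url = f"http://{host}:{port}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def predict(req: urllib.request.Request, timeout: float = 0.05) -> List[Any]:
    """Send the request and return the 'predictions' list from the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["predictions"]
```

For example, `predict(build_predict_request("localhost", "live_ranker", [{"user_id": 42, "streamer_id": 7}]))` would return one prediction per instance, assuming such a model is being served.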

Conclusion

There is no universally “best” model for live‑streaming recommendation. The optimal solution depends on domain characteristics such as multimodal content, real‑time dynamics, and hotspot effects. Understanding the scene, extracting representative features, and selecting models that align with those features are essential for achieving superior recommendation performance.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
