
Live Streaming Recommendation Ranking Model Evolution and Multi‑Objective Learning at Alibaba 1688

This article presents a comprehensive overview of Alibaba's 1688 live‑streaming recommendation system, detailing core challenges such as heterogeneous behavior modeling, multi‑objective optimization, and bias mitigation, and describing four successive model iterations—from feature‑engineered GBDT to attention‑based heterogeneous networks and transformer architectures—along with experimental results and practical insights.

DataFunTalk

In recent years e‑commerce has become increasingly content‑driven, with live streaming and short video serving as major traffic sources; consequently, recommendation algorithms are now applied to live‑streaming scenarios. Using Alibaba's B‑class e‑commerce platform 1688 as a case study, the article discusses the core problems of live‑streaming recommendation, model evolution, multi‑objective learning, and debiasing.

Core Issues

Modeling heterogeneous user behaviors: sparse live‑stream interactions must be combined with rich product‑click data.

Multi‑objective learning: optimizing click‑through rate (CTR), conversion rate (CVR), stay time, interaction rate, and follow rate simultaneously.

Bias mitigation: position bias and selection bias distort model predictions and exacerbate the Matthew effect.

Model Iterations

V1 – Feature‑engineered GBDT point‑wise model: built on real‑time and historical live‑stream statistics plus item, user, and cross features; weighted training samples balanced CTR, CVR, and stay time, achieving +10% CVR and +30% stay time.
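The sample‑weighting idea behind V1 can be sketched as follows; the weighting function and its coefficients are hypothetical illustrations, not the production values:

```python
def sample_weight(clicked, converted, stay_seconds, w_cvr=5.0, w_stay=0.01):
    """Up-weight clicked samples by conversion and watch time so a single
    point-wise click objective also pulls toward CVR and stay time.
    Coefficients are hypothetical, not the production values."""
    if not clicked:
        return 1.0
    return 1.0 + w_cvr * float(converted) + w_stay * stay_seconds

w = sample_weight(clicked=True, converted=True, stay_seconds=120)
```

With this scheme the GBDT still optimizes one click loss, but samples that convert or retain the user count for more, nudging the single model toward all three metrics.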

V2 – Heterogeneous behavior sequence attention model: adopted the Embedding+MLP paradigm and introduced dual target‑attention structures (item sequence ↔ live‑stream product, and live‑stream sequence ↔ live‑stream ID), drawing on YouTube DNN and DIN ideas; online CTR improved by 5%.
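A minimal numpy sketch of DIN‑style target attention, assuming simple dot‑product scoring (the production model may use a learned scorer):

```python
import numpy as np

def target_attention(seq_emb, target_emb):
    """Score each historical behavior against the target via dot product,
    softmax over the sequence, then weighted-sum pool into one interest vector."""
    scores = seq_emb @ target_emb               # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over the sequence
    return weights @ seq_emb                    # (emb_dim,) pooled interest

rng = np.random.default_rng(0)
item_seq = rng.normal(size=(5, 8))   # 5 past item clicks, embedding dim 8
target   = rng.normal(size=8)        # e.g. the live-stream's product embedding
interest = target_attention(item_seq, target)
```

The same structure is instantiated twice: once for the item sequence against the live‑stream's products, and once for the live‑stream sequence against the candidate live‑stream ID.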

V3 – Heterogeneous information network (HIN) with metapath2vec: constructed a graph of live‑stream, user, and product nodes; metapath2vec generated node embeddings that were fused and fed into the ranking model, yielding +2.5% CVR and +3.9% stay time on daily traffic and +10% conversion during promotions.
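A toy illustration of the metapath‑constrained random walks that metapath2vec feeds into skip‑gram training; the graph, node names, and the U‑L‑P‑L metapath are invented for the example:

```python
import random

# Toy heterogeneous graph: user (u*), live-stream (l*), product (p*) nodes.
# neighbors[node][type] lists that node's neighbors of the given type.
neighbors = {
    "u1": {"L": ["l1", "l2"]}, "u2": {"L": ["l2"]},
    "l1": {"U": ["u1"], "P": ["p1", "p2"]},
    "l2": {"U": ["u1", "u2"], "P": ["p2"]},
    "p1": {"L": ["l1"]}, "p2": {"L": ["l1", "l2"]},
}

def metapath_walk(start, metapath, walk_len, rng):
    """Random walk constrained to follow the metapath's node types
    (here U-L-P-L repeating), as in metapath2vec."""
    walk = [start]
    for i in range(1, walk_len):
        next_type = metapath[i % len(metapath)]
        cands = neighbors[walk[-1]].get(next_type, [])
        if not cands:
            break
        walk.append(rng.choice(cands))
    return walk

rng = random.Random(42)
walk = metapath_walk("u1", ["U", "L", "P", "L"], 8, rng)
```

The resulting type‑alternating walks become "sentences" for skip‑gram, so user, live‑stream, and product nodes land in one shared embedding space that the ranking model can consume.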

V4 – Live‑stream Transformer: modeled three token types (historical core products, real‑time products, and the user's item sequence) with type and position embeddings; employed multi‑head self‑attention, feed‑forward, and pooling layers; online CTR increased by 4.8% at the cost of a modest 13 ms latency increase.
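How the three token types might be combined into transformer input, sketched with a toy single‑head attention in numpy (dimensions, token counts, and embeddings are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
# Three token types from the article: 0 = historical core products,
# 1 = real-time on-shelf products, 2 = the user's item behavior sequence.
type_ids  = np.array([0] * 3 + [1] * 3 + [2] * 4)   # 10 tokens total
token_emb = rng.normal(size=(10, d))
type_emb  = rng.normal(size=(3, d))
pos_emb   = rng.normal(size=(10, d))

x = token_emb + type_emb[type_ids] + pos_emb   # transformer input

def self_attention(x):
    """Single-head scaled dot-product self-attention (no projections, toy)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)           # row-wise softmax
    return w @ x

h = self_attention(x)       # (10, d) contextualized tokens
pooled = h.mean(axis=0)     # pooling layer feeding the ranking head
```

The type embedding is what lets one transformer mix heterogeneous tokens: attention can learn different interaction patterns between, say, the user's item sequence and the stream's real‑time products.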

Multi‑Objective Learning

The baseline shared‑bottom architecture shares low‑level representations while each task (CTR, CVR, stay time) has its own MLP head, reducing parameters and enabling joint inference. The MMOE model extends this by routing multiple shared expert networks through task‑specific gates, improving offline metrics and delivering a 0.5% average online lift across objectives.
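A minimal numpy sketch of an MMOE forward pass, assuming tanh experts and softmax gates; the sizes and number of experts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_exp, n_experts, n_tasks = 8, 4, 3, 2         # toy sizes

W_exp  = rng.normal(size=(n_experts, d_in, d_exp))   # one-layer experts
W_gate = rng.normal(size=(n_tasks, d_in, n_experts)) # one gate per task

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mmoe(x):
    """MMOE: all tasks share the experts, but each task's gate produces its
    own mixture over expert outputs (vs. shared-bottom's single trunk)."""
    experts = np.stack([np.tanh(x @ W_exp[k]) for k in range(n_experts)])
    return [softmax(x @ W_gate[t]) @ experts for t in range(n_tasks)]

x = rng.normal(size=d_in)
task_reps = mmoe(x)   # each representation feeds a task-specific MLP head
```

Compared with shared‑bottom, the per‑task gates let conflicting objectives (e.g. CTR vs. stay time) draw on different expert mixtures instead of fighting over one shared trunk.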

Label processing includes log‑transforming stay time; the overall loss is a weighted sum of the individual task losses, and careful loss weighting plus techniques such as Pareto‑optimal LTR are discussed.
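The loss construction can be sketched as a weighted sum of per‑task losses with a log1p‑transformed stay‑time label; the weights shown are placeholders, not tuned values:

```python
import numpy as np

def multi_task_loss(ctr_logit, ctr_label, cvr_logit, cvr_label,
                    stay_pred, stay_seconds, w=(1.0, 1.0, 0.3)):
    """Weighted sum of task losses. Stay time is log1p-transformed so the
    regression term is not dominated by heavy-tailed long sessions.
    The weights w are illustrative placeholders."""
    def bce(logit, y):  # binary cross-entropy on a single logit
        p = 1.0 / (1.0 + np.exp(-logit))
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))
    mse = (stay_pred - np.log1p(stay_seconds)) ** 2
    return w[0] * bce(ctr_logit, ctr_label) + w[1] * bce(cvr_logit, cvr_label) + w[2] * mse

loss = multi_task_loss(0.5, 1.0, -1.0, 0.0, 2.0, 60.0)
```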

Debiasing Strategies

As‑Feature: treat position as an input feature, allowing the model to learn position bias; effective for reducing popularity bias but may hurt efficiency.

As‑Model (Biasnet): a separate bias network predicts position bias logits, which are summed with the main network output; online results show improved personalization and reduced Matthew effect with better efficiency.
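A schematic of the Biasnet idea: add a learned position‑bias logit to the main logit during training, and drop the bias branch at serving. The per‑position logit values here are invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical per-position bias logits learned by the bias network
# (position 0 = top slot, where exposure bias is strongest).
position_bias_logit = np.array([1.2, 0.6, 0.2, -0.1])

def predict_ctr(main_logit, position=None):
    """Training adds the shown position's bias logit to the main logit;
    at serving the bias branch is dropped so ranking uses the debiased score."""
    if position is None:                       # inference: Biasnet removed
        return sigmoid(main_logit)
    return sigmoid(main_logit + position_bias_logit[position])

train_p = predict_ctr(0.0, position=0)  # biased probability used in the loss
serve_p = predict_ctr(0.0)              # debiased score used for ranking
```

Because the bias term absorbs position effects during training, the main network is free to learn a position‑independent relevance score.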

Multi‑Tower model: builds separate CTR towers for each position (1‑9 and a 10+ tower), training each tower only on samples from its position; enables position‑aware predictions and supports greedy or list‑wise ranking, though it may introduce selection bias and suffers from limited recall in live‑stream scenarios.
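The position‑to‑tower routing described above can be sketched as follows (the function name is ours; each tower would be its own CTR network):

```python
def tower_index(position):
    """Route a display position to its CTR tower: positions 1-9 each get
    a dedicated tower; deeper positions share a single 10+ tower."""
    return position - 1 if 1 <= position <= 9 else 9

# Training: a sample updates only the tower for the position it was shown at.
# Serving: score a candidate for slot k with tower k (greedy ranking), or
# score it under every tower for list-wise arrangement.
slot_towers = [tower_index(p) for p in range(1, 13)]
```

The per‑position split is also why selection bias creeps back in: each tower only ever sees the items that past policies chose to show at its position.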

Experimental results indicate that while the multi‑tower model raises CVR by 3.6% and stay time by 3%, personalization and bias metrics did not improve as expected due to data distribution skew, insufficient recall, and display bias from hot‑live indicators.

Conclusion

The final production system combines a transformer‑based heterogeneous interest layer for user‑item sequences, target‑attention for live‑stream sequences, GBDT‑encoded statistical features, a bias‑free CTR predictor (with Biasnet removed at inference), and an MMOE‑based multi‑objective head for post‑click metrics. This architecture delivers consistent gains in conversion, engagement, and fairness for Alibaba's 1688 live‑streaming recommendation platform.

Tags: Live Streaming, Transformer, Multi-Objective Learning, Recommendation Systems, Bias Mitigation, Heterogeneous Graph
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
