Real‑time Positive/Negative Feedback Sequence Modeling and Multi‑objective Optimization for Taobao Live Ranking
This article presents a practical study on modeling real‑time positive and negative feedback sequences and applying multi‑objective optimization in the re‑ranking stage of Taobao Live, detailing system architecture, feature engineering, loss design, experimental results, and future research directions.
Background : Live‑stream e‑commerce combines content and shopping, and Taobao Live has grown rapidly, achieving over 5 trillion CNY GMV in 2021 with a year‑over‑year growth exceeding 90 %.
Problem : The key challenge is delivering real‑time interactive content to the right audience efficiently. In the full‑screen page scenario, re‑ranking runs after the computationally heavy fine‑ranking stage, so the system must quickly perceive real‑time user and host behavior and make its re‑ranking decisions accordingly.
Requirements for the re‑ranking model :
Real‑time perception of user watch‑interaction and host live‑room status.
Utilization of real‑time positive/negative feedback to adjust recommendations.
Multi‑objective optimization to improve content metrics (stay time, likes, comments, shares) while also boosting e‑commerce metrics (item clicks, purchases, GMV).
Model architecture : The core structure (see Figure 1) consists of an embedding layer, user and host interest representation layers, and a multi‑objective learning layer. The model consumes real‑time features from both the user side and the host side, and it scores each candidate independently (a point‑wise design) to keep inference efficient.
Real‑time features :
User side : Live‑room event messages are streamed into Flink, passed through high‑pass (HPF) and low‑pass (LPF) filters, and written to an igraph table as two real‑time feedback sequences, one positive and one negative.
Host side : Real‑time statistics such as UV, likes, follows, comments, shares, and item‑click counts are aggregated by Flink, together with top‑N product information displayed by the host.
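The real‑time feature pipeline can be illustrated with a minimal, in‑memory sketch. It is not the production Flink job: it assumes the filters act on watch duration, and the thresholds, the `WatchEvent` fields, the sequence cap, and the `UserFeedbackStore` stand‑in for the igraph table are all hypothetical.

```python
from collections import Counter, deque
from dataclasses import dataclass

# Hypothetical cut-offs: the source does not state the filter thresholds.
POSITIVE_MIN_WATCH_SEC = 30.0  # "high-pass": long watches count as positive feedback
NEGATIVE_MAX_WATCH_SEC = 3.0   # "low-pass": quick skips count as negative feedback
MAX_SEQ_LEN = 50               # assumed cap on each stored sequence


@dataclass
class WatchEvent:
    user_id: str
    host_id: str
    watch_sec: float


class UserFeedbackStore:
    """In-memory stand-in for the per-user sequence table."""

    def __init__(self):
        self.positive = {}  # user_id -> deque of host_ids, oldest first
        self.negative = {}

    def update(self, event):
        if event.watch_sec >= POSITIVE_MIN_WATCH_SEC:
            target = self.positive
        elif event.watch_sec <= NEGATIVE_MAX_WATCH_SEC:
            target = self.negative
        else:
            return  # ambiguous watch time: goes into neither sequence
        seq = target.setdefault(event.user_id, deque(maxlen=MAX_SEQ_LEN))
        seq.append(event.host_id)


def aggregate_host_stats(interactions):
    """Count interaction types (likes, item clicks, ...) per host."""
    stats = {}
    for host_id, kind in interactions:
        stats.setdefault(host_id, Counter())[kind] += 1
    return stats


store = UserFeedbackStore()
events = [
    WatchEvent("u1", "h1", 45.0),  # long watch -> positive
    WatchEvent("u1", "h2", 1.5),   # quick skip -> negative
    WatchEvent("u1", "h3", 10.0),  # in between -> dropped
]
for ev in events:
    store.update(ev)

host_stats = aggregate_host_stats(
    [("h1", "like"), ("h1", "like"), ("h1", "item_click")]
)
```

Bounding each sequence with `deque(maxlen=...)` keeps only the most recent feedback, which matches the goal of reacting to what the user did moments ago rather than to long‑term history.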
Positive/negative feedback modeling : A contrastive residual learning approach extracts user interest from the positive and negative sequences. A Transformer‑based target‑attention mechanism attends from the candidate host to each sequence, and the model computes the residual between the two attention outputs. Metric learning is introduced via a label‑aware triplet loss to sharpen the discrimination between the positive and negative sequences.
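The idea can be sketched in plain Python under several assumptions: target attention is reduced to a single scaled dot‑product head with no learned projections, the residual is taken as positive interest minus negative interest, and the label‑aware triplet loss is one plausible reading in which the label decides which interest vector the candidate should sit closer to. None of these details are confirmed by the source, and all function names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def target_attention(candidate, sequence):
    """Pool a sequence of item vectors, using the candidate as the query."""
    dim = len(candidate)
    if not sequence:
        return [0.0] * dim  # no feedback yet: neutral interest vector
    scale = math.sqrt(dim)
    weights = softmax([dot(candidate, item) / scale for item in sequence])
    return [sum(w * item[i] for w, item in zip(weights, sequence)) for i in range(dim)]


def contrastive_residual(candidate, pos_seq, neg_seq):
    """Return positive interest, negative interest, and their difference."""
    pos_interest = target_attention(candidate, pos_seq)
    neg_interest = target_attention(candidate, neg_seq)
    residual = [p - n for p, n in zip(pos_interest, neg_interest)]
    return pos_interest, neg_interest, residual


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def label_aware_triplet_loss(candidate, pos_interest, neg_interest, label, margin=1.0):
    """Assumed form: the label picks which interest vector should be nearer."""
    if label == 1:
        near, far = pos_interest, neg_interest
    else:
        near, far = neg_interest, pos_interest
    return max(0.0, sq_dist(candidate, near) - sq_dist(candidate, far) + margin)


candidate = [1.0, 0.0]
pos_seq = [[1.0, 0.0], [0.9, 0.1]]  # hosts the user engaged with
neg_seq = [[0.0, 1.0], [0.1, 0.9]]  # hosts the user skipped
pos_i, neg_i, residual = contrastive_residual(candidate, pos_seq, neg_seq)
loss_if_engaged = label_aware_triplet_loss(candidate, pos_i, neg_i, label=1)
loss_if_skipped = label_aware_triplet_loss(candidate, pos_i, neg_i, label=0)
```

In this toy input the candidate resembles the positive sequence, so the loss is zero when the label is 1 and positive when the label is 0, which is the behavior a discriminative loss of this kind is meant to encourage.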
Multi‑objective learning : To balance multiple business goals, a PLE (Progressive Layered Extraction) framework is adopted, combining task‑specific experts with shared experts. Interaction signals (likes, shares, comments, follows) are merged into an "interact" label, while purchase‑related signals (add‑to‑cart, purchase, GMV) form a "buy" label, reducing sparsity and model size.
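Both ideas in this paragraph, label merging and expert gating, can be sketched briefly. The field names in `merge_labels` are hypothetical, and the gating function shows only a single extraction layer with fixed gate logits and precomputed expert outputs; in a trained PLE model the gates and experts are learned networks and several such layers are stacked.

```python
import math

# Hypothetical field names for the raw per-action signals.
INTERACT_SIGNALS = ("like", "share", "comment", "follow")
BUY_SIGNALS = ("add_to_cart", "purchase", "gmv")  # gmv > 0 counts as a buy signal


def merge_labels(sample):
    """Collapse sparse per-action signals into two binary task labels."""
    interact = int(any(sample.get(k, 0) for k in INTERACT_SIGNALS))
    buy = int(any(sample.get(k, 0) for k in BUY_SIGNALS))
    return {"interact": interact, "buy": buy}


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_mix(gate_logits, expert_outputs):
    """Weight expert output vectors by a softmax gate and sum them."""
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]


def extraction_layer(task_experts, shared_experts, gate_logits):
    """Each task mixes its own experts with the shared experts."""
    return {
        task: gated_mix(gate_logits[task], own + shared_experts)
        for task, own in task_experts.items()
    }


labels = merge_labels({"like": 1, "comment": 0, "purchase": 0})

shared_experts = [[1.0, 1.0]]
task_experts = {"interact": [[2.0, 0.0]], "buy": [[0.0, 2.0]]}
gate_logits = {"interact": [0.0, 0.0], "buy": [0.0, 0.0]}  # equal weights
mixed = extraction_layer(task_experts, shared_experts, gate_logits)
```

Because each task's gate sees only its own experts plus the shared ones, a task can draw on common knowledge without being overwritten by the other task's experts, which is the property that makes this structure attractive for balancing content and commerce objectives.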
Online results : A/B testing in the Taobao Live environment shows overall gains: average stay time +4 %, likes +5.2 %, comments +4 %, follows +6 %, shares +5 %, and purchase conversion +20 %.
Conclusion & Outlook : The real‑time feedback pipeline and contrastive residual learning improve user experience, while the PLE multi‑objective framework balances content and commerce metrics. Future work includes deeper user interest exploration and reinforcement‑learning‑based re‑ranking for even stronger real‑time personalization.
DataFunTalk