Deep Ranking Model Evolution and Applications in Taobao Live: DBMTL, DMR, and RUI Ranking
This article presents a comprehensive overview of Taobao Live's deep ranking system evolution, detailing the DBMTL multi‑task learning framework, the two‑tower DMR matching‑ranking architecture, and the RUI Ranking refer‑item model, together with their offline formulas, online deployment scenarios, and measured performance gains across click‑through, watch‑time, and conversion metrics.
Introduction
Over the past two years, Taobao Live has continuously refined its ranking models, applying multi‑objective learning, cross‑scene transfer, recall‑matching, and GMV optimization, along with a dedicated modeling approach for the full‑screen page that users scroll up and down.
DBMTL Overview
DBMTL (Deep Bayesian Multi‑task Learning) extends traditional multi‑task learning by modeling sequential dependencies among objectives using a Bayesian network. The loss function is expressed as:
L(x,H) = -[ w1*log P(t3|t1,t2,x,H) + w2*log P(t2|t1,x,H) + w3*log P(t1|x,H) ]
where each term is one factor of the chain‑rule factorization of the joint probability:
P(t1,t2,t3|x,H) = P(t3|t1,t2,x,H) * P(t2|t1,x,H) * P(t1|x,H)
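To make the relationship between the loss and the joint probability concrete, here is a minimal sketch for a single sample. It assumes that each argument is the model's predicted probability of the observed label for that objective, and it follows the article's weight order (w1 on the t3 factor, w3 on the t1 factor). The function names are hypothetical and not taken from the original system.

```python
import math

def joint_probability(p_t1, p_t2_given_t1, p_t3_given_t1_t2):
    """Chain-rule factorization P(t1,t2,t3|x,H) for one sample."""
    return p_t3_given_t1_t2 * p_t2_given_t1 * p_t1

def dbmtl_loss(p_t1, p_t2_given_t1, p_t3_given_t1_t2, w1=1.0, w2=1.0, w3=1.0):
    """Weighted negative log-likelihood.

    Weight order follows the article: w1 scales the t3 factor,
    w2 the t2 factor, and w3 the t1 factor.
    """
    weighted_log_lik = (
        w1 * math.log(p_t3_given_t1_t2)
        + w2 * math.log(p_t2_given_t1)
        + w3 * math.log(p_t1)
    )
    return -weighted_log_lik

# Illustrative probabilities only; these are not measured values.
sample = (0.5, 0.4, 0.2)
loss_unit_weights = dbmtl_loss(*sample)
neg_log_joint = -math.log(joint_probability(*sample))
```

With all weights set to 1, the loss reduces to the negative log of the joint probability, so `loss_unit_weights` and `neg_log_joint` agree up to floating-point error. Non-unit weights let one objective dominate the gradient.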
DBMTL 1.0
Version 1.0 implemented a hard‑parameter‑sharing bottom layer and a Bayesian target‑to‑target layer to capture temporal relationships among three objectives (t1→t2→t3). Online experiments showed improvements of +4.4% pCTR, +5.0% average watch time, and +2.9% follow rate.
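The structure described here, a shared bottom layer followed by objectives chained as t1→t2→t3, can be sketched as a small forward pass. This is an illustration under stated assumptions rather than the production model: the layer sizes, the random initialization, and the choice to pass each upstream objective's hidden representation into the downstream objective's layer are all assumptions, and every name is hypothetical.

```python
import math
import random

def dense(x, weights, bias):
    """Fully connected layer; weights holds one row of input weights per output unit."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(rng, n_in, n_out):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def init_params(rng, n_features=4, n_shared=8, n_task=4):
    return {
        "shared": init_layer(rng, n_features, n_shared),           # hard-shared bottom
        "t1_hidden": init_layer(rng, n_shared, n_task),
        "t2_hidden": init_layer(rng, n_shared + n_task, n_task),      # sees t1
        "t3_hidden": init_layer(rng, n_shared + 2 * n_task, n_task),  # sees t1 and t2
        "t1_out": init_layer(rng, n_task, 1),
        "t2_out": init_layer(rng, n_task, 1),
        "t3_out": init_layer(rng, n_task, 1),
    }

def forward(x, params):
    """Return (P(t1|x), P(t2|t1,x), P(t3|t1,t2,x)) for one feature vector."""
    h = relu(dense(x, *params["shared"]))
    z1 = relu(dense(h, *params["t1_hidden"]))
    z2 = relu(dense(h + z1, *params["t2_hidden"]))       # list concatenation
    z3 = relu(dense(h + z1 + z2, *params["t3_hidden"]))
    return (
        sigmoid(dense(z1, *params["t1_out"])[0]),
        sigmoid(dense(z2, *params["t2_out"])[0]),
        sigmoid(dense(z3, *params["t3_out"])[0]),
    )

params = init_params(random.Random(0))
p1, p2, p3 = forward([0.2, -0.1, 0.7, 1.0], params)
```

The point of the sketch is the wiring: every objective reads the same shared representation `h`, while t2 additionally reads t1's hidden state and t3 reads both.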
DBMTL 2.0
Version 2.0 replaced hard sharing with soft parameter sharing via MMoE (Multi‑gate Mixture‑of‑Experts), in which each task has its own gate that adaptively weights a shared pool of experts. This yielded additional gains of +2.6% pCTR, +2.8% uCTR, and +1.7% average watch time.
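The per-task expert selection can be illustrated with a minimal MMoE forward pass: each expert transforms the input, and each task applies its own softmax gate to mix the expert outputs. The number of experts, the layer sizes, and all names below are assumptions made for illustration.

```python
import math
import random

def dense(x, weights, bias):
    """Fully connected layer; weights holds one row of input weights per output unit."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng, n_in, n_out):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mmoe_forward(x, experts, gates):
    """Return one mixed representation per task, plus the gate weights used."""
    expert_outs = [relu(dense(x, w, b)) for w, b in experts]
    n_units = len(expert_outs[0])
    task_inputs, gate_weights = [], []
    for gate_w, gate_b in gates:  # one gate per task
        g = softmax(dense(x, gate_w, gate_b))
        mixed = [
            sum(gk * out[d] for gk, out in zip(g, expert_outs))
            for d in range(n_units)
        ]
        task_inputs.append(mixed)
        gate_weights.append(g)
    return task_inputs, gate_weights

rng = random.Random(0)
experts = [init_layer(rng, 4, 6) for _ in range(3)]  # 3 experts, 6 units each
gates = [init_layer(rng, 4, 3) for _ in range(2)]    # 2 tasks, 3 gate logits each
task_inputs, gate_weights = mmoe_forward([0.2, -0.1, 0.7, 1.0], experts, gates)
```

Each entry of `task_inputs` would feed that task's own head. Because the gates are functions of the input, different tasks (and different samples) can lean on different experts, which is what distinguishes this from a single hard-shared bottom.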
DBMTL 3.0
Version 3.0 introduced a multi‑scene multi‑task framework that shares bottom‑layer features across three live‑stream scenarios (homepage grid, dedicated app, and recommendation tab) while preserving scene‑specific objectives. Reported lifts were +12% pCTR in the recommendation tab and +2% to +2.5% in the other scenes.
DMR Overview
DMR (Deep Match & Rank) adds a dedicated match tower that computes an explicit user‑item similarity via dot product, combined with a rank tower that handles the multi‑objective logits. Because the two towers are trained together, matching and ranking are learned end to end.
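A minimal sketch of the two towers follows. The match tower is a plain dot product, as described. How the match score is combined with the rank tower is not specified here, so the sketch assumes it is appended to the rank tower's input as one extra feature, and it reduces the rank tower to one linear head per objective. All names and numbers are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def match_tower(user_vec, item_vec):
    """Explicit user-item similarity as a dot product of the two embeddings."""
    return dot(user_vec, item_vec)

def rank_tower(features, match_score, heads):
    """Return one logit per objective.

    Assumption: the match score is appended to the feature vector, and each
    objective is a single linear head given as (weights, bias).
    """
    x = list(features) + [match_score]
    return {name: dot(weights, x) + bias for name, (weights, bias) in heads.items()}

# Illustrative values only.
user_vec, item_vec = [1.0, 2.0], [3.0, 4.0]
score = match_tower(user_vec, item_vec)
heads = {
    "click": ([0.5, 0.0, 1.0], 0.0),
    "watch": ([0.0, 0.25, 0.5], -1.0),
}
logits = rank_tower([2.0, 3.0], score, heads)
```

In a trainable version, gradients from every objective's loss would flow through `match_score` into both embeddings, which is what allows matching and ranking to be learned jointly.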
DMR 1.0
Version 1.0 introduced the match tower and achieved notable lifts across all scenes (e.g., +6.1% pCTR in the grid channel).
DMR 2.0
Version 2.0 adopted a two‑stage learning pipeline: (1) pre‑train a recall model that learns multi‑modal user embeddings across various recall paths (U2F, U2A, etc.), and (2) feed these embeddings into the match and rank towers. This approach delivered further improvements, such as +3.5% pCTR on the channel page.
RUI Ranking Overview
RUI Ranking focuses on the full‑screen up‑down scrolling scenario, modeling the triplet relationship among user (u), refer (r), and item (i). The scoring formula is u2r * r2i + u2i. When the user has little interest in the refer, u2r is small and the product term shrinks, so the score falls back to the personalized user‑item relevance u2i.
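The fallback behavior of the formula is easy to see with a small example. The sketch below assumes the three relevance scores are already computed and lie in [0, 1]. The function names and the candidate values are hypothetical.

```python
def rui_score(u2r, r2i, u2i):
    """Score from the article's formula: u2r * r2i + u2i."""
    return u2r * r2i + u2i

def rank_candidates(u2r, candidates):
    """Sort candidates by descending RUI score.

    candidates is a list of (item_id, r2i, u2i) tuples. u2r is shared
    because the user and the refer are fixed for one request.
    """
    scored = [(item_id, rui_score(u2r, r2i, u2i)) for item_id, r2i, u2i in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Item "a" is close to the refer; item "b" matches the user's general taste.
candidates = [("a", 0.9, 0.1), ("b", 0.1, 0.5)]

interested = rank_candidates(1.0, candidates)     # user likes the refer
not_interested = rank_candidates(0.0, candidates)  # user ignores the refer
```

When `u2r` is high, the refer-similar item "a" ranks first. When `u2r` is zero, the product term vanishes and the ordering is decided by `u2i` alone, so "b" ranks first.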
RUI Ranking 1.0‑3.0
Version 1.0 introduced refer as an additional feature; version 2.0 replaced the user‑refer interaction with a dedicated refer‑item match tower; version 3.0 incorporated three small towers (u‑r, r‑i, u‑i) and fused them via the formula u2r * r2i + u2i. Across versions, average watch count increased by up to 3.9% and watch time by up to 5.8%.
Results and Impact
Across all models, the multi‑objective and recall‑matching optimizations consistently improved key metrics: click‑through rate (pCTR), user‑click‑through rate (uCTR), average watch time, follow rate, and GMV during major events such as Double‑11. The frameworks have also been deployed to other Alibaba services (e.g., Lazada feeds, ICBU recommendations).
Conclusion
From 2018 to 2024, Taobao Live’s ranking system evolved from basic multi‑task learning to Bayesian multi‑task models, soft‑sharing MMoE, multi‑scene architectures, two‑tower match‑rank designs, and the RUI refer‑item model for full‑screen scrolling, with each step delivering measurable online gains in large‑scale e‑commerce live streaming.
DataFunTalk