How LAST Enables Real‑Time Learning for Re‑Ranking in E‑Commerce Recommendations
This article presents LAST (Learning at Serving Time), an approach that brings real-time online learning to re-ranking in industrial recommendation pipelines by eliminating the wait for user feedback. It covers LAST's architecture, the challenges it addresses, its experimental validation, and its practical advantages over traditional online learning methods.
Background: Real-time Online Learning & Re-ranking Model Basics
This section introduces real-time online learning and re-ranking models in industrial recommendation pipelines. Traditional offline training takes days before new data influences the model; online learning aims to update the model using new samples as soon as they arrive.
Challenges of Real-time Online Learning
Dependence on user feedback latency (hours to days).
Need for real-time data streaming infrastructure.
Model stability concerns.
Re-ranking Model Overview
Re-ranking is the final stage of a typical recommendation pipeline, refining candidate items (recall → coarse → fine → re-rank). It explicitly models context interactions among items.
Two modeling streams: pointwise scoring with context, and sequence generation without explicit rank scores.
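The pointwise-with-context stream can be illustrated with a minimal sketch. The scoring rule below is a hypothetical stand-in for a learned interaction model: each item's score depends on its own feature and on the aggregate features of the other items in the slate, which is the "context interaction" that re-ranking adds over earlier pipeline stages.

```python
def context_aware_scores(features):
    """Toy pointwise-with-context scorer (illustrative only): each item's
    score combines its own feature with the mean feature of the other
    slate items, so the slate composition changes every item's score."""
    n = len(features)
    scores = []
    for i, f in enumerate(features):
        others = [features[j] for j in range(n) if j != i]
        ctx = sum(others) / len(others) if others else 0.0
        # Penalize items whose context is already strong (a diversity-style
        # interaction; the weight 0.5 is an arbitrary illustrative choice).
        scores.append(f - 0.5 * ctx)
    return scores

scores = context_aware_scores([1.0, 2.0, 3.0])
```

In a real model the interaction would be learned (e.g., by attention over the slate), but the key property is the same: removing or reordering slate items changes every remaining item's score.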
Formal Introduction: Definition & Solutions
The re-ranking problem is defined as selecting the optimal ordered subset of candidate items (e.g., choosing 2 ordered items from 3 candidates yields 3 × 2 = 6 possible permutations). The objective is to maximize a reward function reflecting feedback from users, the platform, and merchants.
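For small instances, this definition can be made concrete by brute-force enumeration. The sketch below (with a hypothetical `reward` interface) enumerates every ordered length-k subset and returns the best one, reproducing the 3-candidates/2-slots example with its 6 permutations; real systems avoid this combinatorial search with learned generators.

```python
from itertools import permutations

def best_sequence(candidates, k, reward):
    """Enumerate every ordered length-k subset of the candidates and
    return the permutation maximizing the (hypothetical) reward function."""
    return max(permutations(candidates, k), key=reward)

# 3 candidates, 2 output slots -> 3 * 2 = 6 possible orderings.
cands = ["a", "b", "c"]
n_orderings = len(list(permutations(cands, 2)))

# Toy reward: position-weighted preference for later letters.
toy_reward = lambda seq: sum((ord(c) - ord("a")) * w for c, w in zip(seq, (2, 1)))
best = best_sequence(cands, 2, toy_reward)  # -> ("c", "b")
```

The factorial growth of the permutation count is exactly why industrial re-ranking relies on learned generation rather than exhaustive search.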
New Proposal: Learning at Serving Time (LAST)
LAST updates the model at the moment a request arrives, without waiting for user feedback. It computes a per‑request parameter shift Δθ to improve the immediate prediction, then discards it after serving.
Architecture
Classic online learning updates the deployed model θ after receiving feedback samples. LAST keeps the deployed model fixed and, for each request (u, c), searches for Δθ that maximizes the evaluator score.
Solution Steps
One‑shot solution: call a generator to produce a sequence, evaluate it, and adjust generation probabilities.
Introduce an evaluator that predicts feedback for alternative sequences, enabling training without real feedback.
Parallel version: generate multiple Δθ candidates via gradient exploration, evaluate each, and select the best sequence.
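The steps above can be sketched in a few lines. This is a simplified stand-in, not the paper's implementation: the interfaces (`generate`, `evaluate`) are hypothetical, and random perturbations replace the gradient-based exploration described above. The essential LAST properties are preserved: the deployed parameters are never mutated, multiple candidate shifts are scored per request, and the winning shift is discarded after serving.

```python
import random

def last_serve(theta, request, generate, evaluate, n_candidates=4, sigma=0.1):
    """LAST-style per-request optimization sketch: keep the deployed
    parameters `theta` fixed, try several parameter shifts `delta`,
    score the sequence each shifted model produces with the evaluator,
    and serve the best sequence. Every delta is discarded afterward."""
    best_seq, best_score = None, float("-inf")
    for _ in range(n_candidates):
        # Random perturbation stands in for the paper's gradient exploration.
        delta = [random.gauss(0.0, sigma) for _ in theta]
        shifted = [t + d for t, d in zip(theta, delta)]
        seq = generate(shifted, request)
        score = evaluate(seq, request)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq  # theta itself is never modified

# Toy usage: the generator emits one number, the evaluator wants it near 1.0.
random.seed(0)
theta = [0.5]
served = last_serve(theta, request=None,
                    generate=lambda p, _req: p[0],
                    evaluate=lambda s, _req: -abs(s - 1.0))
```

Because the shifts are per-request and thrown away, this search adds no training infrastructure and cannot destabilize the deployed model.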
Experimental Results
Offline evaluation using NDCG shows that generator‑evaluator multi‑sequence methods outperform baselines, with LAST achieving the highest score. A second evaluator‑based offline test confirms a >1‑point gain over the strongest baseline.
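For reference, the NDCG metric used in the offline comparison can be computed as below; this is the standard definition, not code from the paper.

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k relevance scores."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """DCG of the produced ranking, normalized by the DCG of the
    ideal (relevance-sorted) ranking of the same items."""
    idcg = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / idcg if idcg > 0 else 0.0

perfect = ndcg_at_k([3, 2, 1], 3)   # ideal order -> 1.0
reversed_ = ndcg_at_k([1, 2, 3], 3)  # worst order -> < 1.0
```

A ranking in ideal relevance order scores exactly 1.0, so higher NDCG directly reflects better-ordered output sequences.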
Online A/B test in Taobao’s “follow” feed shows that LAST increases transaction volume by two points while keeping click volume stable.
Conclusion
LAST provides real‑time model updates without feedback latency, offers per‑request optimization, is plug‑in compatible, requires no additional online infrastructure, and has been validated both offline and online.
The work was a collaboration between Alibaba and Renmin University, involving researchers Wang Yuan, Li Zhiyu, Wen Zijian, Lin Quan, Zhang Changshuo, Chen Sirui, Zhang Xiao, and Xu Jun.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
