
Learning at Serving Time (LAST): An Online Learning Approach for Real‑Time Re‑ranking in Recommendation Systems

This article introduces LAST (Learning at Serving Time), an online learning method that adapts the ranking model to each request at serving time without waiting for user feedback. It targets the feedback-latency and stability challenges of real-time re-ranking in industrial recommendation pipelines, and its gains are supported by both offline experiments and online A/B tests.

DataFunTalk

Background: In typical industrial recommendation pipelines, re-ranking is the final stage and determines the list that users actually see. Traditional online learning relies on user feedback, which arrives with a delay; in e-commerce, where a purchase decision may take hours or days, this delay prevents the model from being updated in real time.

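Re-ranking Definition & Challenges: Re-ranking refines the candidate list by modeling interactions between items in context (e.g., visual or size differences between neighboring items). It can be approached either with point-wise scoring, where each item receives a context-aware score and the list is sorted by that score, or with sequence generation, where the list is built position by position. Updating such models online faces three challenges: feedback latency, the engineering overhead of maintaining real-time data streams, and model stability.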
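The two families of re-ranking differ in when context enters the decision. The sketch below contrasts them on a toy list; the item fields `relevance` and `color`, the scoring functions, and the "avoid the same color as the item above" rule are hypothetical illustrations, not part of LAST.

```python
from typing import Callable, Dict, List

Item = Dict[str, float]

def pointwise_rerank(items: List[Item], score: Callable[[Item, List[Item]], float]) -> List[int]:
    """Point-wise: one score per item (it may look at the whole candidate set), then sort."""
    scores = [score(item, items) for item in items]
    return sorted(range(len(items)), key=lambda i: scores[i], reverse=True)

def generative_rerank(items: List[Item], next_score: Callable[[Item, List[Item]], float],
                      k: int) -> List[int]:
    """Sequence generation: fill slots one at a time, conditioning on the prefix placed so far."""
    chosen: List[int] = []
    remaining = list(range(len(items)))
    while remaining and len(chosen) < k:
        prefix = [items[i] for i in chosen]
        best = max(remaining, key=lambda i: next_score(items[i], prefix))
        chosen.append(best)
        remaining.remove(best)
    return chosen

def relevance_only(item: Item, _context: List[Item]) -> float:
    """Hypothetical point-wise score that ignores context."""
    return item["relevance"]

def avoid_same_color(item: Item, prefix: List[Item]) -> float:
    """Hypothetical context rule: penalize an item that looks like the one right above it."""
    penalty = 0.5 if prefix and prefix[-1]["color"] == item["color"] else 0.0
    return item["relevance"] - penalty

if __name__ == "__main__":
    items = [
        {"relevance": 0.9, "color": 0.0},
        {"relevance": 0.8, "color": 0.0},
        {"relevance": 0.7, "color": 1.0},
    ]
    print(pointwise_rerank(items, relevance_only))        # [0, 1, 2]
    print(generative_rerank(items, avoid_same_color, 3))  # [0, 2, 1]
```

The generative version moves the differently colored item up to the second slot because its score depends on what was already placed, which a single pass of independent scores cannot express.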

New Proposal – Learning at Serving Time (LAST): When a request arrives, LAST computes a parameter offset Δθ that optimizes the predicted reward for that specific request and generates the list with the adjusted parameters θ + Δθ, all without waiting for user feedback. The offset is discarded after serving, so the deployed model θ itself never changes.
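A minimal sketch of the per-request update, under several assumptions that are ours and not the article's: the generator is a hypothetical linear scorer followed by a sort, the evaluator is a black-box function that scores a whole list, and because sorting makes the reward piecewise constant, the gradient with respect to Δθ is estimated by Gaussian smoothing (an evolution-strategies estimator) rather than by backpropagation. The names `generate`, `serve_request`, and the fallback-to-baseline safeguard are illustrative only.

```python
import random
from typing import Callable, List, Sequence

Features = Sequence[float]

def generate(theta: Sequence[float], items: Sequence[Features]) -> List[int]:
    """Hypothetical generator: score each item linearly and sort into a list."""
    scores = [sum(t * x for t, x in zip(theta, feats)) for feats in items]
    return sorted(range(len(items)), key=lambda i: -scores[i])

def serve_request(
    theta: Sequence[float],
    items: Sequence[Features],
    evaluator: Callable[[List[int]], float],
    steps: int = 20,
    samples: int = 16,
    sigma: float = 0.5,
    lr: float = 0.5,
    seed: int = 0,
) -> List[int]:
    """Adapt to one request with a temporary offset delta; delta is local and dropped on return."""
    rng = random.Random(seed)
    delta = [0.0] * len(theta)  # request-local offset; the deployed theta is never modified

    def reward(offset: Sequence[float]) -> float:
        shifted = [t + d for t, d in zip(theta, offset)]
        return evaluator(generate(shifted, items))

    for _ in range(steps):
        # Gaussian-smoothing estimate of d reward / d delta:
        # (1 / (n * sigma)) * sum_k (R(delta + sigma * eps_k) - R(delta)) * eps_k
        base = reward(delta)
        grad = [0.0] * len(theta)
        for _ in range(samples):
            eps = [rng.gauss(0.0, 1.0) for _ in theta]
            gain = reward([d + sigma * e for d, e in zip(delta, eps)]) - base
            for j, e in enumerate(eps):
                grad[j] += gain * e / (samples * sigma)
        delta = [d + lr * g for d, g in zip(delta, grad)]  # gradient ascent on delta only

    baseline = generate(theta, items)
    adapted = generate([t + d for t, d in zip(theta, delta)], items)
    # Safeguard (an assumption of this sketch): keep the deployed model's list
    # unless the evaluator scores the adapted list at least as high.
    return adapted if evaluator(adapted) >= evaluator(baseline) else baseline

if __name__ == "__main__":
    # Toy request; features are made-up (click propensity, purchase propensity).
    items = [[0.9, 0.1], [0.5, 0.9], [0.2, 0.4]]

    def evaluator(seq: List[int]) -> float:
        # Hypothetical evaluator: position-discounted purchase propensity.
        return sum(items[i][1] / (pos + 1) for pos, i in enumerate(seq))

    theta = [1.0, 0.0]  # deployed parameters rank purely by click propensity
    print(serve_request(theta, items, evaluator), theta)
```

The key property is visible in the code: only the local `delta` is updated, and `theta` is identical before and after serving, which matches the article's point that the deployed model stays unchanged.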

Architecture: The system consists of a Generator that proposes candidate sequences and an Evaluator/Simulator that scores them. Two versions exist: a serial version that iteratively refines the sequence, and a parallel version that explores multiple Δθ candidates simultaneously and selects the highest-scoring sequence.
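The parallel version can be sketched as "sample several offsets, generate one list per offset, score all lists with the evaluator, keep the best". The linear generator, the Gaussian sampling of Δθ candidates, the names `generate` and `serve_parallel`, and the choice to always include the zero offset (so the deployed model's own list is one of the candidates) are assumptions of this sketch, not details given in the article.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

def generate(theta: Sequence[float], items: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical generator: linear score per item, then sort into a list."""
    scores = [sum(t * x for t, x in zip(theta, feats)) for feats in items]
    return sorted(range(len(items)), key=lambda i: -scores[i])

def serve_parallel(
    theta: Sequence[float],
    items: Sequence[Sequence[float]],
    evaluator: Callable[[List[int]], float],
    num_candidates: int = 8,
    sigma: float = 0.5,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Explore several offsets at once and return the list the evaluator scores highest."""
    rng = random.Random(seed)
    # Candidate 0 is the zero offset, i.e. the deployed model's own list.
    offsets = [[0.0] * len(theta)] + [
        [rng.gauss(0.0, sigma) for _ in theta] for _ in range(num_candidates - 1)
    ]

    def try_offset(offset: List[float]) -> Tuple[float, List[int]]:
        seq = generate([t + d for t, d in zip(theta, offset)], items)
        return evaluator(seq), seq

    # Candidates are independent, so they can be generated and scored concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(try_offset, offsets))
    # max keeps the first of tied candidates, so ties fall back to the zero offset.
    best_score, best_seq = max(results, key=lambda r: r[0])
    return best_seq, best_score
```

Because the zero offset is always a candidate, the returned list never scores below the deployed model's list under the evaluator; whether the real system makes the same choice is not stated in the article.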

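Solution to Practical Problems: LAST has to overcome three practical problems.

Problem 1 – Response-time constraints: serving cannot afford lengthy search, so a one-shot generator outputs the final sequence directly in a single pass.
Problem 2 – No feedback at serving time: the sequence being generated has not yet been shown to any user, so an Evaluator predicts the feedback that a candidate sequence would receive.
Problem 3 – Combinatorial explosion: the number of possible orderings grows factorially with the number of candidates, so the parallel gradient-exploration framework searches over a small set of Δθ candidates instead of over permutations.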
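The scale of Problem 3 is easy to quantify: the number of distinct ordered lists of k items drawn from n candidates is n! / (n - k)!. The candidate and slot counts below are illustrative values chosen for this sketch, not figures from the article.

```python
import math

def num_ordered_lists(n_candidates: int, k_slots: int) -> int:
    """Distinct ordered lists of k_slots items drawn from n_candidates: n! / (n - k)!."""
    return math.perm(n_candidates, k_slots)

if __name__ == "__main__":
    print(num_ordered_lists(10, 10))  # 3628800 orderings of just 10 items
    print(num_ordered_lists(50, 10))  # 37276043023296000, about 3.7e16
    # Exploring K parameter offsets instead needs K generator + evaluator calls,
    # e.g. 8 calls for K = 8, no matter how large n and k are.
```

Exhaustively scoring every permutation is out of reach even for modest lists, while parameter-space exploration keeps the number of evaluator calls fixed at the number of Δθ candidates.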

Experimental Results: Offline evaluations using NDCG and Evaluator-based metrics show LAST outperforming baseline generator-evaluator methods. Online A/B tests in Taobao’s “follow” feed show a two-point increase in transaction volume while click rates remain stable.
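For reference, NDCG@k compares the discounted gain of the served order with that of the ideal order. The article does not say which NDCG variant or relevance labels were used; this sketch assumes the common exponential-gain form, DCG@k = Σ_{i=1..k} (2^{rel_i} − 1) / log2(i + 1), with hypothetical graded labels per item.

```python
import math
from typing import Sequence

def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k with gain 2^rel - 1 and discount log2(position + 1), positions starting at 1."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k of the served order divided by DCG@k of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

if __name__ == "__main__":
    # Hypothetical labels for a served list, in display order (e.g. 0 = ignored, 2 = purchased).
    print(ndcg_at_k([0, 1, 0, 2], k=4))
```

An alternative variant uses the raw label rel_i as the gain; the two agree for binary labels but weight graded labels differently.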

Conclusion: LAST provides a plug-in, feedback-free, real-time optimization layer for re-ranking that adapts the model to each request without modifying the deployed model. The work is a joint academia-industry effort by Alibaba and Renmin University.

Written by DataFunTalk