
Learning at Serving Time (LAST): An Online Learning Approach for Real‑Time Re‑ranking in Recommendation Systems

This article introduces LAST (Learning at Serving Time), an online learning method for re‑ranking that adapts the ranking model for each request at serving time instead of waiting for user feedback. It targets the feedback latency and stability problems of real‑time re‑ranking in industrial recommendation pipelines, and both offline and online experiments show gains over baseline methods.

DataFunTalk

Aug 25, 2024


Background : In a typical industrial recommendation pipeline, re‑ranking is the final stage and decides the list the user actually sees. Traditional online learning updates the model from user feedback, and that feedback arrives late: in e‑commerce, a purchase decision may take hours or days, so the model cannot be updated in real time.

Re‑ranking Definition & Challenges : Re‑ranking refines candidate items by modeling context interactions (e.g., visual or size differences) and can be approached via point‑wise scoring or sequence generation. Challenges include feedback latency, engineering overhead for real‑time data streams, and model stability.

New Proposal – Learning at Serving Time (LAST) : When a new request arrives, LAST computes a parameter offset Δθ that improves the predicted reward for that specific request, without waiting for user feedback. The request is served with the temporarily adapted parameters θ + Δθ, and the offset is discarded afterwards, so the deployed model itself stays unchanged.
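The per‑request adaptation can be written compactly. The following is only a sketch under assumed notation that does not appear in the source: x is the incoming request, G_θ is the sequence generator with deployed parameters θ, E is the evaluator that predicts reward for a sequence, and η is a step size. The one‑step gradient form on the right is one plausible way to obtain the offset, not necessarily the exact update the authors use.

```latex
\Delta\theta^{*} \;=\; \arg\max_{\Delta\theta}\; E\bigl(G_{\theta+\Delta\theta}(x),\, x\bigr),
\qquad
\Delta\theta \;\approx\; \eta\, \nabla_{\theta}\, E\bigl(G_{\theta}(x),\, x\bigr)
```

The request is then served with the sequence produced by G_{θ+Δθ}(x), and the next request starts again from θ.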

Architecture : The system consists of a Generator that proposes candidate sequences and an Evaluator/Simulator that scores them. Two versions exist: a serial version that iteratively refines the sequence, and a parallel version that explores multiple Δθ candidates simultaneously and selects the highest‑scoring sequence.
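The parallel version can be illustrated with a toy sketch. Everything in it is hypothetical: the linear `generate` function, the hand‑written `evaluate` function, the item fields, and the use of random offsets are stand‑ins chosen for brevity. In particular, it samples random Δθ candidates and scores them in a loop, whereas the source describes gradient‑based exploration running in parallel; only the overall shape (several offsets, one evaluator, keep the best sequence, discard the offsets) follows the description above.

```python
import random

def generate(theta, items):
    """Toy generator: rank items by a linear score under parameters theta."""
    def score(item):
        return sum(w * f for w, f in zip(theta, item["features"]))
    return sorted(items, key=score, reverse=True)

def evaluate(sequence):
    """Toy evaluator: position-discounted value, minus a penalty when two
    adjacent items share a category (a stand-in for context effects)."""
    total = 0.0
    for pos, item in enumerate(sequence):
        total += item["value"] / (pos + 1)
        if pos > 0 and sequence[pos - 1]["category"] == item["category"]:
            total -= 0.2
    return total

def last_parallel(theta, items, num_candidates=8, scale=0.5, seed=0):
    """Try several parameter offsets for one request; keep the best sequence."""
    rng = random.Random(seed)
    # The zero offset is always a candidate, so the chosen sequence is never
    # worse than the unmodified model's output according to the evaluator.
    offsets = [[0.0] * len(theta)]
    for _ in range(num_candidates - 1):
        offsets.append([rng.gauss(0.0, scale) for _ in theta])

    best_seq, best_score = None, float("-inf")
    for delta in offsets:
        adapted = [t + d for t, d in zip(theta, delta)]  # temporary theta + delta
        seq = generate(adapted, items)
        seq_score = evaluate(seq)
        if seq_score > best_score:
            best_seq, best_score = seq, seq_score
    # The offsets are dropped here; theta itself was never modified.
    return best_seq, best_score

# Usage with made-up items and parameters.
items = [
    {"id": "a", "features": [0.9, 0.1], "value": 0.5, "category": "shoes"},
    {"id": "b", "features": [0.8, 0.3], "value": 0.7, "category": "shoes"},
    {"id": "c", "features": [0.2, 0.9], "value": 0.9, "category": "bags"},
    {"id": "d", "features": [0.4, 0.5], "value": 0.3, "category": "hats"},
]
theta = [1.0, 0.0]
best_seq, best_score = last_parallel(theta, items)
print([item["id"] for item in best_seq], round(best_score, 3))
```

Including the zero offset among the candidates is a design choice of this sketch: it guarantees that the adapted output is at least as good as the baseline under the evaluator, which is one simple way to address the stability concern.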

Solution to Practical Problems :

Problem 1 – response‑time constraints are addressed by a one‑shot generator that outputs a complete sequence in a single pass rather than searching step by step.

Problem 2 – lack of feedback for the observed sequence is mitigated by an Evaluator that predicts feedback for alternative sequences.

Problem 3 – combinatorial explosion of candidate permutations is handled by the parallel gradient‑exploration framework.
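The scale of Problem 3 is easy to check with a little arithmetic. The sketch below counts ordered lists with Python's `math.perm`; the sizes used (50 candidates, 10 slots) are illustrative assumptions, not figures from the source.

```python
import math

def num_ordered_lists(num_candidates, list_length):
    """Number of distinct ordered lists of list_length items
    that can be drawn from num_candidates items."""
    return math.perm(num_candidates, list_length)

print(num_ordered_lists(5, 2))    # 20
print(num_ordered_lists(50, 10))  # about 3.7e16
```

Scoring every ordering is clearly infeasible at serving time, which is why only a small set of Δθ candidates is explored for each request.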

Experimental Results : Offline evaluations using NDCG and Evaluator‑based metrics show LAST outperforming baseline generator‑evaluator methods. Online A/B tests in Taobao’s “follow” feed demonstrate a two‑point increase in transaction volume while keeping click rates stable.
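For readers unfamiliar with NDCG, here is a minimal sketch of the standard definition (gain divided by log2 of position + 1, normalized by the ideal ordering). The source does not say which relevance labels or cutoff the authors used, so the function names and the example labels are assumptions for illustration only.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevance labels listed in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A ranking that puts the most relevant item last scores below 1.0,
# while the ideal ordering scores exactly 1.0.
print(ndcg([3, 2, 1]))
print(ndcg([1, 2, 3]))
```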

Conclusion : LAST provides a plug‑in, feedback‑free, real‑time optimization layer for re‑ranking that adapts the model per request while leaving the deployed model untouched. The work is a joint effort by Alibaba and Renmin University.


