Evolution of Re‑ranking Techniques in Kuaishou Short‑Video Recommendation System
This article details the technical evolution of Kuaishou's short‑video recommendation pipeline, focusing on sequence re‑ranking, multi‑content mixing, and on‑device re‑ranking, and explains how transformer‑based models, generator‑evaluator frameworks, and reinforcement‑learning strategies are employed to maximize overall sequence value, user engagement, and revenue.
Kuaishou operates a large‑scale short‑video and live‑streaming platform with diverse business scenarios, generating massive interaction data. This creates challenges in large‑scale estimation and motivates techniques such as reinforcement learning and causal analysis.
The presentation is organized into four parts: an overview of Kuaishou's recommendation scenario, sequence re‑ranking, multi‑content mixing, and on‑device re‑ranking.
Sequence Re‑ranking addresses the fact that a sequence's overall value is not merely the sum of individual item scores; context and ordering heavily influence user behavior. Traditional point‑wise scoring, greedy shuffling, and MMR/DPP methods have limitations, prompting a shift to transformer or LSTM models that embed upstream content, an optimization objective focused on the whole sequence, and continuous discovery of effective ordering patterns.
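To make the limitation of per‑item scoring concrete, the following is a minimal sketch of MMR (Maximal Marginal Relevance), one of the traditional methods mentioned above. The relevance scores, similarity function, and video names are hypothetical stand‑ins for model outputs, not Kuaishou's actual implementation.

```python
def mmr_rerank(items, relevance, similarity, k, lam=0.7):
    """Greedily pick k items, trading per-item relevance against
    redundancy with what has already been selected."""
    selected = []
    candidates = list(items)
    while candidates and len(selected) < k:
        def mmr_score(item):
            # Redundancy = max similarity to any already-selected item.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a same‑topic similarity signal, a lower‑scored but topically fresh video can leapfrog a redundant high scorer, which is exactly the context effect that point‑wise scoring misses; MMR, however, still evaluates items one greedy step at a time rather than scoring the whole sequence.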
The system adopts a generator‑evaluator paradigm: a generator creates diverse candidate sequences from the top‑50 items, and an evaluator (a unidirectional transformer followed by an auxiliary embedding model) predicts the overall sequence score, achieving significant online gains.
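The generator‑evaluator loop can be sketched as follows. This is a toy illustration under stated assumptions: the generator here perturbs the point‑wise order with random shuffles (Kuaishou's generators are richer, per the next section), and the evaluator is a hand‑written listwise value function standing in for the unidirectional‑transformer scorer; `item_score` and `adjacency_bonus` are hypothetical.

```python
import random

def generate_candidates(top_items, n_candidates=8, rng=None):
    """Generator: produce diverse candidate sequences.
    Keeps the point-wise order as a baseline, then adds random shuffles."""
    rng = rng or random.Random(0)
    candidates = [list(top_items)]
    for _ in range(n_candidates - 1):
        seq = list(top_items)
        rng.shuffle(seq)
        candidates.append(seq)
    return candidates

def evaluate_sequence(seq, item_score, adjacency_bonus):
    """Evaluator: toy listwise value = position-discounted item scores
    plus pairwise context terms for adjacent items."""
    value = sum(item_score[x] / (pos + 1) for pos, x in enumerate(seq))
    value += sum(adjacency_bonus(a, b) for a, b in zip(seq, seq[1:]))
    return value

def rerank(top_items, item_score, adjacency_bonus, n_candidates=8):
    """Pick the candidate sequence with the highest evaluator score."""
    candidates = generate_candidates(top_items, n_candidates)
    return max(candidates,
               key=lambda s: evaluate_sequence(s, item_score, adjacency_bonus))
```

The key design point survives the simplification: the evaluator scores entire sequences, so ordering effects that no per‑item score can express (here, the adjacency term) directly influence which candidate wins.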
Various sequence generation strategies are discussed, including beam search, multi‑queue weighting that approximates a Pareto frontier, listwise mixing, and reinforcement‑learning (Dueling DQN) approaches that balance long‑term user experience with short‑term revenue.
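Of these strategies, beam search is the most mechanical, so here is a minimal sketch. `seq_score` stands in for the evaluator described earlier; the beam width, sequence length, and items are illustrative assumptions.

```python
def beam_search(items, seq_score, beam_width=3, length=4):
    """Build sequences item by item, keeping only the beam_width
    best-scoring partial sequences at each step."""
    beams = [([], set())]  # (partial sequence, items already used)
    for _ in range(length):
        expanded = []
        for seq, used in beams:
            for it in items:
                if it not in used:
                    expanded.append((seq + [it], used | {it}))
        # Rank all extensions by the (partial) sequence score and prune.
        expanded.sort(key=lambda b: seq_score(b[0]), reverse=True)
        beams = expanded[:beam_width]
    return beams[0][0]
```

Unlike exhaustive enumeration over the top‑50 candidates (50! orderings), beam search keeps the evaluator in the loop at every step while bounding the number of sequences scored, which is what makes listwise scoring affordable online.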
Multi‑Content Mixing aims to combine results from different business streams into a single sequence that maximizes overall social value while respecting diversity constraints, moving beyond simple scoring‑based ordering.
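As a baseline for what "moving beyond simple scoring‑based ordering" improves on, a slot‑based mixer for two streams might look like the sketch below. The slotting rule, stream names, and interval are hypothetical; the actual system optimizes overall sequence value rather than using fixed positions.

```python
def mix_streams(organic, promos, slot_interval=4):
    """Interleave a second business stream into the organic feed:
    one promoted item after every slot_interval organic items."""
    mixed, promos = [], list(promos)
    for i, item in enumerate(organic, start=1):
        mixed.append(item)
        if promos and i % slot_interval == 0:
            mixed.append(promos.pop(0))
    return mixed
```

Fixed slotting enforces a crude diversity constraint (no two promoted items adjacent) but ignores context entirely; a value‑maximizing mixer would instead let an evaluator choose where each stream's items land.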
On‑Device Re‑ranking tackles real‑time perception, immediate feedback, personalized ("thousand‑users‑thousand‑models") modeling, and compute allocation. By incorporating real‑time user signals (e.g., volume, orientation) and lightweight transformer interactions, the on‑device model improves CTR by 2.53 pp, LTR by 4.81 pp, and WTR by 1.36 pp.
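The core idea of on‑device re‑ranking, reacting to feedback the server has not yet seen, can be sketched as a client‑side adjustment of server scores. All names here (`server_score`, the category penalty, the skip signal) are hypothetical illustrations, not the production model, which uses lightweight transformer interactions over richer real‑time signals.

```python
def on_device_rerank(buffer, server_score, category, recent_skips, penalty=0.5):
    """Re-rank the client-side candidate buffer using immediate feedback:
    categories the user just skipped get their server scores down-weighted."""
    skipped_cats = {v: True for v in (category[s] for s in recent_skips)}
    def adjusted(v):
        s = server_score[v]
        return s * penalty if category[v] in skipped_cats else s
    # Highest adjusted score plays next; no server round-trip needed.
    return sorted(buffer, key=adjusted, reverse=True)
```

Because the adjustment runs locally, it closes the loop within a single swipe, whereas server‑side re‑ranking only sees the skip after the next request, which is precisely the real‑time‑perception gap this stage addresses.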
The talk concludes with a Q&A covering evaluation of generated sequences, personalization strategies, and diversity requirements, emphasizing the importance of causal and contextual coherence in recommendation sequences.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.