How Top E‑Commerce Platforms Rerank Recommendations: Models, Metrics, Practices
This article examines the role of reranking in modern recommendation pipelines: why context‑aware listwise models are needed, how approaches evolved from pointwise scoring to generative and diversity‑aware methods, and how companies such as Kuaishou, Alibaba, WeChat, iQIYI, and Meituan deploy them in production. Along the way it highlights key challenges, evaluation metrics, and business‑rule integration.
Rerank Overview
Reranking is placed after the scoring stage in recommendation pipelines and determines the final displayed items. It takes the top‑N candidates from a precision‑ranking model, models listwise context to maximize overall utility, and outputs the top‑K sequence.
Why Rerank?
Precision ranking scores items independently, assuming higher scores imply higher value. This ignores contextual effects among items that can strongly influence user decisions. Reranking models the interactions within the displayed list to improve relevance, diversity, and user experience.
Example (Kuaishou short video): a geopolitical video followed by a light‑hearted clip feels disjointed, while inserting a high‑energy music video improves continuity and recommendation quality.
Key challenges for rerank modules include:
Modeling mutual influence among all items in the displayed list, rather than treating items independently or only in pairs.
Defining "value" for different scenarios (GMV, watch time, diversity, user experience, etc.).
Balancing multiple business objectives (GMV, duration, diversity, discovery) and handling weighted or penalized cases.
Rapidly adapting to evolving user behavior.
Evaluation metrics align with business goals, e.g., NDCG for relevance and α‑NDCG for diversity.
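As a concrete reference point, NDCG discounts graded relevance by rank position and normalizes by the ideal ordering. A minimal sketch:

```python
import math

def ndcg_at_k(relevances, k):
    """Normalized Discounted Cumulative Gain at cutoff k.

    relevances: graded relevance of items, in the order they were shown.
    """
    def dcg(rels):
        # Gain of each item is discounted by log2 of its 1-based rank + 1.
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A perfectly ordered list scores 1.0; a reversed list scores strictly less.
print(ndcg_at_k([3, 2, 1, 0], k=4))  # 1.0
print(ndcg_at_k([0, 1, 2, 3], k=4))  # < 1.0
```

α‑NDCG follows the same shape but additionally discounts redundant coverage of topics already seen higher in the list, which is why it is used when diversity is the goal.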
Evolution of Rerank Models
Rerank models have progressed from pointwise to pairwise, listwise, generative, and diversity‑aware approaches.
Pointwise: Similar to classic CTR models (DNN, WDL, DeepFM), with real‑time updates and feature engineering.
Pairwise: Models such as GBRank, RankSVM, and RankNet compare item pairs, but they ignore global list information and increase training complexity.
Listwise: Capture whole‑list interactions using listwise loss functions; examples include LambdaMART, miDNN, DLCM, PRM, and SetRank, built on gradient‑boosted trees, DNNs, RNNs, self‑attention, etc.
Generative: Generate the final list step by step (e.g., miRNN, Seq2Slate) using RNNs or pointer networks.
Diversity: Methods such as DPP and MMR balance relevance and diversity.
Greedy Search Based Methods
Maximal Marginal Relevance (MMR)
[CMU SIGIR 1998] The Use of MMR, Diversity‑Based Reranking for Reordering Documents and Producing Summaries https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf
MMR balances relevance to the query and novelty relative to already selected items using a weight \(\lambda\). It selects the first item with highest query relevance, then iteratively picks items that maximize \(\lambda\times\text{relevance}-(1-\lambda)\times\text{similarity to selected items}\).
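The greedy loop falls directly out of that formula. A minimal sketch, where the relevance scores, similarity matrix, and `lam` value are illustrative inputs rather than anything from the paper:

```python
def mmr_rerank(query_rel, sim, k, lam=0.7):
    """Greedy Maximal Marginal Relevance.

    query_rel: relevance of each candidate to the query.
    sim: pairwise item-similarity matrix (sim[i][j] in [0, 1]).
    lam: trade-off weight between relevance and novelty.
    """
    # Start with the single most query-relevant item.
    selected = [max(range(len(query_rel)), key=query_rel.__getitem__)]
    candidates = set(range(len(query_rel))) - set(selected)
    while candidates and len(selected) < k:
        # MMR score: lam * relevance - (1 - lam) * max similarity to picks.
        best = max(candidates, key=lambda i: lam * query_rel[i]
                   - (1 - lam) * max(sim[i][j] for j in selected))
        selected.append(best)
        candidates.remove(best)
    return selected

# Items 0 and 1 are near-duplicates, so MMR promotes the novel item 2.
rel = [0.9, 0.85, 0.3]
sim = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(mmr_rerank(rel, sim, k=3, lam=0.5))  # [0, 2, 1]
```

With `lam=1.0` the loop degenerates to sorting by relevance; lowering `lam` trades relevance for novelty.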
Determinantal Point Processes (DPP)
[Google CIKM 2018] Practical Diversified Recommendations on YouTube with Determinantal Point Processes
[Hulu NIPS 2018] Fast Greedy MAP Inference for DPP to Improve Recommendation Diversity
DPP uses precision‑ranking scores and pairwise distances (e.g., Jaccard, EMD) to generate a top‑k list that optimizes diversity and efficiency. Deep variants replace the kernel with neural networks.
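A naive greedy MAP sketch shows the underlying idea: each step picks the item that most increases the determinant of the selected kernel submatrix, which rewards high‑quality items that are dissimilar to the items already chosen. This is illustrative only; the Hulu paper's contribution is a much faster incremental‑Cholesky version of the same loop:

```python
import numpy as np

def greedy_dpp(quality, sim, k):
    """Naive greedy MAP inference for a DPP.

    quality: per-item precision-ranking scores.
    sim: pairwise similarity matrix.
    Kernel L = diag(q) @ S @ diag(q); submatrix determinants trade
    item quality against redundancy within the selected set.
    """
    L = np.outer(quality, quality) * np.asarray(sim)
    selected = []
    for _ in range(k):
        best, best_det = None, -np.inf
        for i in range(len(quality)):
            if i in selected:
                continue
            idx = selected + [i]
            det = np.linalg.det(L[np.ix_(idx, idx)])
            if det > best_det:
                best, best_det = i, det
        selected.append(best)
    return selected

# Items 0 and 1 are near-duplicates; the determinant penalizes picking both.
quality = [1.0, 0.95, 0.4]
sim = [[1.0, 0.98, 0.05], [0.98, 1.0, 0.05], [0.05, 0.05, 1.0]]
print(greedy_dpp(quality, sim, k=2))  # [0, 2]
```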
Context‑Aware Listwise Models
These models treat the top‑N items from the precision ranker as a context and model their mutual influence to produce the final top‑K list.
miRNN
[Alibaba IJCAI 2018] Globally Optimized Mutual Influence Aware Ranking in E‑Commerce Search https://arxiv.org/abs/1805.08524
miRNN employs an RNN to model sequential influence among items, estimates click probability for each candidate, and searches for the optimal top‑k sequence via beam search.
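The beam‑search step can be sketched generically. Here `step_score(prefix, item)` is a hypothetical stand‑in for the RNN's predicted click probability of `item` given the items already placed:

```python
def beam_search(candidates, step_score, k, beam_width=3):
    """Beam search over item sequences to build a top-k list.

    step_score(prefix, item): context-dependent value of appending `item`
    after `prefix` (in miRNN, an RNN's click-probability estimate).
    """
    beams = [([], 0.0)]  # (sequence so far, cumulative score)
    for _ in range(k):
        expanded = [(seq + [item], score + step_score(seq, item))
                    for seq, score in beams
                    for item in candidates if item not in seq]
        # Keep only the best `beam_width` partial sequences.
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]
```

With a context‑independent `step_score`, this reduces to sorting by score; the value of beam search appears precisely when the score of an item depends on what precedes it.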
DLCM
[SIGIR 2018] Learning a Deep Listwise Context Model for Ranking Refinement https://arxiv.org/pdf/1804.05936
DLCM first obtains the top‑k list from a traditional LTR model, then processes the list backward with a GRU to preserve high‑scoring items, and finally re‑ranks using the GRU output.
Seq2Slate
[Google ICML 2019] Seq2Slate: Re‑ranking and Slate Optimization with RNNs https://arxiv.org/abs/1810.02019
Seq2Slate treats reranking as a sequence‑to‑sequence problem. A pointer network selects items one by one while continuously updating the context representation.
PRM
[Alibaba RecSys 2019] Personalized Re‑ranking for Recommendation https://arxiv.org/abs/1904.06813
PRM augments the precision‑ranking output with position encoding, feeds the sequence into a Transformer, and produces final scores via a fully‑connected layer followed by softmax.
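A single‑head, single‑layer sketch of that flow, with random matrices standing in for learned weights (the actual PRM stacks multi‑head Transformer blocks and adds a pre‑trained personalized vector):

```python
import numpy as np

def prm_scores(item_emb, seed=0):
    """PRM-style scoring sketch: position-encoded self-attention + FC + softmax.

    item_emb: (n, d) embeddings of the precision-ranking output, in rank order.
    """
    rng = np.random.default_rng(seed)
    n, d = item_emb.shape
    # Sinusoidal position encoding injects the initial ranking order.
    pos = np.arange(n)[:, None] / (10000.0 ** (np.arange(d)[None, :] / d))
    x = item_emb + np.where(np.arange(d) % 2 == 0, np.sin(pos), np.cos(pos))
    # Random projections stand in for learned weights (demo assumption).
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = np.exp(q @ k.T / np.sqrt(d))
    attn /= attn.sum(axis=1, keepdims=True)   # softmax over the whole list
    h = attn @ v                              # context-aware item states
    logits = h @ rng.standard_normal(d)       # FC layer: one score per item
    z = np.exp(logits - logits.max())
    return z / z.sum()                        # softmax over list positions
```

Because attention spans the whole list, every item's final score reflects all other candidates, which is exactly the listwise context a pointwise precision ranker discards.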
EdgeRec
[Alibaba CIKM 2020] EdgeRec – Recommender System on Edge in Mobile Taobao https://arxiv.org/abs/2005.08416
EdgeRec runs the rerank model on the client device. It models heterogeneous user‑behavior sequences and uses target‑attention to adjust item order in real time, enabling low‑latency personalization.
Permutation‑Wise Rerank (PRS)
[Alibaba 2021] Revisit Recommender System in the Permutation Prospective https://arxiv.org/abs/2102.12057
PRS generates candidate permutations with a Fast Permutation Searching Algorithm (FPSA), predicts click‑through (pCTR) and continuation probabilities, and scores each permutation using a bi‑LSTM DPWN model. The permutation with the highest aggregated score is selected.
Industry Deployments
Production systems adopt these techniques at large scale:
Kuaishou short‑video: Multi‑objective listwise rerank using a Transformer and weighted log‑loss to improve DAU, watch time, and user feedback.
Alibaba Taobao: EdgeRec, PRS, and generator‑evaluator pipelines for real‑time sequence generation and business‑rule enforcement.
WeChat “Look”: Deep reinforcement learning for long‑term mixed ranking, treating ranking decisions as actions and user clicks as rewards.
iQIYI search: Listwise rerank similar to PRM, evaluated with NDCG.
Meituan search: Transformer‑based listwise models and multi‑business mixed ranking with ESMM and multi‑tower architectures.
Meituan “to‑store” ads: Heterogeneous ad‑item mixing to address candidate explosion, interaction modeling, and cold‑start challenges.
Common architecture is a Generator‑Evaluator (GE) framework. The generator explores candidate sequences (beam search, MMR, Seq2Slate, sampling) and the evaluator scores them with supervised models. Business rules (traffic control, diversity, group ordering, forced insertion) are incorporated via masking or reward shaping. Training typically uses cross‑entropy loss for the evaluator and reinforcement‑style optimization for the generator to maximize a reward that combines evaluator scores and rule compliance.
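A toy Generator‑Evaluator loop makes the division of labor concrete. The adjacency rule below is hypothetical, and exhaustive permutation stands in for the real generators (beam search, MMR, Seq2Slate, sampling) purely for illustration:

```python
import itertools

def generate(candidates, k):
    """Generator: propose candidate sequences. Production systems use beam
    search, MMR, Seq2Slate, or sampling instead of full enumeration."""
    return itertools.permutations(candidates, k)

def violates_rules(seq, banned_adjacent):
    """Hypothetical business rule: certain item pairs may not be adjacent."""
    return any((a, b) in banned_adjacent for a, b in zip(seq, seq[1:]))

def ge_rerank(candidates, k, evaluate, banned_adjacent=frozenset()):
    """Evaluator picks the best sequence; rules are enforced by masking,
    i.e. infeasible sequences never reach the evaluator."""
    feasible = (s for s in generate(candidates, k)
                if not violates_rules(s, banned_adjacent))
    return max(feasible, key=evaluate)

scores = {"a": 3, "b": 2, "c": 1}
listwise_value = lambda s: sum(scores[i] for i in s)
print(ge_rerank(["a", "b", "c"], 2, listwise_value))                  # ('a', 'b')
print(ge_rerank(["a", "b", "c"], 2, listwise_value, {("a", "b")}))    # ('b', 'a')
```

The alternative to masking, mentioned above, is reward shaping: rule violations are allowed through but penalized in the reward the generator is trained against.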
References:
Multiple‑objective ranking in Kuaishou short‑video recommendation: https://www.infoq.cn/article/nozs4xy7bvbcf34vzhhu
Transformer in Meituan search ranking: https://tech.meituan.com/2020/04/16/transformer-in-meituan.html
Edge‑search rerank in Dianping: https://tech.meituan.com/2022/06/16/edge-search-rerank.html