Optimizing Pre‑Ranking in Meituan Search: Knowledge Distillation and Neural Architecture Search
This article reviews the evolution of Meituan Search's pre‑ranking (coarse‑ranking) system and presents two major optimization strategies: knowledge distillation to align coarse‑ranking with fine‑ranking, and neural architecture search to jointly optimize effectiveness and latency. Both deliver significant offline and online gains.
1. Background
In large‑scale industrial search, recommendation, and advertising systems, a cascade ranking architecture is commonly used to balance efficiency and effectiveness. Meituan Search’s ranking pipeline consists of a coarse‑ranking (pre‑ranking) stage followed by fine‑ranking, re‑ranking, and blending stages. The coarse‑ranking stage must filter thousands of candidates down to a few hundred for fine‑ranking.
Coarse‑ranking faces three main challenges: sample selection bias caused by the gap between offline training data and online prediction data; the need for coarse‑ranking/fine‑ranking interaction (linkage between the two stages); and strict latency constraints.
2. Evolution of Coarse‑Ranking
From 2016 to the present, Meituan’s coarse‑ranking progressed from simple linear weighting of relevance, quality, and conversion features, to a pointwise logistic regression model, then to a dual‑tower vector inner‑product model, followed by a GBDT model that combined dual‑tower outputs with cross features, and finally to end‑to‑end neural network models starting in 2020.
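The dual‑tower stage mentioned above scores candidates with an inner product between a query/user embedding and precomputed item embeddings, which is what makes it cheap enough for pre‑ranking. A minimal sketch of that scoring step (function names and dimensions are illustrative, not from the original system):

```python
import numpy as np

def two_tower_score(query_emb, item_embs):
    """Dual-tower scoring: one query/user embedding dotted against a matrix of
    precomputed item embeddings. Online serving reduces to a single
    matrix-vector product over the candidate set."""
    return item_embs @ query_emb

# Example: 2 candidate items, embedding dimension 2 (toy numbers).
query = np.array([1.0, 0.0])
items = np.array([[2.0, 3.0],
                  [0.5, 1.0]])
scores = two_tower_score(query, items)
```

Because the item tower's outputs can be computed offline and cached, only the query tower runs at request time; this is the efficiency property that made the dual‑tower model viable before the move to end‑to‑end networks.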
3. Optimization Practices
3.1 Coarse‑Ranking‑Fine‑Ranking Linkage via Knowledge Distillation
Because coarse‑ranking models are simpler and use fewer features than fine‑ranking models, their effectiveness lags. To mitigate this, three distillation schemes were explored:
Result List Distillation: augment coarse‑ranking training data with positive and negative samples derived from fine‑ranking results, improving offline Recall@150 by up to +5 pp and online CTR by +0.1%.
Score Distillation: align coarse‑ranking scores with fine‑ranking scores using a mean‑squared‑error loss weighted by a coefficient λ, yielding +5 pp Recall@150 and +0.05% CTR.
Feature Representation Distillation: apply contrastive learning (an InfoNCE loss) to transfer fine‑ranking representations to the coarse‑ranking model, achieving +14 pp Recall@150 and +0.15% CTR.
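The score and representation distillation objectives above can be sketched as follows. This is a minimal illustration under assumed conventions: the λ weight, temperature, and the use of in‑batch negatives for InfoNCE are typical choices, not details confirmed by the source.

```python
import numpy as np

def score_distillation_loss(ranking_loss, student_scores, teacher_scores, lam=0.1):
    """Score distillation: total loss = base ranking loss + lambda * MSE
    between coarse-ranking (student) and fine-ranking (teacher) scores.
    `lam` is the weighting coefficient described in the text (value assumed)."""
    mse = np.mean((student_scores - teacher_scores) ** 2)
    return ranking_loss + lam * mse

def info_nce_loss(student_repr, teacher_repr, temperature=0.1):
    """Representation distillation via InfoNCE: for each item, its own
    fine-ranking representation is the positive; the other items in the
    batch serve as negatives (in-batch negatives are an assumption)."""
    s = student_repr / np.linalg.norm(student_repr, axis=1, keepdims=True)
    t = teacher_repr / np.linalg.norm(teacher_repr, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature                  # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # positives on the diagonal
```

When student and teacher representations are already aligned, the InfoNCE loss is near zero; misaligned representations are penalized, which is what pulls the coarse‑ranking model's embedding space toward the fine‑ranking model's.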
3.2 Joint Effectiveness‑Efficiency Optimization via Neural Architecture Search (NAS)
To respect latency limits while enriching features, a differentiable NAS framework based on ProxylessNAS was adopted. Feature masks (Bernoulli‑distributed) and MixOp architecture choices were learned jointly with model parameters. Efficiency modeling incorporated both feature retrieval latency and model structural latency, enabling a multi‑objective loss that balances ranking loss, distillation loss, and latency penalties. The resulting model improved offline Recall@150 by +11 PP and online CTR by +0.12% without increasing latency.
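The multi‑objective loss described above can be sketched as follows. This is an illustrative simplification: it models only the expected feature‑retrieval latency under independent Bernoulli keep‑probabilities and folds it into the loss with assumed trade‑off weights; the actual framework also models structural (MixOp) latency and learns the probabilities by gradient relaxation.

```python
import numpy as np

def expected_latency(keep_probs, feature_costs):
    """Expected feature-retrieval latency: feature i is kept with probability
    keep_probs[i] (its Bernoulli mask parameter) and, when kept, contributes
    feature_costs[i] units of latency. Units and costs are illustrative."""
    return float(np.dot(keep_probs, feature_costs))

def nas_objective(ranking_loss, distill_loss, keep_probs, feature_costs,
                  alpha=1.0, beta=0.01):
    """Multi-objective NAS loss balancing ranking quality, distillation from
    the fine-ranking teacher, and a latency penalty. alpha and beta are
    assumed trade-off weights, tuned in practice."""
    latency = expected_latency(keep_probs, feature_costs)
    return ranking_loss + alpha * distill_loss + beta * latency
```

Because the latency term is differentiable in the keep‑probabilities, gradient descent can trade a feature's contribution to accuracy against its retrieval cost, pruning features whose cost outweighs their benefit.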
4. Summary
Since 2020, Meituan has deployed MLP‑based coarse‑ranking models and iteratively enhanced them through knowledge distillation (result list, score, and representation), contrastive representation transfer, and multi‑objective NAS for joint effectiveness‑efficiency gains.
Future work includes multi‑objective modeling for coarse‑ranking, dynamic system‑wide compute allocation, and further exploration of feature‑aware architecture search.
5. Appendix
Recall@K is used as the primary offline metric for coarse‑ranking: treating the fine‑ranking top‑K results as ground truth, it measures the fraction of those items that also appear in the coarse‑ranking top‑K output (e.g., Recall@150 in the experiments above).
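Under that definition, the metric reduces to a set overlap. A minimal sketch (item IDs and list sizes are illustrative):

```python
def recall_at_k(coarse_topk, fine_topk):
    """Offline Recall@K for coarse-ranking: the fraction of the fine-ranking
    top-K items (treated as ground truth) that also appear in the
    coarse-ranking top-K output."""
    fine = set(fine_topk)
    return len(fine & set(coarse_topk)) / len(fine)

# Example: two of fine-ranking's four reference items survive coarse-ranking.
r = recall_at_k(coarse_topk=[1, 2, 3, 4], fine_topk=[2, 3, 5, 6])
```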
6. Authors
Xiaojiang, Sugui, Lixiang, Caoyue, Peihau, Xiaoyao, Dayao, Chensheng, Yunsen, and Liqian from Meituan Platform/Search Recommendation Algorithm Department.