Edge AI Boosts Mobile Search Ranking: Inside Meituan’s On‑Device Re‑ranking
This article details Meituan’s implementation of on‑device deep learning models for search re‑ranking, covering the motivations for edge intelligence, feature engineering, feedback sequence modeling, model architecture, deployment optimizations, experimental results, and future directions, offering practical insights for developers building large‑scale AI on mobile.
1 Introduction
Edge intelligence runs AI models directly on mobile devices, addressing privacy and low‑latency requirements that cloud‑only solutions cannot satisfy. Deploying ranking models on smartphones enables local feature extraction, reduces network traffic, and allows real‑time personalization.
2 Why On‑Device Re‑ranking?
2.1 Limitations of Cloud‑Side Ranking
Typical search pipelines consist of query understanding, multi‑way recall, model ranking and result merging on the server. Because of QPS limits and pagination, the client receives a fixed number of results per page (e.g., 25 items). This creates two problems:
Result‑update latency: User actions cannot affect the current page. The ranking only changes at the next pagination request, so personalization is delayed.
Real‑time feedback latency: Feedback signals pass through cloud‑side stream processors (Storm/Flink) and arrive with minute‑level delay, which makes them less useful as a signal of the user's immediate preferences.
2.2 Benefits of Edge Re‑ranking
In‑page re‑ranking: The device can reorder results instantly based on the latest user actions, eliminating pagination‑driven delays.
No feedback latency: Feedback is consumed locally, removing the cloud‑side processing delay.
Privacy protection: Behavioral data is processed on the device rather than uploaded, which helps with compliance with personal‑information regulations.
After deployment in the Dianping app, click‑through rate (CTR) on the main search flow increased by 25 bp (0.25 %) and the food‑channel page saw a 43 bp (0.43 %) uplift, with the average number of clicks per query rising by 0.29 %.
3 On‑Device Re‑ranking Algorithm Exploration
3.1 Feature Engineering
The on‑device feature pipeline mirrors the cloud ranking system, reusing User, Item, Query and Contextual base and cross features while optimizing for transmission and storage. In addition, real‑time feedback signals (e.g., exposure, click, dwell time) are added.
Basic features: user, shop, query, context and their cross features.
Bias features: ranking position, visual size bias, etc.
Real‑time feedback features:
User interaction sequence (exposures, clicks).
Behavior‑related features such as dwell time on shop detail pages.
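As a rough illustration of the real‑time feedback features listed above, the sketch below turns a session's raw events into an exposure sequence, a click sequence and two dwell‑time features. The `Event` type, the event kinds, the feature names and the `max_len` cutoff are all assumptions made for this example; the source does not describe the actual on‑device schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    shop_id: str
    kind: str           # hypothetical kinds: "exposure", "click", "detail_leave"
    ts: float           # seconds since session start
    dwell: float = 0.0  # seconds spent on the shop detail page (detail_leave only)


def build_feedback_features(events: List[Event], max_len: int = 10) -> Dict[str, object]:
    """Turn a session's raw events into sequence and behavior features."""
    ordered = sorted(events, key=lambda e: e.ts)
    exposures = [e.shop_id for e in ordered if e.kind == "exposure"]
    clicks = [e.shop_id for e in ordered if e.kind == "click"]
    dwells = [e.dwell for e in ordered if e.kind == "detail_leave"]
    return {
        # Keep only the most recent max_len items of each sequence.
        "exposure_seq": exposures[-max_len:],
        "click_seq": clicks[-max_len:],
        "avg_dwell": sum(dwells) / len(dwells) if dwells else 0.0,
        "last_dwell": dwells[-1] if dwells else 0.0,
    }
```

Truncating to the most recent items keeps the feature payload small, which matters when features are built and stored on the device.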
3.2 User Feedback Sequence Modeling
Standard sequence models (DIN, DIEN, BST) were evaluated. To better exploit on‑device feedback, a Deep Feedback Network (DFN) was introduced. DFN splits the feedback stream into a positive (click) sequence and a negative (exposure‑no‑click) sequence, then applies cross‑attention:
Exposure sequence is used as Query, click sequence as Key/Value → attention from exposure to click.
Click sequence as Query, exposure as Key/Value → reverse attention.
When only negative feedback is present, a zero vector is appended to the negative sequence to suppress noise (following the zero‑attention trick).
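The cross‑attention and zero‑vector idea can be sketched in plain Python. This is a minimal illustration, not the DFN implementation: it uses unprojected scaled dot‑product attention with no learned weights, and the function names and the `append_zero` flag are assumptions introduced here. Appending an all‑zero vector to the key/value sequence gives the softmax a slot that carries no information, so attention mass can go there instead of onto noisy items.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def cross_attention(query_seq: List[Vector], kv_seq: List[Vector],
                    append_zero: bool = False) -> List[Vector]:
    """Each item of query_seq attends over kv_seq (used as both Key and Value)."""
    dim = len(query_seq[0])
    kv = list(kv_seq)
    if append_zero:
        kv.append([0.0] * dim)  # zero-attention slot: lets a query attend to "nothing"
    if not kv:
        return [[0.0] * dim for _ in query_seq]
    return [attend(q, kv, kv) for q in query_seq]


# Both directions described above, on toy 2-d embeddings.
exposure_seq = [[1.0, 0.0], [0.0, 1.0]]
click_seq = [[0.5, 0.5]]
exposure_to_click = cross_attention(exposure_seq, click_seq)
click_to_exposure = cross_attention(click_seq, exposure_seq, append_zero=True)
```

With the zero slot present, the attended output shrinks toward zero whenever no real item matches the query well, which is the noise‑suppression effect the zero‑attention trick aims for.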
To improve the signal‑to‑noise ratio of negative feedback, the negative sequence was restricted by exposure duration: an item that stayed on screen for a long time without being clicked is treated as a stronger negative signal than one that was only briefly exposed. This yielded a more stable uplift in online experiments.
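One simple way to apply an exposure‑duration rule is to split impressions into a positive and a negative sequence and keep a non‑clicked item only if it was visible long enough. The `Impression` type and the 2‑second threshold below are hypothetical; the source does not state the actual cutoff or whether a hard threshold is used at all.

```python
from typing import List, NamedTuple, Tuple


class Impression(NamedTuple):
    shop_id: str
    exposure_seconds: float
    clicked: bool


def split_feedback(impressions: List[Impression],
                   min_negative_exposure: float = 2.0) -> Tuple[List[str], List[str]]:
    """Return (positive, negative) shop-id sequences in their original order.

    A non-clicked item counts as negative feedback only if it was exposed for
    at least min_negative_exposure seconds (an assumed, illustrative cutoff).
    """
    positives = [i.shop_id for i in impressions if i.clicked]
    negatives = [
        i.shop_id for i in impressions
        if not i.clicked and i.exposure_seconds >= min_negative_exposure
    ]
    return positives, negatives
```

Items that were scrolled past quickly are dropped from the negative sequence rather than being treated as rejections.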
A Multi‑View Feedback Attention Network (MVFAN) was further proposed. MVFAN enriches each feedback item with additional attributes (category, price, distance, etc.) and performs multi‑head cross‑attention across these views, allowing the model to capture fine‑grained user preferences.
3.3 Re‑ranking Model Design
A context‑aware list‑wise model is employed. The model receives the top‑N candidates from a coarse ranking layer and jointly encodes their shop context using a Transformer. Cloud‑side features are incorporated via joint training, enabling the on‑device model to benefit from rich offline signals while keeping inference lightweight.
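To make the list‑wise idea concrete, the sketch below lets every candidate attend to all candidates in the list and then scores each one from its own features plus that list context. It is a deliberately reduced stand‑in for a Transformer encoder (one unprojected self‑attention pass, linear scoring); `w_item` and `w_context` are hypothetical weight vectors, and nothing here reflects the actual model's layers or parameters.

```python
import math
from typing import List

Vector = List[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(items: List[Vector]) -> List[Vector]:
    """One attention pass in which each candidate attends to the whole list."""
    dim = len(items[0])
    scale = math.sqrt(dim)
    context = []
    for query in items:
        weights = _softmax([_dot(query, key) / scale for key in items])
        context.append([sum(w * v[j] for w, v in zip(weights, items)) for j in range(dim)])
    return context


def rerank(shop_ids: List[str], items: List[Vector],
           w_item: Vector, w_context: Vector) -> List[str]:
    """Score each candidate from its own features plus its list context."""
    context = self_attention(items)
    scores = [_dot(w_item, x) + _dot(w_context, c) for x, c in zip(items, context)]
    order = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
    return [shop_ids[i] for i in order]
```

Because the context vector depends on every other candidate, the same shop can receive a different score in a different list, which is what distinguishes a list‑wise model from point‑wise scoring.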
3.4 Multi‑Scenario Effectiveness
Offline experiments (see Table 2) show consistent gains across metrics. Online A/B tests on the main search page and the food‑channel list page reported QV_CTR lifts of 0.25 % and 0.43 % respectively, with pronounced improvements on lower‑page positions, confirming that on‑device re‑ranking mitigates pagination‑induced decay.
4 System Architecture and Deployment Optimizations
4.1 Architecture Overview
The edge re‑ranking system consists of three core modules:
Intelligent trigger module that decides when to invoke on‑device ranking (e.g., after a user clicks a shop).
On‑device re‑ranking service that builds features and runs inference using a lightweight inference engine.
Native post‑processing module that merges the re‑ranked list back into the UI.
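The three modules can be pictured as a small pipeline: a trigger check, a re‑rank call, and a merge back into the page. In the sketch below, all function names are hypothetical, the trigger rule is just the click example given above, and keeping already‑exposed items in place is an assumption made here so that the visible part of the list does not jump.

```python
from typing import Callable, List


def should_trigger(event_kind: str) -> bool:
    """Trigger rule from the example above: re-rank after a shop click."""
    return event_kind == "click"


def rerank_page(page: List[str], exposed_count: int,
                score: Callable[[str], float]) -> List[str]:
    """Keep the already-exposed head fixed and reorder only the unseen tail."""
    head, tail = page[:exposed_count], page[exposed_count:]
    return head + sorted(tail, key=score, reverse=True)


def on_event(event_kind: str, page: List[str], exposed_count: int,
             score: Callable[[str], float]) -> List[str]:
    """Entry point: returns the list the UI should render after this event."""
    if not should_trigger(event_kind):
        return page
    return rerank_page(page, exposed_count, score)
```

Here `score` stands in for the on‑device model: any callable that maps a shop id to a relevance score can be plugged in.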
4.2 Large‑Scale Model Deployment on Device
Because mobile storage is limited, the model is split into:
Dense network: Converted to MNN format and stored locally (≈10 MB after splitting).
ID‑embedding table: Contains ~80 % of parameters and is served from a cloud TensorFlow‑Serving endpoint. At request time the client fetches only the embeddings required for the current page, concatenates them with locally computed features, and performs inference.
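The request‑time flow of this split can be sketched as follows. The remote lookup is simulated with an in‑memory dictionary and the dense network is replaced by a single linear layer, so no real MNN or TensorFlow‑Serving API is called; `fetch_embeddings`, `dense_score`, `score_page` and the sample ids are all placeholders invented for this example.

```python
from typing import Dict, Iterable, List

Vector = List[float]

# Stand-in for the cloud-hosted ID-embedding table (toy 2-d rows).
_CLOUD_TABLE: Dict[str, Vector] = {
    "shop:1": [0.1, 0.2],
    "shop:2": [0.3, 0.4],
    "shop:3": [0.5, 0.6],
}


def fetch_embeddings(ids: Iterable[str]) -> Dict[str, Vector]:
    """Hypothetical remote call: returns only the rows that were requested."""
    return {i: _CLOUD_TABLE.get(i, [0.0, 0.0]) for i in set(ids)}


def dense_score(x: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Stand-in for the locally stored dense network (one linear layer here)."""
    return sum(a * b for a, b in zip(x, weights)) + bias


def score_page(page_ids: List[str], local_features: Dict[str, Vector],
               weights: Vector) -> Dict[str, float]:
    """Fetch page embeddings once, concatenate local features, run inference."""
    embeddings = fetch_embeddings(page_ids)  # one batched lookup per page
    return {
        sid: dense_score(embeddings[sid] + local_features[sid], weights)
        for sid in page_ids
    }
```

Fetching only the rows for the current page keeps the large embedding table off the device while the dense part still runs locally.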
4.3 Model Compression
After splitting, the on‑device dense model is about 10 MB. A Meituan‑developed compression tool further reduces the footprint to <1 MB with less than 0.001 % accuracy loss. Power‑consumption tests (10‑minute repeated searches) show negligible impact.
4.4 Edge Model Training & Estimation Platform
An end‑to‑end platform integrates the Augur feature‑processing framework, the Poker experiment system, and a unified estimation engine (also named Augur). This pipeline enables rapid model iteration, feature rollout, and unified offline/online evaluation for edge scenarios.
5 Conclusion and Outlook
Edge re‑ranking in the Dianping app demonstrates that on‑device AI can substantially improve search relevance, reduce latency, and protect user privacy. Future work includes:
Federated learning to jointly train cloud‑edge models while preserving data privacy.
More sophisticated trigger strategies that incorporate query context and real‑time feedback.
Robust modeling of implicit negative feedback (e.g., improved encoding of long‑duration exposures).
Personalized on‑device models for a “one‑model‑per‑user” experience.
References
[1] Yu Gong, Ziwen Jiang, et al. “EdgeRec: Recommender System on Edge in Mobile Taobao”, arXiv:2005.08416 (2020).
[2] Qingyao Ai, Keping Bi, et al. “Learning a Deep Listwise Context Model for Ranking Refinement”, arXiv:1804.05936 (2018).
[3] Changhua Pei, Yi Zhang, et al. “Personalized Re‑ranking for Recommendation”, arXiv:1904.06813 (2019).
[4] Ruobing Xie, Cheng Ling, et al. “Deep Feedback Network for Recommendation”, IJCAI‑2020.
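[5] 非易, 祝升, et al. “Deep Learning Ranking Practice Based on Knowledge Graphs in Dianping Search” (in Chinese).
[6] 肖垚, 家琪, et al. “Transformer in Practice for Meituan Search Ranking” (in Chinese).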
[7] Qingyao Ai, Daniel N Hill, et al. “A zero attention model for personalized product search”, arXiv:1908.11322 (2019).
[8] Teo CH, Nassif H, et al. “Adaptive, Personalized Diversity for Visual Discovery”, RecSys‑2016.
[9] Eugene Ie, Vihan Jain, et al. “SLATEQ – A Tractable Decomposition for Reinforcement Learning with Recommendation Sets”, IJCAI‑19.
[10] Zhou, Guorui, et al. “Deep interest network for click‑through rate prediction.” KDD‑2018.
[11] Zhou, Guorui, et al. “Deep interest evolution network for click‑through rate prediction.” AAAI‑2019.
[12] Chen, Qiwei, et al. “Behavior Sequence Transformer for E‑commerce Recommendation in Alibaba.” arXiv:1905.06874 (2019).