Applying Wide & Deep Learning to Meituan‑Dianping Recommendation System
This article describes how Meituan-Dianping applies deep learning, in particular the Wide & Deep model, to improve its recommendation system. It covers the challenges of business diversity and user context, feature engineering, optimizer and loss-function choices, and presents offline and online experimental results showing significant CTR gains.
1. Introduction
Since the breakthrough of deep learning in the 2012 ImageNet competition, it has become the most prominent technology in machine learning and AI. Meituan-Dianping, China's largest local lifestyle service platform, applies deep learning to a range of tasks such as text analysis, semantic matching, search ranking, OCR, image classification, and image quality ranking. This article shares the team's experience adapting the ideas of Google's 2016 Wide & Deep Learning paper to the Meituan-Dianping recommendation system.
2. Overview of the Recommendation System
Because Meituan‑Dianping’s business spans food, accommodation, travel, entertainment, etc., it is difficult to capture precise user interests or real‑time intent. The system faces challenges of diverse business forms and consumption scenarios. A two‑stage framework—recall and ranking—is used. The recall layer generates candidate items using multiple strategies (user‑based, model‑based, item‑based, query‑based, location‑based) and merges them, while the ranking layer re‑orders the merged candidates.
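The recall-then-rank flow above can be sketched in a few lines. This is a minimal illustration, not Meituan-Dianping's actual implementation: the strategy stubs, item ids, and scoring function are all hypothetical.

```python
# Minimal sketch of the two-stage recall-and-rank framework described above.
# Strategy outputs, item ids, and the score function are illustrative assumptions.

def recall(user):
    """Each recall strategy returns a list of candidate item ids."""
    strategies = [
        lambda u: [101, 102, 103],   # user-based strategy (stub)
        lambda u: [103, 104],        # item-based strategy (stub)
        lambda u: [105, 101],        # location-based strategy (stub)
    ]
    merged, seen = [], set()
    for strat in strategies:
        for item in strat(user):
            if item not in seen:     # de-duplicate across strategies
                seen.add(item)
                merged.append(item)
    return merged

def rank(user, candidates, score_fn):
    """Re-order the merged candidates by a model score, highest first."""
    return sorted(candidates, key=lambda item: score_fn(user, item), reverse=True)

candidates = recall(user="u1")
top = rank("u1", candidates, score_fn=lambda u, i: -i)  # toy score: prefer low ids
print(candidates)  # [101, 102, 103, 104, 105]
print(top)         # [101, 102, 103, 104, 105]
```

In production the `score_fn` is the ranking model discussed in the next section; the merge step is where the multiple recall strategies are reconciled.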
3. Deep Learning in the Ranking System
Simple linear or non‑linear models (e.g., LR, GBDT) have limitations: linear models cannot capture complex patterns, while GBDT does not consistently improve CTR in online A/B tests. Deep neural networks can automatically learn high‑order feature interactions, reducing the need for extensive feature engineering.
3.1 Existing Ranking Framework
The team experimented with various models and found that pure DNNs did not bring significant CTR improvement, especially for sparse user‑item interactions. Therefore, a Wide & Deep architecture was adopted, combining a wide linear part (using cross features for memorization) with a deep part (learning feature interactions for generalization).
3.2 Sample Selection
Clicked impressions serve as positive samples and unclicked impressions as negatives, downsampled to a 1:9 positive-to-negative ratio. Samples are weighted, and noisy instances are cleaned out. Features fall into three groups: user profile (gender, residence city, price preference), item profile (price, rating, location, etc.), and scene profile (current location, time, context).
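The 1:9 ratio above implies downsampling the (far more numerous) non-clicks. A minimal sketch of that step, assuming a simple label field; the record format and the fixed seed are illustrative:

```python
import random

def downsample(samples, neg_per_pos=9, seed=0):
    """Keep all clicks (label 1) and randomly sample non-clicks (label 0)
    to reach roughly a 1:neg_per_pos positive-to-negative ratio."""
    rng = random.Random(seed)
    pos = [s for s in samples if s["label"] == 1]
    neg = [s for s in samples if s["label"] == 0]
    k = min(len(neg), len(pos) * neg_per_pos)
    return pos + rng.sample(neg, k)

logs = [{"label": 1}] * 10 + [{"label": 0}] * 500
train = downsample(logs)
print(len(train))  # 100: 10 positives + 90 sampled negatives
```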
3.3 Feature Processing in Deep Learning
Combined features: Cross features such as "merchant in user's residence", "user in residence", and "distance" are discretized to capture interactions.
Normalization: Both Min-Max scaling and CDF normalization are evaluated; Min-Max is preferred in production.
Fast aggregation: For each continuous feature, two derived features (super-linear and sub-linear) are created to accelerate learning, though they are omitted in online experiments due to latency.
3.4 Optimizer Choice
The team compared several optimizers:
SGD: Simple but suffers from oscillation and local minima.
Momentum: Adds velocity to accelerate convergence and reduce oscillation.
Adagrad: Adapts learning rates per parameter; fast early in training but may stop learning too early.
Adam: Combines Momentum and Adagrad, providing stable and efficient training; chosen for the final model.
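Adam's update rule makes the "Momentum plus Adagrad" combination concrete: a momentum-style first-moment estimate and an Adagrad-style second-moment estimate, both bias-corrected. A minimal single-parameter sketch (default hyperparameters follow the Adam paper; the toy objective is illustrative):

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter: Momentum-style first moment
    (m) plus Adagrad-style second moment (v), with bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)        # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)        # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize the toy objective f(w) = w^2 (gradient 2w), starting from w = 1.0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(abs(w) < 0.5)  # True: w has moved toward the minimum at 0
```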
3.5 Loss Function Selection
Cross Entropy is preferred over Mean Squared Error because it avoids the saturation problem of the sigmoid derivative, leading to faster and more stable weight updates.
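The saturation argument is easy to verify numerically. For a sigmoid output p = σ(z), the cross-entropy gradient with respect to the logit is p − y, with no sigmoid-derivative factor, whereas the MSE gradient is (p − y)·p·(1 − p), which collapses when p saturates near 0 or 1:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_wrt_logit(z, y):
    """Gradient of each loss w.r.t. the pre-sigmoid logit z.
    Cross entropy: dL/dz = p - y  (no sigmoid-derivative factor).
    MSE:           dL/dz = (p - y) * p * (1 - p), which vanishes
    when the sigmoid saturates (p near 0 or 1)."""
    p = sigmoid(z)
    ce = p - y
    mse = (p - y) * p * (1 - p)
    return ce, mse

# A badly wrong, saturated prediction: true label 1, large negative logit.
ce, mse = grad_wrt_logit(z=-8.0, y=1.0)
print(round(ce, 4))   # about -0.9997: cross entropy still pushes hard
print(round(mse, 6))  # about -0.000335: the MSE gradient has almost vanished
```

This is exactly the "faster and more stable weight updates" claim: on confidently wrong predictions, cross entropy keeps a near-unit gradient while MSE barely moves.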
3.6 Wide & Deep Model Architecture
The final model is implemented in Keras (on Theano/TensorFlow). Continuous features are Min-Max normalized, and important cross features are fed to both the wide and deep components. The deep part consists of three ReLU layers followed by a sigmoid output. Training uses the Adam optimizer and cross-entropy loss with a batch size of 50,000 for 20 epochs, on 70M training samples and 30M test samples.
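A NumPy sketch of the forward pass of this architecture: a linear wide term over cross features plus a three-ReLU-layer deep tower, summed into a single sigmoid output. The layer widths, feature dimensions, and random initialization are illustrative assumptions, not the production configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def wide_and_deep_forward(x_wide, x_deep, params):
    """Forward pass of the architecture described above: a linear 'wide'
    logit over cross features plus a three-ReLU-layer 'deep' tower,
    summed before one sigmoid output. All sizes are illustrative."""
    h = x_deep
    for W, b in params["deep"]:
        h = relu(h @ W + b)
    deep_logit = h @ params["out_W"] + params["out_b"]
    wide_logit = x_wide @ params["wide_W"]
    return sigmoid(wide_logit + deep_logit + params["bias"])

# Toy dimensions: 20 cross features, 16 dense features, deep layers 32-16-8.
sizes = [16, 32, 16, 8]
params = {
    "deep": [(rng.normal(size=(a, b)) * 0.1, np.zeros(b))
             for a, b in zip(sizes[:-1], sizes[1:])],
    "out_W": rng.normal(size=(8, 1)) * 0.1,
    "out_b": np.zeros(1),
    "wide_W": rng.normal(size=(20, 1)) * 0.1,
    "bias": 0.0,
}
p = wide_and_deep_forward(rng.normal(size=(4, 20)), rng.normal(size=(4, 16)), params)
print(p.shape)                           # (4, 1)
print(bool(np.all((p > 0) & (p < 1))))   # True: valid click probabilities
```

In a Keras implementation the wide and deep logits would be produced by separate input branches and merged before the final sigmoid; the sum-of-logits structure is the same.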
4. Offline/Online Effects
Comparisons among pure DNN, Wide & Deep, and logistic regression show that the Wide & Deep model consistently improves AUC both offline and online. The online A/B test demonstrates higher CTR and more novel item recommendations, while mitigating the over‑recall of historically clicked distant items.
5. Conclusion
The Wide & Deep model successfully balances memorization (via the wide linear part) and generalization (via the deep neural network), leading to noticeable CTR improvements. Future work includes incorporating RNNs to capture temporal dynamics and exploring reinforcement learning for context‑aware recommendations.
6. References
H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & Deep Learning for Recommender Systems", Workshop on Deep Learning for Recommender Systems (DLRS), RecSys 2016.
P. Covington, J. Adams, E. Sargin, "Deep Neural Networks for YouTube Recommendations", RecSys 2016.
H. Wang, N. Wang, D.-Y. Yeung, "Collaborative Deep Learning for Recommender Systems", KDD 2015.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.