
Applying Wide & Deep Learning to Meituan‑Dianping Recommendation System

This article describes how Meituan‑Dianping applies deep learning, in particular the Wide & Deep model, to its recommendation system: how the team handles diverse business lines and user contexts, approaches feature engineering, chooses an optimizer and loss function, and what offline and online experiments show about the resulting CTR gains.

Qunar Tech Salon

1. Introduction

Since its breakthrough in the 2012 ImageNet competition, deep learning has become the most prominent technology in machine learning and AI. Meituan‑Dianping, China's largest lifestyle services platform, applies deep learning to tasks such as text analysis, semantic matching, search ranking, OCR, image classification, and image quality ranking. This article shares the team's experience adapting the ideas of Google's 2016 Wide & Deep Learning paper to the Meituan‑Dianping recommendation system.

2. Overview of the Recommendation System

Because Meituan‑Dianping’s business spans food, accommodation, travel, entertainment, etc., it is difficult to capture precise user interests or real‑time intent. The system faces challenges of diverse business forms and consumption scenarios. A two‑stage framework—recall and ranking—is used. The recall layer generates candidate items using multiple strategies (user‑based, model‑based, item‑based, query‑based, location‑based) and merges them, while the ranking layer re‑orders the merged candidates.
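The recall‑merge step described above can be sketched in a few lines. This is a minimal illustration, not the production logic; the strategy names and shop ids are made up, and the merge policy (keep each item's best rank across strategies) is one plausible choice:

```python
def merge_candidates(candidate_lists):
    """Merge candidates from multiple recall strategies, de-duplicating
    while remembering the best (lowest) rank each item achieved."""
    best_rank = {}
    for items in candidate_lists:
        for rank, item in enumerate(items):
            if item not in best_rank or rank < best_rank[item]:
                best_rank[item] = rank
    # hand the merged pool to the ranking layer, best ranks first
    return sorted(best_rank, key=best_rank.get)

# illustrative outputs of three recall strategies
user_based = ["shop_a", "shop_b"]
item_based = ["shop_b", "shop_c"]
location_based = ["shop_d", "shop_a"]

merged = merge_candidates([user_based, item_based, location_based])
```

In practice the ranking layer, not the merge order, decides the final list; the merge only needs to produce a de‑duplicated candidate pool.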

3. Deep Learning in the Ranking System

Simple linear or non‑linear models (e.g., LR, GBDT) have limitations: linear models cannot capture complex patterns, while GBDT does not consistently improve CTR in online A/B tests. Deep neural networks can automatically learn high‑order feature interactions, reducing the need for extensive feature engineering.

3.1 Existing Ranking Framework

The team experimented with various models and found that pure DNNs did not bring significant CTR improvement, especially for sparse user‑item interactions. Therefore, a Wide & Deep architecture was adopted, combining a wide linear part (using cross features for memorization) with a deep part (learning feature interactions for generalization).

3.2 Sample Selection

Clicked impressions serve as positive samples and unclicked impressions as negatives, downsampled to a 1:9 positive‑to‑negative ratio. Samples are weighted and cleaned, and noisy instances are removed. Features fall into three groups: user profile (gender, city of residence, price preference), item profile (price, rating, location, etc.), and scene profile (current location, time, context).
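The 1:9 downsampling can be sketched as follows; this is an assumed implementation (the article does not give code), with a fixed seed only for reproducibility:

```python
import random

def downsample(impressions, ratio=9, seed=42):
    """Keep all clicks (positives) and randomly sample non-clicks
    (negatives) so the result is at most 1:ratio positive-to-negative."""
    rng = random.Random(seed)
    pos = [x for x in impressions if x["clicked"]]
    neg = [x for x in impressions if not x["clicked"]]
    k = min(len(neg), ratio * len(pos))
    return pos + rng.sample(neg, k)

# toy log: 5 clicks, 100 non-clicks -> 5 positives + 45 negatives
log = [{"clicked": True}] * 5 + [{"clicked": False}] * 100
sample = downsample(log)
```

Per-sample weights and noise filtering would be applied on top of this before training.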

3.3 Feature Processing in Deep Learning

Combined features: cross features such as "merchant located in the user's city of residence", "user currently in their city of residence", and user–merchant distance are discretized into buckets so their interactions can be memorized.
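Discretization and feature crossing can be sketched as below. The bucket boundaries and the hash-based crossing are illustrative assumptions, not the production scheme (hashing is used here for determinism; a real system might use a learned vocabulary instead):

```python
import hashlib

def bucketize(value, boundaries):
    """Discretize a continuous value into the index of the first
    boundary it falls below (last bucket if above all of them)."""
    for i, b in enumerate(boundaries):
        if value < b:
            return i
    return len(boundaries)

def cross(*ids, num_buckets=1_000_000):
    """Hash-cross several categorical ids into one wide-feature id."""
    key = "_x_".join(map(str, ids)).encode()
    return int(hashlib.md5(key).hexdigest(), 16) % num_buckets

# e.g. a 3.2 km user-merchant distance with km boundaries
dist_bucket = bucketize(3.2, boundaries=[0.5, 1, 2, 5, 10])
# cross the distance bucket with the user's residence city
feat_id = cross("city_beijing", dist_bucket)
```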

Normalization: both Min‑Max scaling and empirical‑CDF (rank‑based) normalization are evaluated; Min‑Max is preferred in production.
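The two normalization schemes compared above can be written side by side; a minimal sketch assuming the statistics are computed over the training set:

```python
import bisect

def min_max(xs):
    """Scale values linearly into [0, 1] (assumes max > min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def cdf(xs):
    """Empirical-CDF normalization: each value maps to the fraction
    of samples less than or equal to it, flattening any distribution."""
    srt = sorted(xs)
    n = len(xs)
    return [bisect.bisect_right(srt, x) / n for x in xs]
```

Min‑Max preserves the shape of the distribution (and is cheap to apply online), while the CDF transform equalizes it; the team found the former worked better in production.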

Fast Aggregation : For each continuous feature, two derived features (super‑linear and sub‑linear) are created to accelerate learning, though they are omitted in online experiments due to latency.
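One common choice for the super‑linear and sub‑linear transforms is squaring and square‑rooting a normalized feature; the article does not specify the exact functions, so this is an assumed example:

```python
import math

def expand(x):
    """Derive a super-linear (x**2) and a sub-linear (sqrt(x)) copy
    of a [0, 1]-normalized feature so the network can fit curvature
    without having to learn it from scratch."""
    return [x, x * x, math.sqrt(x)]
```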

3.4 Optimizer Choice

The team compared several optimizers:

SGD : Simple but suffers from oscillation and local minima.

Momentum : Adds velocity to accelerate convergence and reduce oscillation.

Adagrad : Adapts learning rates per parameter, fast early training but may stop early.

Adam : Combines Momentum and Adagrad, provides stable and efficient training; chosen for the final model.
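A single Adam update makes the "Momentum plus Adagrad" combination concrete: `m` is the momentum-style running mean of gradients, `v` the Adagrad-style running mean of squared gradients, and both are bias-corrected early in training. The hyperparameter defaults below are Adam's standard values; this is a sketch of one scalar update, not the team's training loop:

```python
def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar weight w with gradient g at step t.

    m: running mean of gradients (momentum term)
    v: running mean of squared gradients (adaptive scaling term)
    """
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)   # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v

# first step: the bias-corrected update is roughly lr * sign(g)
w, m, v = adam_step(w=1.0, g=0.5, m=0.0, v=0.0, t=1)
```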

3.5 Loss Function Selection

Cross Entropy is preferred over Mean Squared Error because it avoids the saturation problem of the sigmoid derivative, leading to faster and more stable weight updates.
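The saturation argument is easy to verify numerically. With a sigmoid output, the cross‑entropy gradient with respect to the logit is simply `p - y`, while the MSE gradient keeps an extra `p * (1 - p)` factor that vanishes when the unit saturates; a confidently wrong prediction then gets a near‑zero MSE gradient but a full‑strength cross‑entropy gradient:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def grad_ce(z, y):
    """d(cross-entropy)/dz for a sigmoid output: sigma'(z) cancels."""
    return sigmoid(z) - y

def grad_mse(z, y):
    """d(MSE)/dz keeps the sigma'(z) = p(1-p) factor, which vanishes
    when the unit saturates (|z| large) and stalls learning."""
    p = sigmoid(z)
    return (p - y) * p * (1 - p)

# confidently wrong prediction: z = 10 (p ~ 1) but the label is 0
ce, mse = grad_ce(10, 0), grad_mse(10, 0)
```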

3.6 Wide & Deep Model Architecture

The final model is implemented in Keras on a Theano/TensorFlow backend. Continuous features are Min‑Max normalized, and the most important cross features are fed to both the wide and deep components. The deep part consists of three ReLU hidden layers followed by a sigmoid output. Training uses the Adam optimizer, cross‑entropy loss, a batch size of 50,000, and 20 epochs on 70M training and 30M test samples.

4. Offline/Online Effects

Comparisons among pure DNN, Wide & Deep, and logistic regression show that the Wide & Deep model consistently improves AUC both offline and online. The online A/B test demonstrates higher CTR and more novel item recommendations, while mitigating the over‑recall of historically clicked distant items.

5. Conclusion

The Wide & Deep model successfully balances memorization (via the wide linear part) and generalization (via the deep neural network), leading to noticeable CTR improvements. Future work includes incorporating RNNs to capture temporal dynamics and exploring reinforcement learning for context‑aware recommendations.

6. References

H.-T. Cheng, L. Koc, J. Harmsen, et al., "Wide & Deep Learning for Recommender Systems", Proc. 1st Workshop on Deep Learning for Recommender Systems (DLRS), ACM, 2016.

P. Covington, J. Adams, E. Sargin, "Deep Neural Networks for YouTube Recommendations", Proc. RecSys '16, ACM, 2016.

H. Wang, N. Wang, D.-Y. Yeung, "Collaborative Deep Learning for Recommender Systems", Proc. KDD '15, ACM, 2015.

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
