Deep Learning Applications in Semantic Matching, Image Quality Ranking, and OCR at Meituan-Dianping
Meituan-Dianping applies deep-learning models to personalize search results, select attractive POI images, and extract text from images. The models covered are ClickNet for semantic search matching, an AlexNet-based image-quality ranker, and an OCR pipeline built on Faster R-CNN and FCN. The article reports higher click-through rates, conversions, and operational efficiency across the company's O2O services.
In recent years, deep learning has achieved remarkable results in speech, image, and natural language processing. Meituan-Dianping has explored its use in various scenarios, including semantic matching for search, image quality ranking for first‑image selection, and OCR for text extraction.
Semantic Matching
Semantic matching is crucial for information retrieval and search ranking. Beyond pure textual similarity, Meituan's O2O platform also has to consider user intent and user state (e.g., location-dependent queries). The solution introduces O2O-specific features into a deep learning framework and uses click and order data to guide model optimization. The resulting ClickNet architecture is a lightweight model that balances effectiveness and efficiency, and it has been deployed in search, advertising, hotel, and travel ranking systems.
Representation Layer
Both the query and the merchant name are encoded as semantic vectors by a neural encoder (DNN, CNN, RNN, LSTM, or GRU). These vectors are then combined with business-related features such as distance and merchant rating.
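As a rough illustration of this layer, the sketch below encodes a query and a merchant name and concatenates the result with business features. It is a minimal sketch, not ClickNet itself: the embedding table, the averaging encoder (a stand-in for the DNN/CNN/RNN encoders mentioned above), the cosine feature, and the scaling of distance and rating are all assumptions made for the example.

```python
import math

# Hypothetical toy embedding table; a real system would learn these vectors.
EMBEDDINGS = {
    "hot": [0.9, 0.1, 0.0],
    "pot": [0.8, 0.2, 0.1],
    "hotpot": [0.85, 0.15, 0.05],
    "coffee": [0.0, 0.9, 0.3],
    "house": [0.1, 0.3, 0.9],
}
DIM = 3


def encode(tokens):
    """Average token embeddings (stand-in for a learned neural encoder)."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def build_features(query_tokens, merchant_tokens, distance_km, rating):
    """Concatenate semantic vectors with (assumed) business features."""
    q = encode(query_tokens)
    m = encode(merchant_tokens)
    # Assumed scaling: closer merchants and higher ratings map nearer to 1.
    business = [1.0 / (1.0 + distance_km), rating / 5.0]
    return q + m + [cosine(q, m)] + business


features = build_features(["hot", "pot"], ["hotpot", "house"], 1.5, 4.6)
print(len(features))  # 3 query dims + 3 merchant dims + 1 similarity + 2 business
```

The concatenated vector is what a downstream scoring network would consume; in practice the encoder would be trained jointly with that network rather than fixed as it is here.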
Learning Layer
Multiple fully-connected layers predict a matching score. During training, the score is compared against click and order labels, and the resulting error is used to update the network. Training also incorporates techniques for handling sample imbalance, importance weighting, and position bias.
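The source does not say how imbalance, importance, or position bias are handled. One common approach is to fold all three into a per-sample weight on a log loss, as sketched below. The positive-class weight, the examination probabilities per position, and the function names are hypothetical values chosen for illustration.

```python
import math

# Hypothetical probability that a user examines a result at each position.
EXAMINE_PROB = {1: 1.0, 2: 0.7, 3: 0.5}
DEFAULT_EXAMINE_PROB = 0.3  # assumed for positions not listed above


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_weight(label, position, pos_weight=4.0):
    """Up-weight rare positives and correct for position bias.

    Clicks at low-visibility positions are divided by the (assumed)
    examination probability, i.e. inverse propensity weighting.
    """
    if label == 1:
        return pos_weight / EXAMINE_PROB.get(position, DEFAULT_EXAMINE_PROB)
    return 1.0


def weighted_log_loss(logits, labels, positions):
    """Weighted binary cross-entropy over a batch of matching scores."""
    total, weight_sum = 0.0, 0.0
    for z, y, pos in zip(logits, labels, positions):
        p = min(max(sigmoid(z), 1e-7), 1.0 - 1e-7)  # avoid log(0)
        w = sample_weight(y, pos)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


loss = weighted_log_loss([3.0, -3.0, -2.0], [1, 0, 0], [1, 2, 3])
print(round(loss, 4))
```

Normalizing by the sum of weights keeps the loss on a comparable scale when the weighting scheme changes.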
Image Quality Ranking
Choosing the most attractive first image for a POI can significantly increase click‑through rates. Traditional aesthetic metrics (color, composition) are insufficient because user preferences are highly subjective. Meituan uses AlexNet to extract high‑level semantic features (beauty, memorability, attractiveness, category) and augments them with handcrafted low‑level features (color, sharpness, contrast, corner points). A shallow neural network then scores the image. Training data are collected from high‑CTR images in Meituan Deal albums (positive) and low‑CTR UGC images (negative), as well as category labels from the POI taxonomy.
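To make the feature combination concrete, the sketch below computes two handcrafted low-level features (contrast and sharpness) on a grayscale image and passes them, together with placeholder high-level features, through a one-hidden-layer scorer. It is an illustration under stated assumptions: the feature definitions, the network weights, and the two "high-level" values standing in for AlexNet outputs are all hypothetical.

```python
import math


def contrast(gray):
    """Standard deviation of pixel intensities."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))


def sharpness(gray):
    """Mean absolute 4-neighbour Laplacian response over interior pixels."""
    h, w = len(gray), len(gray[0])
    total, count = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1]
                   + gray[y][x + 1] - 4 * gray[y][x])
            total += abs(lap)
            count += 1
    return total / count if count else 0.0


# Hypothetical weights for a shallow network: 4 inputs -> 2 hidden -> 1 output.
W_HIDDEN = [[0.5, 0.5, 0.2, 0.1], [-0.3, 0.4, 0.1, 0.05]]
B_HIDDEN = [0.0, 0.1]
W_OUT = [1.0, -0.5]
B_OUT = -0.5


def shallow_score(features, w_hidden, b_hidden, w_out, b_out):
    """One ReLU hidden layer followed by a sigmoid output in (0, 1)."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


def score_image(high_level, gray):
    """Concatenate high-level and low-level features, then score."""
    features = list(high_level) + [contrast(gray), sharpness(gray)]
    return shallow_score(features, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)


checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
print(score_image([0.8, 0.3], checker))  # placeholder high-level features
```

In a trained system the weights would be fitted on the high-CTR and low-CTR image sets described above rather than set by hand.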
OCR (Optical Character Recognition)
OCR is needed in many O2O workflows (payment, menu entry, credential verification). Challenges include complex imaging conditions, diverse fonts, and cluttered backgrounds. Traditional OCR pipelines (binarization, layout analysis, handcrafted edge features) struggle with these scenarios.
Meituan adopts a deep‑learning‑based OCR pipeline:
1. Text Localization
For controlled scenes (ID cards, licenses, bank cards), Faster R-CNN is used after simplifying the ZF backbone to three convolutional layers and adjusting anchor ratios. For uncontrolled scenes (menus, storefronts), a Fully Convolutional Network (FCN) provides pixel‑level text/background segmentation, merging shallow and deep deconvolution results.
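The effect of adjusting anchor ratios can be illustrated with a small anchor generator. Text lines are typically much wider than they are tall, so wide anchors fit them better than the roughly square defaults used for general objects. This is a minimal sketch: the scales and ratios below are hypothetical, since the source does not give the values Meituan chose.

```python
import math

# Hypothetical height/width ratios favouring wide, short boxes for text lines.
TEXT_RATIOS = [0.1, 0.2, 0.5]
TEXT_SCALES = [16, 32]  # assumed side length (pixels) of the equal-area square


def make_anchors(cx, cy, scales=TEXT_SCALES, ratios=TEXT_RATIOS):
    """Return (x1, y1, x2, y2) anchors centred at (cx, cy).

    For each scale s and ratio r = height / width, the anchor keeps the
    area s * s while changing its shape: width = s / sqrt(r) and
    height = s * sqrt(r).
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


for box in make_anchors(50.0, 50.0):
    print([round(v, 1) for v in box])
```

Each feature-map location would receive one such set of anchors, so the number of ratios and scales directly controls how many candidate boxes the region proposal stage has to score.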
2. Text Recognition
An end‑to‑end sequence learning framework (CNN → recurrent layers → translation layer) is employed. The CNN extracts visual features, the recurrent layers model character order, and the translation layer decodes the sequence. Training data combine real samples from Meituan’s business sources with synthetically generated images covering font variations, distortions, blur, noise, and background clutter.
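The source does not state how the final layer turns per-frame predictions into a character sequence. One widely used option in CNN-plus-recurrent recognizers is CTC-style decoding, and the sketch below shows its greedy variant as an illustration only. The alphabet, the blank index, and the per-frame probabilities are made up for the example.

```python
BLANK = 0
ALPHABET = ["<blank>", "a", "b", "c"]  # hypothetical label set


def greedy_ctc_decode(frame_probs, alphabet=ALPHABET, blank=BLANK):
    """Greedy CTC decoding.

    1. Take the most probable label for every frame.
    2. Collapse consecutive repeats of the same label.
    3. Drop blanks (a blank between two equal labels keeps both).
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = blank
    for idx in best:
        if idx != blank and idx != prev:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


# Six frames whose best labels are: a, a, blank, a, b, b
frames = [
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.90, 0.05, 0.03, 0.02],
    [0.20, 0.60, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.20, 0.10, 0.60, 0.10],
]
print(greedy_ctc_decode(frames))  # aab
```

The blank label is what lets the decoder distinguish a genuinely doubled character from one character spread over several frames.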
These deep‑learning solutions have been deployed across Meituan‑Dianping’s search, advertising, hotel, and other O2O services, delivering measurable improvements in click‑through rate, conversion, and operational efficiency.
Conclusion
The article demonstrates how deep learning can be tailored to specific business scenarios in NLP, computer vision, and OCR, and outlines practical considerations such as feature engineering, data collection, model tuning, and system integration.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
