Deep Learning-Based OCR Techniques at Meituan

Meituan’s OCR system replaces the classic preprocess‑segment‑recognize pipeline with deep‑learning components: CNN‑based text detection, character models trained on synthetic data, and BLSTM‑CTC sequence recognition. These deliver substantially higher accuracy on noisy, varied real‑world images such as menus, receipts, and IDs, though tighter integration with layout analysis is still needed.

Meituan Technology Team

Meituan applies AI across many services, and has built one of the largest, most complex real‑time intelligent delivery scheduling systems, a large‑scale voice interaction product, and a massive knowledge graph for food items, supporting hundreds of millions of users.

This article opens the “AI in Meituan” series. It is excerpted from chapter 15 of “Meituan Machine Learning Practice” and focuses on optical character recognition (OCR), one of Meituan's computer vision applications.

Background: Computer vision enables the detection, recognition, and understanding of visual targets. At Meituan, OCR is used for order entry, menu display, receipt processing, and credential verification.

Traditional OCR pipeline: image preprocessing → text line extraction (layout analysis, line segmentation) → character recognition. This pipeline works well for printed documents but struggles with photographed text due to complex imaging conditions, diverse fonts, and cluttered backgrounds.

Imaging complexity: noise, blur, lighting, deformation.

Text complexity: varied fonts, sizes, colors, wear.

Scene complexity: missing layout, background interference.

Traditional methods rely on binarization, connected‑component analysis, projection profiles, and handcrafted features fed to classical classifiers (e.g., AdaBoost, SVM), all of which are brittle under the above challenges.
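To make the traditional approach concrete, here is a minimal sketch of global-threshold binarization followed by horizontal-projection line segmentation. The function names, the fixed threshold of 128, and the toy image are assumptions chosen for illustration, not details from Meituan's system.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_lines(binary, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, from a horizontal projection."""
    profile = [sum(row) for row in binary]  # ink pixels per row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # leaving a text line
            start = None
    if start is not None:                  # a line touches the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 6x4 "image": dark text on rows 1-2 and row 4, light background elsewhere.
gray = [
    [255, 255, 255, 255],
    [255,  20,  30, 255],
    [ 10, 255, 255,  40],
    [255, 255, 255, 255],
    [255,   0,   0, 255],
    [255, 255, 255, 255],
]
lines = segment_lines(binarize(gray))
print(lines)  # [(1, 3), (4, 5)]
```

The brittleness is easy to see in this form: a shadow that pushes background pixels below the fixed threshold fills the projection profile with false ink, and the gaps between lines disappear.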

Improvements:

1. Text line extraction – two directions:

Bottom‑up candidate‑generation methods (MSER, region proposals), followed by text/non‑text classification of the candidates and merging into lines.

Sliding‑window approaches that score each window with either a traditional classifier (AdaBoost, Random Ferns) or a deep CNN.
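The sliding-window direction can be sketched as follows. The scoring function here is a deliberately trivial stand-in (ink density) for the AdaBoost, Random Ferns, or CNN classifier mentioned above; the window size, stride, threshold, and all function names are assumptions for illustration.

```python
def sliding_windows(height, width, win, stride):
    """Yield (top, left) corners of win x win windows that fit inside the image."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left


def ink_density(binary, top, left, win):
    """Stand-in text/non-text score: fraction of ink pixels in the window."""
    total = sum(binary[y][x]
                for y in range(top, top + win)
                for x in range(left, left + win))
    return total / (win * win)


def detect_text_windows(binary, win=2, stride=2, threshold=0.5, score_fn=ink_density):
    """Return corners of windows whose score reaches the threshold."""
    h, w = len(binary), len(binary[0])
    return [(t, l) for t, l in sliding_windows(h, w, win, stride)
            if score_fn(binary, t, l, win) >= threshold]


def merge_row_windows(corners, win, stride):
    """Merge horizontally adjacent positive windows into (top, left, bottom, right) boxes."""
    boxes = []
    for top, left in sorted(corners):
        # Same row and at most one stride past the last merged window.
        if boxes and boxes[-1][0] == top and left <= boxes[-1][3] - win + stride:
            boxes[-1] = (top, boxes[-1][1], top + win, left + win)
        else:
            boxes.append((top, left, top + win, left + win))
    return boxes


binary = [
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
corners = detect_text_windows(binary)
text_boxes = merge_row_windows(corners, win=2, stride=2)
print(text_boxes)  # [(0, 0, 2, 4), (2, 4, 4, 6)]
```

Swapping `score_fn` for a learned classifier leaves the scanning and merging logic unchanged, which is why the same framework accommodates both traditional and deep models.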

2. From traditional single‑character engines to deep‑learning engines – convolutional neural networks (e.g., Maxout networks) learn features directly from pixels instead of relying on hand‑crafted ones; synthetic data generation supplies training samples that cover fonts, deformations, blur, noise, and background variations.
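As a rough illustration of synthetic sample generation, the sketch below degrades a clean glyph bitmap with a randomized background level, optional blur, and Gaussian noise. It covers only three of the listed variations, and every parameter range and function name is an assumption rather than a value from the source.

```python
import random


def shift_background(img, level):
    """Rescale intensities from [0, 255] into [0, level] to darken the paper."""
    return [[px * level / 255.0 for px in row] for row in img]


def box_blur(img):
    """3x3 mean filter with edge clamping; simulates mild defocus."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            row.append(sum(vals) / 9.0)
        out.append(row)
    return out


def add_noise(img, sigma, rng):
    """Add Gaussian noise and clip to the valid 0-255 range."""
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in img]


def synthesize(glyph, rng):
    """Produce one randomized training sample from a clean glyph bitmap."""
    sample = [[float(px) for px in row] for row in glyph]
    sample = shift_background(sample, level=rng.uniform(180, 255))
    if rng.random() < 0.5:  # blur roughly half of the samples
        sample = box_blur(sample)
    return add_noise(sample, sigma=rng.uniform(0, 10), rng=rng)


# A clean 3x3 vertical stroke: 0 = ink, 255 = background.
clean = [[255, 0, 255], [255, 0, 255], [255, 0, 255]]
rng = random.Random(0)  # fixed seed so the sample set is reproducible
samples = [synthesize(clean, rng) for _ in range(4)]
```

In a real generator the clean glyph would be rendered from many fonts and also warped geometrically; the point here is only that each degradation is a small, composable, randomized transform.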

3. Text line recognition – classic OCR separates character segmentation and recognition, leading to error propagation. Modern methods fall into:

Segmentation‑based: dynamic splitting with confidence‑guided merging (e.g., CNN‑based over‑segmentation, beam search).

Segmentation‑free: end‑to‑end sequence models that directly map image to character sequence.
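The segmentation-based idea can be sketched as a beam search over ways of merging over-segmented pieces. The recognizer is replaced by a hypothetical lookup table `TOY_SCORES`, and the scoring (sum of log confidences) is an assumption; a production system would typically add a language model and length normalization.

```python
import math


def recognize_line(num_pieces, score_span, max_merge=3, beam_width=5):
    """Beam search over merges of over-segmented pieces.

    score_span(start, end) returns (char, confidence) for pieces [start, end).
    Returns the best (log_prob, text) hypothesis, or None if no path exists.
    """
    # beams[i] holds up to beam_width hypotheses covering pieces [0, i).
    beams = [[] for _ in range(num_pieces + 1)]
    beams[0] = [(0.0, "")]
    for end in range(1, num_pieces + 1):
        candidates = []
        for start in range(max(0, end - max_merge), end):
            char, conf = score_span(start, end)
            if conf <= 0.0:
                continue  # the recognizer rejects this merge
            for logp, text in beams[start]:
                candidates.append((logp + math.log(conf), text + char))
        beams[end] = sorted(candidates, reverse=True)[:beam_width]
    return beams[num_pieces][0] if beams[num_pieces] else None


# Hypothetical recognizer output: "明" was over-segmented into "日" and "月".
TOY_SCORES = {
    (0, 1): ("日", 0.60),
    (1, 2): ("月", 0.60),
    (0, 2): ("明", 0.90),
    (2, 3): ("天", 0.95),
}


def toy_score(start, end):
    """Look up a span; unknown spans get zero confidence."""
    return TOY_SCORES.get((start, end), ("", 0.0))


best = recognize_line(3, toy_score)
print(best[1])  # 明天
```

Because merging the first two pieces yields a more confident character than reading them separately, the search recovers "明天" rather than "日月天". A segmentation-free model avoids this search entirely by never committing to cut points.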

4. Deep learning for text detection – scenes are divided into controlled (e.g., ID cards, bank cards) and uncontrolled (menus, storefronts). Controlled scenes use Faster R‑CNN for keyword detection; uncontrolled scenes use fully convolutional networks (FCN) for pixel‑level segmentation, followed by connected‑component clustering.
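For the uncontrolled-scene branch, the post-processing step can be sketched as thresholding a per-pixel text probability map and grouping the surviving pixels into connected components. The probability map below is made up, and the 0.5 threshold, 4-connectivity, and function name are assumptions; the source does not specify these details.

```python
from collections import deque


def component_boxes(mask, min_pixels=1):
    """Group 4-connected text pixels; return inclusive (top, left, bottom, right) boxes."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited text pixel.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            top, left, bottom, right, count = y0, x0, y0, x0, 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if count >= min_pixels:  # drop tiny speckles
                boxes.append((top, left, bottom, right))
    return boxes


# Stand-in for an FCN output: per-pixel probability of "text".
prob_map = [
    [0.9, 0.8, 0.1, 0.0, 0.0],
    [0.7, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.8, 0.9],
]
mask = [[p >= 0.5 for p in row] for row in prob_map]
boxes = component_boxes(mask)
print(boxes)  # [(0, 0, 1, 1), (3, 3, 3, 4)]
```

Raising `min_pixels` is a simple way to suppress isolated false positives before the boxes are passed on to recognition.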

5. Sequence learning for text recognition – a CNN extracts a feature sequence from the text line image, a bidirectional LSTM (BLSTM) models context across that sequence, and a transcription layer trained with CTC loss decodes the output character sequence without explicit frame‑to‑character alignment. Synthetic and real samples are combined for training.
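How CTC yields a character sequence without alignment is easiest to see in greedy (best-path) decoding: take the most likely label per frame, collapse consecutive repeats, then drop the blank symbol. The sketch below assumes the blank is at index 0 and uses made-up frame probabilities; the source does not say which decoding strategy Meituan uses.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars, prev = [], blank
    for idx in best:
        # Emit only when the label changes and is not the blank.
        if idx != blank and idx != prev:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


# Index 0 is the CTC blank; "-" is only a placeholder for it.
alphabet = ["-", "a", "b"]
frame_probs = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank separates the two a's
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.10, 0.80],  # b
    [0.20, 0.10, 0.70],  # b (repeat, collapsed)
    [0.80, 0.10, 0.10],  # blank
]
decoded = ctc_greedy_decode(frame_probs, alphabet)
print(decoded)  # aab
```

The blank between the two "a" frames is what allows a genuinely doubled character to survive the collapse step, which is why CTC needs the extra symbol.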

Experimental results show substantial accuracy gains over traditional OCR across diverse scenarios (menus, IDs, bank cards). However, further improvements are needed for specific document types, requiring tighter integration of deep‑learning detection with traditional layout analysis and richer language models.

References – a list of 23 papers covering text detection, object detection, face detection, and deep learning breakthroughs.
