
Laiye OCR Error‑Correction Model: Architecture, Implementation, and Evaluation

This article describes Laiye's OCR error‑correction system, detailing the background challenges of Chinese character recognition, the analysis of three possible solutions, the chosen post‑processing approach, model architecture, training data, loss design, online inference, and experimental results showing a measurable performance boost.

Laiye Technology Team

Background – Laiye Technology, a leader in hyper‑automation, has built a general OCR engine that achieves an F1 score of nearly 97% across more than 60 Chinese test sets, yet it still suffers from long‑tail errors such as stain interference, visually similar characters, and image deformation.

Typical error cases include red stamps or ink marks, characters that look alike (e.g., “戍” vs. “戌”), and distorted fonts caused by image warping.

Three generic remedies were examined: (1) injecting semantic information into the decoder (e.g., SRN), (2) multi‑task learning with masked visual augmentation, and (3) a decoupled post‑processing module. The third option was selected because the first two require retraining the recognizer itself: synthetic corpora lack the semantic coherence such training needs, and real‑world annotations do not provide the precise character positions that masked visual augmentation depends on.

The OCR pipeline follows the common two‑stage design—text‑line detection followed by line‑level recognition—augmented with targeted pre‑ and post‑processing to reduce noise and produce cleaner outputs.

Error detection leverages the softmax probability p of each character; characters with p below a tuned threshold f are flagged as potential errors, eliminating the need for a separate detection model.
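This detection step can be sketched as a simple threshold over per‑character confidences (the default value of f below is an illustrative assumption; the article tunes the threshold empirically):

```python
def flag_low_confidence(probs, f=0.9):
    """Return positions whose max softmax probability falls below threshold f.

    probs: per-character max softmax probabilities from the recognizer.
    Flagged positions are treated as potential recognition errors.
    """
    return [i for i, p in enumerate(probs) if p < f]

# A confidence dip at position 1 marks it as a correction candidate.
positions = flag_low_confidence([0.99, 0.42, 0.97, 0.95])
# positions == [1]
```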

Correction recall uses a self‑supervised Masked Language Modeling (MLM) approach. Because most OCR errors involve a single wrong character, the model is trained to predict masked characters within sentences, using a non‑autoregressive six‑layer Transformer encoder (head_num=6, embedding_dim=128, max_seq_len=32). The top‑20 candidate characters for each low‑probability position are retrieved for further processing.
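Given the encoder's output scores, candidate recall reduces to a top‑k lookup per flagged position. A minimal sketch (the function name and dict interface are assumptions for illustration):

```python
import numpy as np

def topk_candidates(logits, positions, k=20):
    """For each flagged position, return the indices of the k vocabulary
    characters with the highest MLM scores, best first."""
    return {pos: np.argsort(logits[pos])[::-1][:k].tolist() for pos in positions}

# Toy 3-character vocabulary: position 0 recalls characters 1 and 2.
scores = np.array([[0.1, 0.9, 0.5]])
cands = topk_candidates(scores, [0], k=2)
# cands == {0: [1, 2]}
```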

Correction ranking relies on character shape similarity. Three similarity measures were explored: (a) Chinese four‑corner coding, (b) image‑based similarity of rendered 128×128 glyphs (including AutoEncoder embeddings), and (c) OCR feature vectors derived from the final softmax matrix (shape N×D). Experiments showed the OCR feature‑vector method outperformed the others.
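The winning OCR feature‑vector measure amounts to comparing rows of the final N×D classification matrix; cosine similarity is one natural choice (the article does not specify the exact metric, so this is an assumption):

```python
import numpy as np

def shape_similarity(W, i, j):
    """Cosine similarity between the feature vectors of character classes
    i and j, taken as rows of the OCR model's final N x D softmax matrix W.
    Visually similar characters tend to have close feature vectors."""
    vi, vj = W[i], W[j]
    return float(vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj)))
```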

Training data consists of over 200 million sentences scraped from Wikipedia and the web, with random masking (10% random character replacement, 10% mask token) and a reduced vocabulary of the 3,900 most common Chinese characters. Numbers, English letters, punctuation, and high‑frequency quantifiers are replaced with OOV tokens to improve robustness.
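The OOV normalization step might look as follows (the token string "[OOV]" is an assumption; the article does not name it, and high‑frequency quantifiers are omitted here for brevity):

```python
import string

OOV = "[OOV]"  # placeholder token name; an assumption, not from the article

def normalize(sentence, vocab):
    """Replace digits, Latin letters, punctuation, and characters outside
    the reduced ~3,900-character vocabulary with the OOV token."""
    out = []
    for ch in sentence:
        if (ch.isdigit() or ch in string.ascii_letters
                or ch in string.punctuation or ch not in vocab):
            out.append(OOV)
        else:
            out.append(ch)
    return out

# Digits and Latin letters collapse to OOV; in-vocabulary Chinese survives.
tokens = normalize("发票A1金额", {"发", "票", "金", "额"})
# tokens == ["发", "票", "[OOV]", "[OOV]", "金", "额"]
```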

Loss design combines standard cross‑entropy over all characters with a higher‑weight cross‑entropy on masked positions, masking out OOV characters and scaling loss by the number of valid tokens.
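A sketch of that weighted loss over per‑token log‑probabilities (the weight value w_masked is an assumption; the article gives the scheme but not the number):

```python
import numpy as np

def mlm_loss(log_probs, targets, masked, oov_id, w_masked=5.0):
    """Cross-entropy over all valid (non-OOV) positions, with a higher
    weight on masked positions, normalized by the valid-token count.

    log_probs: (T, V) per-position log-probabilities
    targets:   (T,) gold character ids
    masked:    (T,) bool, True where the input was masked
    """
    valid = targets != oov_id                      # drop OOV positions
    nll = -log_probs[np.arange(len(targets)), targets]
    weights = np.where(masked, w_masked, 1.0) * valid
    return float((weights * nll).sum() / max(valid.sum(), 1))
```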

Online inference applies the probability threshold f to replace low‑confidence characters with OOV tokens, truncates sentences longer than 32 characters, and uses the top‑20 candidates for recall. The ranking module then selects the highest‑scoring candidate whose shape similarity exceeds a preset threshold.
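Putting the stages together, the inference loop can be sketched as below (the threshold values and the callable interface are illustrative assumptions):

```python
def correct_line(chars, probs, mlm_topk, similarity, f=0.9, sim_thresh=0.6):
    """End-to-end correction sketch: a low-confidence character is replaced
    by the highest-ranked MLM candidate whose shape similarity to the
    original character exceeds a preset threshold.

    chars:      recognized character string
    probs:      per-character max softmax probabilities
    mlm_topk:   dict of position -> ranked candidate characters
    similarity: callable (original_char, candidate_char) -> score
    """
    out = list(chars)
    for pos, cands in mlm_topk.items():
        if probs[pos] >= f:
            continue                      # confident enough, leave as-is
        for cand in cands:                # candidates are ranked best-first
            if similarity(chars[pos], cand) >= sim_thresh:
                out[pos] = cand
                break
    return "".join(out)
```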

Results – After deployment, the correction module delivered an absolute F1 improvement of over 0.03% on a 700k‑character internal test set. Sample cases demonstrate successful correction of characters mis‑recognized due to small size, red‑stamp interference, and visual distortion.

References – The article cites works on semantic reasoning networks, center‑loss, R‑Drop, Soft‑Masked BERT, Transformer architecture, and related resources.

Tags: Computer Vision · Deep Learning · Transformer · OCR · Error Correction · Chinese Text
Written by

Laiye Technology Team

Official account of Laiye Technology, featuring its best tech innovations, practical implementations, and cutting‑edge industry insights.
