How PP‑OCRv2 Boosts OCR Speed and Accuracy with Five Key Innovations
This article gives a technical overview of PaddleOCR's PP‑OCRv2: its five main algorithmic enhancements, the performance gains over earlier PP‑OCR models, and links to the open‑source repositories.
Overview
PP‑OCRv2 is an upgraded version of the PP‑OCR algorithm in the PaddleOCR open‑source OCR framework. It introduces five major technical improvements that raise accuracy and inference speed while keeping the model compact (≈11.6 MB).
Key Technical Improvements
Collaborative Mutual Learning (CML) knowledge distillation for detection. A teacher network (ResNet‑18) guides two student networks (MobileNetV3) while the students also learn from each other via mutual learning. Three loss terms are used: GT loss, DML loss and distillation loss.
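As a rough illustration of how the three CML loss terms fit together, the sketch below combines them for toy class distributions. It follows the simplified pseudocode in this article rather than the actual detection losses, which operate on probability maps; the function names and the weights `lam_dml` and `lam_distill` are hypothetical.

```python
import math

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def cml_total_loss(y_true, student1, student2, teacher,
                   lam_dml=1.0, lam_distill=1.0):
    """Toy CML objective: GT loss + mutual (DML) loss + teacher distillation loss.

    lam_dml and lam_distill are illustrative weights, not values from PP-OCRv2.
    """
    # Each student is supervised by the ground truth.
    gt = cross_entropy(y_true, student1) + cross_entropy(y_true, student2)
    # The two students learn from each other (symmetric KL divergence).
    dml = kl_div(student1, student2) + kl_div(student2, student1)
    # Each student is also pulled toward the teacher's output.
    distill = kl_div(teacher, student1) + kl_div(teacher, student2)
    return gt + lam_dml * dml + lam_distill * distill

y = [0.0, 1.0]
print(cml_total_loss(y, [0.2, 0.8], [0.4, 0.6], [0.1, 0.9]))
```

When both students already match the teacher, the DML and distillation terms vanish and only the ground-truth term remains.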
Copy‑Paste data augmentation for detection. Two training images are randomly selected, scaled, optionally flipped, and a subset of text objects from one image is pasted onto random locations of the other image, increasing the diversity of positive samples.
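The pasting step can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions: images are nested lists of pixel values, text objects are axis-aligned boxes `(x, y, w, h)`, scaling and flipping are omitted, and no check is made for overlap with existing text. The helper names `crop` and `copy_paste` are hypothetical and are not PaddleOCR APIs.

```python
import random

def crop(img, box):
    """Return the pixels inside box = (x, y, w, h) as a new nested list."""
    x, y, w, h = box
    return [row[x:x + w] for row in img[y:y + h]]

def copy_paste(src_img, src_boxes, dst_img, dst_boxes, max_objects=2, rng=None):
    """Paste up to max_objects text boxes from src_img onto random spots of dst_img.

    Returns a new image and the extended box list; the inputs are not modified.
    """
    rng = rng or random.Random()
    out = [row[:] for row in dst_img]          # copy so dst_img stays untouched
    out_boxes = list(dst_boxes)
    height, width = len(out), len(out[0])
    k = min(max_objects, len(src_boxes))
    for box in rng.sample(src_boxes, k):       # random subset of text objects
        _, _, w, h = box
        if w > width or h > height:
            continue                           # object does not fit, skip it
        patch = crop(src_img, box)
        nx = rng.randint(0, width - w)         # random top-left corner
        ny = rng.randint(0, height - h)
        for dy in range(h):
            out[ny + dy][nx:nx + w] = patch[dy]
        out_boxes.append((nx, ny, w, h))       # new positive sample
    return out, out_boxes

src = [[1] * 4 for _ in range(4)]
dst = [[0] * 8 for _ in range(8)]
aug, boxes = copy_paste(src, [(0, 0, 2, 2)], dst, [], rng=random.Random(0))
print(boxes)
```

Each pasted object contributes an extra ground-truth box, which is how the augmentation increases the number of positive samples per image.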
LCNet lightweight backbone for recognition. Based on MobileNetV1, LCNet replaces ReLU with h‑swish (except in SE modules), expands the depth‑wise convolution kernel to 5×5 in the fifth stage, adds SE modules to the last two SEP blocks, and inserts a 1280‑dim fully‑connected layer after global average pooling. These changes yield 1‑3 % accuracy gains with minimal parameter increase.
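Two of these changes are easy to check numerically. The sketch below defines h‑swish and counts the weights of a depth‑wise separable block for 3×3 and 5×5 kernels. The channel count of 256 is an assumption chosen for illustration, and the helper names are hypothetical.

```python
def relu6(x):
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)

def h_swish(x):
    """Hard swish: x * ReLU6(x + 3) / 6, a cheap approximation of swish."""
    return x * relu6(x + 3.0) / 6.0

def depthwise_separable_params(in_ch, out_ch, kernel):
    """Weights in a k x k depth-wise conv plus a 1x1 point-wise conv (no bias)."""
    return kernel * kernel * in_ch + in_ch * out_ch

# Assumed channel width of 256, for illustration only.
small = depthwise_separable_params(256, 256, 3)
large = depthwise_separable_params(256, 256, 5)
print(small, large, large / small)
```

Under these assumptions the 5×5 block has roughly 6% more weights than the 3×3 block, because the point‑wise convolution dominates the parameter count. That is consistent with the claim that the larger kernel adds little to model size.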
Enhanced U‑DML (Unified‑Deep Mutual Learning) for recognition. In addition to standard DML, a feature‑level loss is introduced and an extra fully‑connected head is added, accelerating distillation and improving final accuracy.
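One way to picture the combined objective is the toy sketch below, which adds a symmetric KL term on the output distributions and a mean‑squared term on intermediate features to the CTC losses of two networks. The CTC losses are passed in as plain numbers rather than computed, and the function names and the weight `alpha` are hypothetical.

```python
import math

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def dml_loss(probs_a, probs_b):
    """Symmetric KL divergence between the two networks' output distributions."""
    return kl_div(probs_a, probs_b) + kl_div(probs_b, probs_a)

def feature_l2_loss(feat_a, feat_b):
    """Mean squared difference between two feature vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)) / len(feat_a)

def udml_total_loss(ctc_a, ctc_b, probs_a, probs_b, feat_a, feat_b, alpha=1.0):
    """Toy objective: CTC losses + DML loss + alpha * feature loss.

    ctc_a and ctc_b are precomputed CTC losses; alpha is an illustrative weight.
    """
    return (ctc_a + ctc_b
            + dml_loss(probs_a, probs_b)
            + alpha * feature_l2_loss(feat_a, feat_b))

print(udml_total_loss(1.5, 1.7, [0.3, 0.7], [0.4, 0.6], [1.0, 2.0], [1.5, 1.0]))
```

The feature term supervises intermediate representations directly, so the two networks receive a training signal even where their output distributions already agree.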
Enhanced CTC loss with Center loss for Chinese character recognition. Center loss, borrowed from metric learning, pulls each feature toward the center of its class. The tighter intra‑class clusters make classes easier to separate, reducing confusion among visually similar Chinese glyphs.
Performance Impact
Relative to the original PP‑OCR mobile model, overall OCR accuracy improves by more than 7%.
Inference speed is increased by more than 220 % compared with the PP‑OCR server model.
The total model size remains low at 11.6 MB, enabling deployment on both server and mobile platforms.
Implementation Details
CML Knowledge Distillation
# Teacher backbone
ResNet18
# Student backbones
MobileNetV3 (two instances)
# Losses
GT_Loss = CrossEntropy(y_true, y_pred)
DML_Loss = KLDiv(student1 || student2) + KLDiv(student2 || student1)
Distill_Loss = KLDiv(teacher || student_i)
Total_Loss = GT_Loss + λ1*DML_Loss + λ2*Distill_Loss
Copy‑Paste Augmentation Pipeline
1. Randomly pick image A and image B.
2. Apply random scaling and horizontal flip to both.
3. Sample a subset of text objects from image A.
4. Paste the sampled objects onto random positions in image B.
5. Use the augmented pair for training the detection head.
LCNet Backbone Modifications
- Replace ReLU → h‑swish (except inside SE blocks)
- Depth‑wise conv kernel in stage‑5: 5×5
- Add SE modules to the last two SEP blocks
- After GAP, add FC(1280) before the classification head
U‑DML with Feature Loss
Feature_Loss = L2(Feature_teacher, Feature_student)
Total_Recognition_Loss = CTC_Loss + α*Feature_Loss + β*FC_Head_Loss
Enhanced CTC + Center Loss
CTC_Loss = -log p(sequence|input)
Center_Loss = Σ_i ||f_i - c_{y_i}||_2^2
Total_Loss = CTC_Loss + γ*Center_Loss
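The center loss term can be sketched in plain Python as follows. This is a minimal sketch, assuming features are lists of floats and class centers are stored in a dictionary keyed by label. The CTC loss is passed in as a number rather than computed, and the default `gamma` is an illustrative value, not one taken from PP‑OCRv2.

```python
def center_loss(features, labels, centers):
    """Sum over samples of the squared distance between a feature and its class center."""
    total = 0.0
    for feat, label in zip(features, labels):
        center = centers[label]
        total += sum((f - c) ** 2 for f, c in zip(feat, center))
    return total

def ctc_with_center_loss(ctc_loss, features, labels, centers, gamma=0.1):
    """Combine a precomputed CTC loss with a gamma-weighted center loss."""
    return ctc_loss + gamma * center_loss(features, labels, centers)

features = [[1.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
centers = {0: [0.0, 0.0], 1: [0.0, 0.0]}
print(center_loss(features, labels, centers))
```

Minimizing this term shrinks the spread of each character class in feature space, which is what helps separate glyphs whose features would otherwise lie close together.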
Resources
Source code, releases and issue tracking are hosted at:
GitHub: https://github.com/PaddlePaddle/PaddleOCR
Gitee: https://gitee.com/paddlepaddle/PaddleOCR
