How PP‑OCRv2 Boosts OCR Speed and Accuracy with Five Key Innovations

A technical overview of PaddleOCR's PP‑OCRv2: its five algorithmic enhancements, the resulting gains over earlier PP‑OCR models, and links to the open‑source repositories.

Baidu Geek Talk

Sep 8, 2021

Overview

PP‑OCRv2 is an upgraded version of the PP‑OCR algorithm in the PaddleOCR open‑source OCR framework. It introduces five major technical improvements that raise accuracy and inference speed while keeping the model compact (≈11.6 MB).

Key Technical Improvements

Collaborative Mutual Learning (CML) knowledge distillation for detection. A teacher network (ResNet‑18) guides two student networks (MobileNetV3) while the students also learn from each other via mutual learning. Three loss terms are used: GT loss, DML loss and distillation loss.

Copy‑Paste data augmentation for detection. Two training images are randomly selected, scaled, optionally flipped, and a subset of text objects from one image is pasted onto random locations of the other image, increasing the diversity of positive samples.

LCNet lightweight backbone for recognition. Based on MobileNetV1, LCNet replaces ReLU with h‑swish (except in SE modules), expands the depth‑wise convolution kernel to 5×5 in the fifth stage, adds SE modules to the last two SEP blocks, and inserts a 1280‑dim fully‑connected layer after global average pooling. These changes yield 1‑3 % accuracy gains with minimal parameter increase.

Enhanced UDML (Unified‑Deep Mutual Learning) for recognition. In addition to the standard DML loss, a feature‑level loss is introduced and an extra fully‑connected head is added, which accelerates distillation and improves final accuracy.

Enhanced CTC loss with Center loss for Chinese character recognition. Center loss, borrowed from metric learning, pulls each feature toward the center of its class; the tighter clusters leave classes better separated, which reduces confusion among visually similar Chinese glyphs.

Performance Impact

Relative to the original PP‑OCR mobile model, overall OCR accuracy improves by more than 7 %.

Inference speed is increased by more than 220 % compared with the PP‑OCR server model.

The total model size remains low at 11.6 MB, enabling deployment on both server and mobile platforms.

Implementation Details

CML Knowledge Distillation

# Teacher backbone
ResNet18

# Student backbones
MobileNetV3 (two instances)

# Losses
GT_Loss = CrossEntropy(y_true, y_pred)
DML_Loss = KLDiv(student1 || student2) + KLDiv(student2 || student1)
Distill_Loss = KLDiv(teacher || student_i)
Total_Loss = GT_Loss + λ1*DML_Loss + λ2*Distill_Loss
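
To make the loss combination above concrete, here is a minimal pure‑Python sketch on toy probability vectors. It is an illustration under stated assumptions, not PaddleOCR's implementation: the function names, the default weights lam1 and lam2, and the choice to sum the per‑student terms over both students are all hypothetical, and real detection training works on probability maps rather than short vectors.

```python
import math

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))

def cml_total_loss(y_true, student1, student2, teacher, lam1=1.0, lam2=1.0):
    """GT loss + lam1 * DML loss + lam2 * distillation loss.

    Assumption: the per-student terms are summed over both students.
    """
    gt = cross_entropy(y_true, student1) + cross_entropy(y_true, student2)
    # Mutual learning: symmetric KL between the two students.
    dml = kl_div(student1, student2) + kl_div(student2, student1)
    # Distillation: each student is pulled toward the teacher's output.
    distill = kl_div(teacher, student1) + kl_div(teacher, student2)
    return gt + lam1 * dml + lam2 * distill

if __name__ == "__main__":
    target = [1.0, 0.0]
    teacher = [0.9, 0.1]
    print(cml_total_loss(target, [0.9, 0.1], [0.9, 0.1], teacher))
    print(cml_total_loss(target, [0.9, 0.1], [0.6, 0.4], teacher))
```

The second call yields a larger loss because one student disagrees with both its peer and the teacher, which is exactly the behavior the DML and distillation terms are meant to penalize.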

Copy‑Paste Augmentation Pipeline

1. Randomly pick image A and image B.
2. Apply random scaling and horizontal flip to both.
3. Sample a subset of text objects from image A.
4. Paste the sampled objects onto random positions in image B.
5. Use the augmented image B, together with its updated text annotations, to train the detection model.
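
The pipeline above can be sketched in a few lines of pure Python. This is a simplified illustration, not PaddleOCR's code: images are plain lists of rows, text objects are axis‑aligned (x, y, w, h) boxes, random scaling is omitted, and the sketch does not check whether a pasted object overlaps existing text in image B. The names hflip, copy_paste and max_objects are hypothetical.

```python
import random

def hflip(image, boxes):
    """Mirror an image (list of rows) and its (x, y, w, h) boxes horizontally."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    return flipped, [(width - x - w, y, w, h) for x, y, w, h in boxes]

def copy_paste(img_a, boxes_a, img_b, boxes_b, max_objects=2, rng=random):
    """Paste up to max_objects text regions from image A into image B."""
    out = [row[:] for row in img_b]           # leave the input image untouched
    out_boxes = list(boxes_b)
    height, width = len(out), len(out[0])
    chosen = rng.sample(boxes_a, min(max_objects, len(boxes_a)))
    for x, y, w, h in chosen:
        if w > width or h > height:
            continue                           # object does not fit into B
        nx = rng.randint(0, width - w)         # random top-left corner in B
        ny = rng.randint(0, height - h)
        for dy in range(h):
            out[ny + dy][nx:nx + w] = img_a[y + dy][x:x + w]
        out_boxes.append((nx, ny, w, h))       # the pasted text is a new label
    return out, out_boxes

if __name__ == "__main__":
    rng = random.Random(0)
    img_a = [[1] * 6 for _ in range(4)]        # toy "text" image
    img_b = [[0] * 8 for _ in range(8)]        # toy background image
    _, boxes = copy_paste(img_a, [(1, 1, 3, 2)], img_b, [], rng=rng)
    print(boxes)
```

Returning the updated box list matters as much as the pixels: the pasted regions only act as extra positive samples if the detector's labels include them.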

LCNet Backbone Modifications

- Replace ReLU → h‑swish (except inside SE blocks)
- Depth‑wise conv kernel in stage‑5: 5×5
- Add SE modules to the last two SEP blocks
- After GAP, add FC(1280) before the classification head
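
Two of the modifications above are easy to make concrete. The sketch below writes h‑swish using its MobileNetV3 definition, x · ReLU6(x + 3) / 6, and counts the weights of a depth‑wise convolution to show what the larger 5×5 kernel costs. The helper names are hypothetical, bias terms are ignored, and the channel count in the example is an arbitrary illustration rather than a figure from LCNet.

```python
def relu6(x):
    """ReLU clipped to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)

def hard_sigmoid(x):
    """Piecewise-linear approximation of the sigmoid."""
    return relu6(x + 3.0) / 6.0

def h_swish(x):
    """Hard swish: x * ReLU6(x + 3) / 6."""
    return x * hard_sigmoid(x)

def depthwise_conv_params(channels, kernel_size):
    """Weights in a depth-wise conv: one k x k filter per channel (no bias)."""
    return channels * kernel_size * kernel_size

if __name__ == "__main__":
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(x, h_swish(x))
    # Arbitrary example width, used only to compare kernel sizes.
    print(depthwise_conv_params(256, 3), depthwise_conv_params(256, 5))
```

Because a depth‑wise filter has no cross‑channel weights, moving from 3×3 to 5×5 multiplies its weight count by 25/9 but adds little to the network as a whole, where the point‑wise (1×1) convolutions dominate.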

UDML with Feature Loss

Feature_Loss = L2(Feature_teacher, Feature_student)
Total_Recognition_Loss = CTC_Loss + DML_Loss + α*Feature_Loss + β*FC_Head_Loss
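
As a rough illustration of the feature‑level term, the sketch below computes a mean squared difference between two flattened feature vectors and combines it with the other loss terms as scalars. Everything here is an assumption made for the example: the function names, the averaging over elements, the default weights, and the optional dml_loss argument, which is included so the mutual‑learning term can be added when it is computed separately.

```python
def l2_feature_loss(feat_a, feat_b):
    """Mean squared difference between two equally sized feature vectors."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)) / len(feat_a)

def udml_recognition_loss(ctc_loss, feature_loss, fc_head_loss,
                          dml_loss=0.0, alpha=1.0, beta=1.0):
    """Weighted sum of the recognition loss terms (weights are hypothetical)."""
    return ctc_loss + dml_loss + alpha * feature_loss + beta * fc_head_loss

if __name__ == "__main__":
    feat = l2_feature_loss([1.0, 2.0], [1.0, 4.0])
    print(feat)
    print(udml_recognition_loss(1.0, feat, 0.5, alpha=0.5, beta=2.0))
```

Matching intermediate features gives the student a training signal before its output distribution is any good, which is one plausible reading of why the feature loss speeds up distillation.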

Enhanced CTC + Center Loss

CTC_Loss = -log p(sequence|input)
Center_Loss = Σ_i ||f_i - c_{y_i}||_2^2
Total_Loss = CTC_Loss + γ*Center_Loss
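
The center‑loss formula above translates directly into code. The sketch below is a minimal pure‑Python version with hypothetical names. It takes the CTC loss as a precomputed scalar, uses an arbitrary default for γ, and leaves out how a class label y_i is assigned to each time‑step feature and how the centers are updated during training.

```python
def center_loss(features, labels, centers):
    """Sum over samples of the squared L2 distance to the sample's class center."""
    total = 0.0
    for feature, label in zip(features, labels):
        center = centers[label]
        total += sum((f - c) ** 2 for f, c in zip(feature, center))
    return total

def enhanced_ctc_loss(ctc_loss, features, labels, centers, gamma=0.05):
    """CTC loss plus a weighted center loss (gamma is a hypothetical default)."""
    return ctc_loss + gamma * center_loss(features, labels, centers)

if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 2.0]]
    labels = [0, 1]
    centers = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    print(center_loss(features, labels, centers))
    print(enhanced_ctc_loss(2.0, features, labels, centers, gamma=0.1))
```

The loss is zero only when every feature sits exactly on its class center, so minimizing it shrinks the spread within each character class.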

Resources

Source code, releases and issue tracking are hosted at:

GitHub: https://github.com/PaddlePaddle/PaddleOCR

Gitee: https://gitee.com/paddlepaddle/PaddleOCR
