PaddleOCR 3.1 Unveils Multilingual PP‑OCRv5, Document Translation, and MCP Server Integration

PaddleOCR 3.1 introduces three major upgrades: a multilingual PP‑OCRv5 model that supports 37 languages and raises average recognition accuracy by more than 30%, a PP‑DocTranslation pipeline for high‑quality multilingual document translation, and MCP server support for flexible integration with AI applications. This post covers CLI usage, demo scenarios, and open‑source resources.

Baidu Geek Talk

PaddleOCR 3.1 was released shortly after the launch of version 3.0 and brings three major upgrades:

Multilingual PP‑OCRv5 model: Supports 37 languages (including French, Spanish, Portuguese, Russian, and Korean) and raises average recognition accuracy by more than 30%. Its training data is generated automatically with the multimodal capabilities of Wenxin (ERNIE) 4.5, which addresses both the scarcity of labeled data and the cost of manual annotation.

PP‑DocTranslation pipeline: Built on PP‑StructureV3 and Wenxin 4.5, it translates Markdown, PDF, and image documents and accepts custom terminology tables for fine‑grained control over multilingual translation.

MCP server support: Users can quickly set up a Model Context Protocol (MCP) server that exposes PaddleOCR's core capabilities (text detection, text recognition, and document parsing), backed by the local Python library, a cloud service, or a self‑hosted service, so that downstream AI applications can call them directly.
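To make the idea of a custom terminology table concrete, here is a minimal Python sketch, assuming the table is a simple mapping from source terms to required target terms. The `GLOSSARY` dict and the `missing_terms` helper are hypothetical illustrations for checking a finished translation; they are not PP‑DocTranslation's actual glossary format or API.

```python
# Hypothetical terminology table: source term -> required target term.
# Illustrative only; this is not PP-DocTranslation's glossary format.
GLOSSARY: dict[str, str] = {
    "layout analysis": "analyse de mise en page",
    "table recognition": "reconnaissance de tableaux",
}


def missing_terms(source: str, translation: str, glossary: dict[str, str]) -> list[str]:
    """Return glossary terms that occur in `source` but whose required
    target term is absent from `translation` (case-insensitive match)."""
    src, dst = source.lower(), translation.lower()
    return [
        term
        for term, target in glossary.items()
        if term.lower() in src and target.lower() not in dst
    ]


if __name__ == "__main__":
    source = "Layout analysis precedes table recognition."
    translation = "L'analyse de mise en page précède la détection des tableaux."
    print(missing_terms(source, translation, GLOSSARY))  # ['table recognition']
```

A substring check like this is deliberately crude (it ignores inflection and word boundaries); it only shows the kind of data a terminology table carries and how a translation can be held to it.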

Key Technical Steps in Training Data Generation

Automatic text line detection and cropping: The PP‑OCRv5 detection model locates each text line and crops it, so that every sample reaches the recognizer in a standardized input format.

High‑confidence text recognition: Wenxin 4.5 recognizes each cropped line several times independently, and only consistent results are kept, which improves annotation accuracy and reduces human bias.
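The two steps above can be sketched in Python as a crop-then-vote loop. This is a minimal illustration under stated assumptions: boxes are axis-aligned `(x0, y0, x1, y1)` rectangles (real detection output is typically a quadrilateral), the page image is a plain 2D list, and `recognize` is a hypothetical stand-in for one call to a multimodal model such as Wenxin 4.5. Majority voting with a `min_agree` threshold is one simple way to keep only consistent results; the exact selection rule PaddleOCR used is not specified here.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive
Image = List[List[int]]          # grayscale pixels stored as rows


def crop(image: Image, box: Box) -> Image:
    """Cut one axis-aligned text-line region out of the page image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def consensus_label(
    line: Image,
    recognize: Callable[[Image], str],
    runs: int = 3,
    min_agree: int = 2,
) -> Optional[str]:
    """Run `recognize` several times and return the most frequent answer
    only if at least `min_agree` runs agree; otherwise return None."""
    votes = Counter(recognize(line) for _ in range(runs))
    text, count = votes.most_common(1)[0]
    return text if count >= min_agree else None


def label_lines(
    image: Image,
    boxes: Sequence[Box],
    recognize: Callable[[Image], str],
    runs: int = 3,
    min_agree: int = 2,
) -> List[Tuple[Box, str]]:
    """Crop every detected line and keep only high-confidence labels."""
    labeled = []
    for box in boxes:
        text = consensus_label(crop(image, box), recognize, runs, min_agree)
        if text is not None:  # drop lines where the runs disagree
            labeled.append((box, text))
    return labeled
```

Lines whose repeated readings disagree are discarded rather than guessed, which trades some recall for cleaner training labels.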

CLI Usage Examples

```bash
# Use the --lang parameter to run OCR on a French image
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_french01.png \
    --lang fr \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --save_path ./output \
    --device gpu:0
```

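For document translation, use the pp_doctranslation subcommand. The --target_language flag sets the output language, and --qianfan_api_key supplies the API key for Baidu's Qianfan platform, through which the pipeline calls Wenxin 4.5:

```bash
paddleocr pp_doctranslation -i vehicle_certificate-1.png --target_language en --qianfan_api_key your_api_key
```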

MCP Server Capabilities

Text recognition: Detects and recognizes text in images and PDFs and returns JSON containing the coordinates and content of each text line.

Document parsing: Identifies layout elements such as text blocks, titles, paragraphs, images, and tables, and outputs the result as structured Markdown and JSON.
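Because both capabilities return JSON, a client usually post-processes the result. The sketch below assumes a hypothetical payload shape, `{"results": [{"text": ..., "box": [x0, y0, x1, y1]}, ...]}`; the real field names returned by the PaddleOCR MCP server may differ. It rebuilds plain text in reading order by sorting lines top-to-bottom and left-to-right, grouping boxes whose vertical centers lie within `row_tolerance` pixels into the same row.

```python
import json


def reading_order_text(payload: str, row_tolerance: float = 10.0) -> str:
    """Join recognized lines into plain text in reading order.

    Assumes a hypothetical payload: {"results": [{"text": str,
    "box": [x0, y0, x1, y1]}, ...]}. Adjust keys to the real schema."""
    items = json.loads(payload)["results"]
    # Sort by top edge first, then by left edge.
    items.sort(key=lambda it: (it["box"][1], it["box"][0]))

    rows = []
    for it in items:
        center_y = (it["box"][1] + it["box"][3]) / 2
        # Same row if the vertical center is close to the row's first item.
        if rows and abs(rows[-1]["center_y"] - center_y) <= row_tolerance:
            rows[-1]["items"].append(it)
        else:
            rows.append({"center_y": center_y, "items": [it]})

    lines = []
    for row in rows:
        row["items"].sort(key=lambda it: it["box"][0])  # left to right
        lines.append(" ".join(it["text"] for it in row["items"]))
    return "\n".join(lines)
```

A fixed pixel tolerance is the simplest grouping rule; it works for single-column pages but would need column detection for multi-column layouts, which is where the document-parsing capability is the better fit.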

The server supports three deployment modes: the local Python library, the Baidu AI Studio (Star River community) service, and a self‑hosted service. It communicates over either the stdio or the Streamable HTTP transport.
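Under the stdio transport, an MCP client and server exchange JSON-RPC 2.0 messages, one per line, over the server process's standard input and output. The sketch below frames a `tools/call` request and extracts the text parts of a response, following the message shapes in the MCP specification. The tool name `ocr` and the argument `input_data` are hypothetical placeholders, not the PaddleOCR server's actual tool schema (a client would discover that with `tools/list`), and the `initialize` handshake that must precede any tool call is omitted.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids should be unique per session


def tools_call_line(tool_name: str, arguments: dict) -> str:
    """Frame one JSON-RPC 2.0 'tools/call' request as a single stdio line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # Compact json.dumps never emits a raw newline (newlines inside strings
    # are escaped), so the appended newline is the only message delimiter.
    return json.dumps(message, ensure_ascii=False) + "\n"


def text_from_response(line: str) -> str:
    """Concatenate the text content items of a 'tools/call' response."""
    message = json.loads(line)
    if "error" in message:  # protocol-level JSON-RPC error
        raise RuntimeError(message["error"].get("message", "JSON-RPC error"))
    result = message["result"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("tool reported an error")
    return "".join(
        item["text"] for item in result.get("content", []) if item.get("type") == "text"
    )


if __name__ == "__main__":
    # Hypothetical tool name and argument; query tools/list for the real schema.
    print(tools_call_line("ocr", {"input_data": "handwritten_note.png"}), end="")
```

Over Streamable HTTP the same JSON-RPC messages travel as HTTP POST bodies to the server's MCP endpoint instead of as newline-delimited lines, so only the framing changes.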

Demo Scenarios

Demo 1: In Claude for Desktop, use the MCP server to extract handwritten content from images and sync it to Notion.

Demo 2: In VS Code, convert handwritten sketches or pseudocode into Python scripts that follow the required coding style, and push them to a GitHub repository.

Demo 3: Convert PDFs or images that contain complex tables, formulas, and handwritten text into editable Word or Excel files.
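For the table part of Demo 3, spreadsheet export can be as simple as turning a parsed table into CSV, which Excel opens directly. This sketch assumes the table arrives as a simple HTML `<table>` fragment inside the parsed output; that format is an assumption here, merged cells (`rowspan`/`colspan`) are ignored, and this is not necessarily how the demo produces its Excel files. It uses only Python's standard `html.parser` and `csv` modules.

```python
import csv
import io
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:  # keep text only while inside a cell
            self._cell.append(data)


def html_table_to_csv(html: str) -> str:
    """Convert a simple HTML table fragment to CSV text (no merged cells)."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer).writerows(parser.rows)  # quotes cells containing commas
    return buffer.getvalue()
```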

Conclusion

Since the release of PaddleOCR 3.0, extensive feedback on multilingual recognition and MCP support has driven the development of PaddleOCR 3.1. Developers, researchers, and industry users are encouraged to try the new version, provide feedback, and contribute to the open‑source repository.

Open‑source repository: https://github.com/PaddlePaddle/PaddleOCR
