PaddleOCR 3.1 Unveils Multilingual PP‑OCRv5, Document Translation, and MCP Server Integration
PaddleOCR 3.1 introduces three major upgrades: a multilingual PP‑OCRv5 model supporting 37 languages with an average accuracy gain of over 30%, a PP‑DocTranslation pipeline for high‑quality multilingual document translation, and MCP server support for flexible integration with AI applications. This article covers CLI usage, demo scenarios, and open‑source resources.
PaddleOCR 3.1 was released shortly after the launch of version 3.0 and brings three major upgrades:
Multilingual PP‑OCRv5 model : Supports 37 languages (including French, Spanish, Portuguese, Russian, and Korean) with an average recognition accuracy increase of more than 30%. The model uses Wenxin 4.5's multimodal capabilities to automatically generate high‑quality training data, addressing data scarcity and high annotation costs.
PP‑DocTranslation pipeline : Built on PP‑StructureV3 and Wenxin 4.5, it can translate Markdown, PDF, and image documents, allowing users to provide custom terminology tables for fine‑grained multilingual translation.
MCP server support : Users can quickly set up an MCP server to expose PaddleOCR core capabilities (text detection, OCR, document parsing) via local Python libraries, cloud services, or self‑hosted deployments, enabling seamless integration with downstream AI applications.
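A terminology table for PP‑DocTranslation can be pictured as a mapping from source terms to required target terms that constrains the LLM translation. The sketch below assumes exactly that. `build_translation_prompt` and `missing_terms` are hypothetical helpers, not PaddleOCR APIs, and the prompt wording is illustrative rather than the pipeline's actual prompt.

```python
def build_translation_prompt(markdown_chunk, target_language, glossary):
    """Assemble an LLM prompt that pins domain terms to fixed translations.

    glossary: dict mapping source-language terms to required target terms.
    (Hypothetical helper for illustration; not part of PaddleOCR.)
    """
    # Include only the terms that occur in this chunk so that large
    # terminology tables do not bloat every prompt.
    used = {src: dst for src, dst in glossary.items() if src in markdown_chunk}
    lines = [
        f"Translate the following Markdown into {target_language}.",
        "Keep all Markdown syntax, tables, and formulas intact.",
    ]
    if used:
        lines.append("Use these fixed translations for domain terms:")
        lines.extend(f"- {src} -> {dst}" for src, dst in sorted(used.items()))
    lines.extend(["", markdown_chunk])
    return "\n".join(lines)


def missing_terms(source_chunk, translation, glossary):
    """Return required target terms that the translation failed to use."""
    return sorted(
        dst for src, dst in glossary.items()
        if src in source_chunk and dst not in translation
    )


glossary = {"合格证": "Certificate of Conformity", "发动机": "engine"}
chunk = "# 车辆合格证"
print(build_translation_prompt(chunk, "English", glossary))
print(missing_terms(chunk, "# Vehicle Certificate", glossary))  # ['Certificate of Conformity']
```

Checking the output against the same table afterwards gives a cheap way to flag chunks where the model ignored a required term.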
Key Technical Steps for Automatic Training Data Generation
Automatic line detection and cropping : The PP‑OCRv5 detection model locates and crops each text line to produce standardized input.
High‑confidence text recognition : Wenxin 4.5 recognizes each line several times independently, and only consistent results are kept, which improves annotation accuracy and reduces human bias.
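The two steps above can be sketched in plain Python. This illustrates the idea under stated assumptions and is not PaddleOCR's labeling code. `crop_text_line` and `consensus_label` are hypothetical names. The crop uses the polygon's axis-aligned bounding box, whereas production OCR pipelines typically rectify rotated boxes with a perspective transform. "Consistent" is interpreted here as at least `min_agree` identical transcriptions across independent recognition runs.

```python
from collections import Counter


def crop_text_line(image, polygon):
    """Crop a detected text line using the axis-aligned bounding box of its polygon.

    image: 2D list of pixel rows; polygon: list of (x, y) corner points.
    Coordinates are truncated to ints and clamped to the image bounds.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x0 = max(int(min(xs)), 0)
    x1 = min(int(max(xs)), len(image[0]) - 1)
    y0 = max(int(min(ys)), 0)
    y1 = min(int(max(ys)), len(image) - 1)
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]


def consensus_label(candidates, min_agree=2):
    """Return the transcription shared by at least `min_agree` runs, else None."""
    if not candidates:
        return None
    text, count = Counter(c.strip() for c in candidates).most_common(1)[0]
    return text if count >= min_agree else None


# Toy 5x10 "image" whose pixel values encode row * 10 + column.
image = [[r * 10 + c for c in range(10)] for r in range(5)]
line = crop_text_line(image, [(2, 1), (5, 1), (5, 3), (2, 3)])
print(len(line), len(line[0]))  # 3 4

# Three independent recognitions of the same crop; two agree.
runs = ["Bonjour le monde", "Bonjour le monde", "Bonjour le mende"]
print(consensus_label(runs))  # Bonjour le monde
```

Lines where no transcription reaches the threshold return None; in a labeling workflow they could be discarded or routed to manual review.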
CLI Usage Examples
# Use the --lang parameter to run OCR for French
paddleocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_french01.png \
--lang fr \
--use_doc_orientation False \
--use_doc_unwarping False \
--use_textline_orientation False \
--save_path ./output \
--device gpu:0
For document translation, the --target_language flag specifies the output language:
paddleocr pp_doctranslation -i vehicle_certificate-1.png --target_language en --qianfan_api_key your_api_key
MCP Server Capabilities
Text recognition : Detects and recognizes text in images and PDFs, returning JSON with coordinates and content.
Document parsing : Extracts blocks, titles, paragraphs, images, tables, and outputs structured Markdown and JSON.
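The exact JSON schema returned by the text recognition tool is not given here, so the following sketch assumes a hypothetical list of items, each with `text` and an axis-aligned `box` of `[x_min, y_min, x_max, y_max]` in pixels. It shows one common way a downstream application might turn such coordinates into reading-order lines: group items whose vertical centers lie within `line_tol` pixels of a line's first item, then sort each group left to right.

```python
import json


def reading_order(items, line_tol=10):
    """Sort OCR items top-to-bottom, then left-to-right within a visual line.

    items: list of dicts with hypothetical keys "text" and "box"
    ([x_min, y_min, x_max, y_max]).
    """
    def center_y(it):
        return (it["box"][1] + it["box"][3]) / 2

    # Sorting by vertical center makes items on the same line adjacent.
    lines, current, current_y = [], [], None
    for it in sorted(items, key=center_y):
        y = center_y(it)
        if current and abs(y - current_y) > line_tol:
            lines.append(current)  # far enough down: start a new line
            current = []
        if not current:
            current_y = y
        current.append(it)
    if current:
        lines.append(current)
    return [" ".join(it["text"] for it in sorted(line, key=lambda it: it["box"][0]))
            for line in lines]


payload = (
    '[{"text": "world", "box": [60, 12, 110, 30]},'
    ' {"text": "Hello", "box": [5, 10, 55, 28]},'
    ' {"text": "Second line", "box": [5, 50, 120, 70]}]'
)
print(reading_order(json.loads(payload)))  # ['Hello world', 'Second line']
```

A fixed pixel tolerance is the simplest choice. Scaling `line_tol` by the median box height would adapt better to mixed font sizes.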
The server supports three deployment modes: local Python library, Baidu Star River community service, and self‑hosted service, with both stdio and Streamable HTTP transport mechanisms.
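Under the stdio transport, the MCP specification has the client and server exchange JSON-RPC 2.0 messages over standard input and output, one message per line. The sketch below frames a `tools/call` request that way and parses replies. The tool name `ocr` and the `input_data` argument are hypothetical placeholders; a client should take the real names from the server's `tools/list` response. The required `initialize` handshake is omitted for brevity.

```python
import json


def frame_tools_call(request_id, tool_name, arguments):
    """Serialize an MCP `tools/call` request as one newline-terminated stdio message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps without indent escapes newlines inside strings, so the
    # only raw newline is the terminator that stdio framing requires.
    return json.dumps(message, ensure_ascii=False) + "\n"


def read_messages(stream_text):
    """Split newline-delimited stdio output back into JSON-RPC messages."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]


wire = frame_tools_call(
    1,
    "ocr",  # hypothetical tool name
    {"input_data": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_french01.png"},
)
print(wire.count("\n"))  # 1

reply = '{"jsonrpc": "2.0", "id": 1, "result": {"content": [{"type": "text", "text": "Bonjour"}]}}\n'
print(read_messages(reply)[0]["result"]["content"][0]["text"])  # Bonjour
```

Streamable HTTP carries the same JSON-RPC messages over HTTP requests instead, so the message construction stays the same and only the framing changes.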
Demo Scenarios
Demo 1 : In Claude for Desktop, extract handwritten content from images and sync it to Notion using the MCP server.
Demo 2 : Convert handwritten sketches or pseudo‑code in VSCode into style‑compliant Python scripts and push them to a GitHub repository.
Demo 3 : Transform PDFs or images containing complex tables, formulas, and handwritten text into editable Word or Excel files.
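Demo 3 depends on turning parsed tables into spreadsheet data. The following minimal sketch handles one such step and assumes the table arrives as a pipe-delimited Markdown table; document parsers may emit HTML tables instead, and escaped pipes are not handled. It converts the table to CSV, which Excel opens directly. `markdown_table_to_csv` is a hypothetical helper.

```python
import csv
import io


def markdown_table_to_csv(markdown):
    """Convert a simple pipe-delimited Markdown table into CSV text."""
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore prose around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |---|:---:|
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    buf = io.StringIO()
    # csv.writer quotes cells containing commas or quotes as needed.
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


md = "| Name | Qty |\n|---|---:|\n| bolt | 4 |\n| nut, hex | 8 |"
print(markdown_table_to_csv(md))
```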
Conclusion
Since the release of PaddleOCR 3.0, extensive feedback on multilingual recognition and MCP support has driven the development of PaddleOCR 3.1. Developers, researchers, and industry users are encouraged to try the new version, provide feedback, and contribute to the open‑source repository.
Open‑source repository: https://github.com/PaddlePaddle/PaddleOCR
