Alibaba’s Logics-Parsing-v2 Sets New OCR Benchmark Records
Alibaba’s open‑source Logics-Parsing‑v2 achieves top scores on both LogicsDocBench (82.16) and OmniDocBench‑v1.5 (93.23), outperforms leading closed models, and introduces Parsing‑2.0 capabilities that handle flowcharts, music scores, code blocks, and chemical formulas with structured HTML output.
Model Overview
Logics-Parsing-v2, an open‑source OCR model built on the Qwen3‑VL architecture, achieves the top scores on two major benchmarks: 82.16 on the internal LogicsDocBench (900‑page PDF suite) and 93.23 on the public OmniDocBench‑v1.5. Both scores are the highest reported to date.
Parsing‑2.0 Capabilities
Flowcharts / mind maps → output in Mermaid syntax
Music scores → output in ABC notation
Code blocks / pseudocode → structured extraction
Chemical formulas → output in SMILES format
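To make the target notations concrete, here is a generic illustration of what each format looks like (hand-written examples for orientation, not actual model output):

```text
# Flowchart → Mermaid (directly renderable, e.g. on mermaid.live)
graph TD
  A[Start] --> B{Valid?}
  B -->|yes| C[Save]
  B -->|no| A

# Music score → ABC notation
X:1
T:C major scale
K:C
C D E F | G A B c |

# Chemical formula → SMILES (caffeine)
CN1C=NC2=C1C(=O)N(C)C(=O)N2C
```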
Benchmark Evaluation
LogicsDocBench consists of three scenario groups:
STEM documents – 218 pages covering physics, mathematics, engineering, etc.
Complex layout – 459 pages with multi‑column text, cross‑page tables, vertical text, and mixed graphics.
Parsing‑2.0 – 223 pages containing chemical formulas, music scores, code blocks, and flowcharts.
On this benchmark Logics‑Parsing‑v2 ranks first with an overall score of 82.16, well ahead of competing models.
On OmniDocBench‑v1.5 the model scores 93.23, surpassing large general‑purpose models such as Gemini 2.5 Pro, GPT‑5, and Qwen2.5‑VL‑72B, as well as specialized OCR systems like Mathpix.
Comparison with Other Models
Versus Gemini 2.5 Pro – better on English text (edit distance 0.089 vs 0.115; lower is better) and comparable on tables (0.165 vs 0.154).
Versus Mathpix – Mathpix remains stronger on formula recognition (0.06 vs 0.106), but Logics‑Parsing‑v2 shows superior overall capability.
Versus MonkeyOCR / GOT‑OCR – Logics‑Parsing‑v2 leads across all evaluated dimensions.
Versus general large models (GPT‑5, Qwen2.5VL‑72B) – the dedicated OCR model demonstrates clear advantages across the board.
The model operates end‑to‑end: an image is fed in and structured HTML is produced, eliminating the need for a multi‑stage detection‑recognition‑post‑processing pipeline.
Output Format
Results are returned as structured HTML rather than plain text. Each content block includes:
Category tags (paragraph, table, image, formula, etc.)
Pixel‑level bounding‑box coordinates
Recognized OCR text
For the new Parsing‑2.0 scenarios the output is customized:
Flowcharts → Mermaid syntax (directly renderable)
Music scores → ABC notation (readable by musicians)
Chemical formulas → SMILES format (standard chemical representation)
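Because every block carries a category tag and bounding box, the HTML output can be post-processed with standard tools. The sketch below uses Python's stdlib `html.parser` on a hypothetical output fragment; the actual tag and attribute names (`class` for category, `data-bbox` for coordinates) are assumptions for illustration and may differ from the model's real output schema.

```python
from html.parser import HTMLParser

# Hypothetical output fragment; real tag/attribute names may differ.
SAMPLE = (
    '<div class="paragraph" data-bbox="34,120,980,210">Introduction text</div>'
    '<div class="formula" data-bbox="40,260,600,320">E = mc^2</div>'
)

class BlockExtractor(HTMLParser):
    """Collect (category, bbox, text) records from block-level divs."""
    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            a = dict(attrs)
            raw = a.get("data-bbox")
            bbox = tuple(int(v) for v in raw.split(",")) if raw else None
            self._current = {"category": a.get("class"), "bbox": bbox, "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self.blocks.append(self._current)
            self._current = None

parser = BlockExtractor()
parser.feed(SAMPLE)
for b in parser.blocks:
    print(b["category"], b["bbox"], b["text"])
```

A downstream pipeline could filter on the category tag, e.g. collect only `formula` blocks for LaTeX rendering or only `table` blocks for export.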
Deployment and Inference
conda create -n logics-parsing-v2 python=3.10
conda activate logics-parsing-v2
pip install -r requirements.txt

Download the model (choose one source):
# HuggingFace
pip install huggingface_hub
python download_model_v2.py -t huggingface
# ModelScope (faster in China)
pip install modelscope
python download_model_v2.py -t modelscope

Run inference with a single command:
python3 inference_v2.py --image_path <image_path> --output_path <output_dir> --model_path <model_path>

Demo Results
Distorted document recognition – accurate even with skewed or curved pages.
STEM document – complex formulas and charts are preserved.
Code block recognition – retains code structure.
Flowchart parsing – converts diagrams to Mermaid code ready for rendering.
Music score recognition – first OCR model to output ABC notation.
Key Takeaways
Dual‑benchmark leader: LogicsDocBench 82.16, OmniDocBench‑v1.5 93.23.
Parsing‑2.0 adds full‑cycle support for flowcharts, music scores, code blocks, and chemical formulas.
End‑to‑end single‑model pipeline: image → structured HTML without additional post‑processing.
All code and model weights are open‑source (GitHub: https://github.com/alibaba/Logics-Parsing, HuggingFace: https://huggingface.co/Logics-MLLM/Logics-Parsing-v2, ModelScope demo: https://www.modelscope.cn/studios/Alibaba-DT/Logics-Parsing/summary).
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
