Alibaba’s Logics-Parsing-v2 Sets New OCR Benchmark Records

Alibaba’s open‑source Logics-Parsing‑v2 achieves top scores on both LogicsDocBench (82.16) and OmniDocBench‑v1.5 (93.23), outperforms leading closed models, and introduces Parsing‑2.0 capabilities that handle flowcharts, music scores, code blocks, and chemical formulas with structured HTML output.

Old Zhang's AI Learning

Model Overview

Logics-Parsing-v2, an open‑source OCR model built on the Qwen3‑VL architecture, achieves the top scores on two major benchmarks: 82.16 on the internal LogicsDocBench (900‑page PDF suite) and 93.23 on the public OmniDocBench‑v1.5. Both scores are the highest reported to date.

Parsing‑2.0 Capabilities

Flowcharts / mind maps → output in Mermaid syntax

Music scores → output in ABC notation

Code blocks / pseudocode → structured extraction

Chemical formulas → output in SMILES format
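To make the four target notations concrete, here are short invented examples of each (these strings are illustrations of the formats, not actual model output):

```python
# Invented examples of the Parsing-2.0 target notations;
# none of these strings are actual Logics-Parsing-v2 output.

# Flowchart as Mermaid: plain text, directly renderable by any Mermaid viewer.
mermaid_flowchart = """flowchart TD
    A[Scan page] --> B{Diagram detected?}
    B -- yes --> C[Emit Mermaid]
    B -- no --> D[Emit plain text]
"""

# Music score as ABC notation: a text format readable by musicians and ABC tools.
abc_score = """X:1
T:Example tune
M:4/4
L:1/4
K:C
C D E F | G A B c |"""

# Chemical formula as SMILES: here, aspirin.
smiles_aspirin = "CC(=O)Oc1ccccc1C(=O)O"

print(mermaid_flowchart.splitlines()[0])  # flowchart TD
```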

Benchmark Evaluation

LogicsDocBench consists of three scenario groups:

STEM documents – 218 pages covering physics, mathematics, engineering, etc.

Complex layout – 459 pages with multi‑column text, cross‑page tables, vertical text, and mixed graphics.

Parsing‑2.0 – 223 pages containing chemical formulas, music scores, code blocks, and flowcharts.

On this benchmark Logics‑Parsing‑v2 ranks first with an overall score of 82.16, far ahead of competing models.

On OmniDocBench‑v1.5 the model scores 93.23, surpassing general‑purpose large models such as Gemini 2.5 Pro, GPT‑5, and Qwen2.5VL‑72B, as well as specialized OCR systems like Mathpix.

Comparison with Other Models

Versus Gemini 2.5 Pro – ahead on English text (0.089 vs 0.115; lower is better) and comparable on tables (0.165 vs 0.154).

Versus Mathpix – Mathpix remains stronger on formula recognition (0.06 vs 0.106; lower is better), but Logics‑Parsing‑v2 shows superior overall capability.

Versus MonkeyOCR / GOT‑OCR – Logics‑Parsing‑v2 leads across all evaluated dimensions.

Versus general large models (GPT‑5, Qwen2.5VL‑72B) – the dedicated OCR model demonstrates clear advantages.

The model operates end‑to‑end: an image is fed in and structured HTML is produced, eliminating the need for a multi‑stage detection‑recognition‑post‑processing pipeline.

Output Format

Results are returned as structured HTML rather than plain text. Each content block includes:

Category tags (paragraph, table, image, formula, etc.)

Pixel‑level bounding‑box coordinates

Recognized OCR text
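A block structure like this can be consumed with a few lines of standard‑library Python. Note that the tag and attribute names below (`data-category`, `data-bbox`) are assumptions made for illustration; the model's actual HTML schema may differ:

```python
from html.parser import HTMLParser

# Hypothetical sample of the structured-HTML output described above.
# The attribute names data-category and data-bbox are assumptions,
# not the model's documented schema.
SAMPLE = """
<div data-category="paragraph" data-bbox="34,120,880,210">Introduction text.</div>
<div data-category="formula" data-bbox="120,260,700,320">E = mc^2</div>
"""

class BlockCollector(HTMLParser):
    """Collect (category, bbox, text) triples from the sample markup."""
    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-category" in attrs:
            bbox = tuple(int(v) for v in attrs["data-bbox"].split(","))
            self._current = [attrs["data-category"], bbox, ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2] += data

    def handle_endtag(self, tag):
        if self._current is not None:
            self.blocks.append(tuple(self._current))
            self._current = None

parser = BlockCollector()
parser.feed(SAMPLE)
for category, bbox, text in parser.blocks:
    print(category, bbox, text.strip())
```

Because each block carries pixel‑level coordinates, downstream code can re‑anchor recognized text onto the original page image, e.g. for highlighting or layout reconstruction.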

For the new Parsing‑2.0 scenarios the output is customized:

Flowcharts → Mermaid syntax (directly renderable)

Music scores → ABC notation (readable by musicians)

Chemical formulas → SMILES format (standard chemical representation)

Deployment and Inference

conda create -n logics-parsing-v2 python=3.10
conda activate logics-parsing-v2
pip install -r requirements.txt

Download the model (choose one source):

# HuggingFace
pip install huggingface_hub
python download_model_v2.py -t huggingface

# ModelScope (faster in China)
pip install modelscope
python download_model_v2.py -t modelscope

Run inference with a single command:

python3 inference_v2.py --image_path <image_path> --output_path <output_dir> --model_path <model_path>
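For multi‑page documents, the single‑image command can be wrapped in a small shell loop. This is a sketch assuming pages have been exported as PNGs; the flags are those of the command above, and the directory names are placeholders:

```shell
#!/bin/sh
# Batch sketch: run inference_v2.py on every PNG in a directory.
# IN_DIR, OUT_DIR, and MODEL are placeholder paths.
IN_DIR=${1:-./pages}
OUT_DIR=${2:-./parsed}
MODEL=${3:-./Logics-Parsing-v2}

mkdir -p "$OUT_DIR"
for img in "$IN_DIR"/*.png; do
    [ -e "$img" ] || continue   # skip when the glob matches nothing
    echo "parsing $img"
    python3 inference_v2.py --image_path "$img" --output_path "$OUT_DIR" --model_path "$MODEL"
done
```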

Demo Results

Distorted document recognition – accurate even with skewed or curved pages.


STEM document – complex formulas and charts are preserved.


Code block recognition – retains code structure.


Flowchart parsing – converts diagrams to Mermaid code ready for rendering.


Music score recognition – first OCR model to output ABC notation.


Key Takeaways

Dual‑benchmark leader: LogicsDocBench 82.16, OmniDocBench‑v1.5 93.23.

Parsing‑2.0 adds full‑cycle support for flowcharts, music scores, code blocks, and chemical formulas.

End‑to‑end single‑model pipeline: image → structured HTML without additional post‑processing.

All code and model weights are open‑source (GitHub: https://github.com/alibaba/Logics-Parsing, HuggingFace: https://huggingface.co/Logics-MLLM/Logics-Parsing-v2, ModelScope demo: https://www.modelscope.cn/studios/Alibaba-DT/Logics-Parsing/summary).

Tags: OCR · open-source · benchmark · Mermaid · ABC notation · Logics-Parsing-v2 · Parsing-2.0
Written by

Old Zhang's AI Learning

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
