
Why No Perfect VLM OCR Exists for Complex Financial Reports – An In‑Depth Model Comparison

The article evaluates several OCR models built on vision-language models (VLMs) against complex financial statements, comparing speed, layout accuracy, and handling of irregular tables. It concludes that while some models excel in specific areas, none yet delivers a flawless solution for every scenario.

AI2ML AI to Machine Learning

Nov 5, 2025


Paper Data

The paper "olmOCR 2: Unit Test Rewards for Document OCR" reports test results for many OCR models, highlighting DeepSeek-OCR, PaddleOCR-VL, Infinity-Parser, and MinerU-VLM as strong performers.

Overall Intuition

For complex‑layout financial reports, the author selected representative examples and observed:

Pipeline-based models such as MinerU-Pipeline and PaddleOCR-VL are relatively slow, and their layout recognition is sometimes worse than that of end-to-end VLMs.

Purpose-trained models such as DeepSeek-OCR and MinerU-VLM are fast and deliver decent layout accuracy.

Models fine‑tuned from large VLMs (Infinity-Parser, Chandra OCR, olmOCR) are large and slow, but generally achieve good detail and layout results.

Thus, if speed is the priority, DeepSeek-OCR or MinerU-VLM is recommended; for a balance of speed and table accuracy, PaddleOCR-VL is a good choice; and with ample compute, the models fine-tuned from Qwen-VL (Infinity-Parser, Chandra OCR, olmOCR) are worth trying.
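One plausible structural reason for the speed gap between pipeline-based and end-to-end models can be sketched under simplifying assumptions: a pipeline first runs a layout-detection model and then a recognition step for each detected region, while an end-to-end VLM turns the whole page into markup in a single pass. The names below (`Region`, `pipeline_ocr`, `end_to_end_ocr`, `detect_layout`, `recognise`, `vlm`) are hypothetical stand-ins, not the APIs of MinerU, PaddleOCR-VL, or any other model compared here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Region:
    """A detected page region (hypothetical structure for illustration)."""
    kind: str                        # e.g. "text", "table", "formula"
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def pipeline_ocr(
    page: str,
    detect_layout: Callable[[str], List[Region]],
    recognise: Callable[[str, Region], str],
) -> Tuple[str, int]:
    """Pipeline style: one layout pass, then one recognition call per region.

    Returns the assembled text and the number of model invocations.
    """
    regions = detect_layout(page)  # assumed to come back in reading order
    calls = 1
    parts = []
    for region in regions:
        parts.append(recognise(page, region))
        calls += 1
    return "\n\n".join(parts), calls


def end_to_end_ocr(page: str, vlm: Callable[[str], str]) -> Tuple[str, int]:
    """End-to-end style: a single VLM call maps the whole page to markup."""
    return vlm(page), 1
```

Real pipelines usually batch regions on the GPU, so the call count is only a rough proxy for latency. The sketch also shows why a layout-detection mistake propagates: every later step works only on the regions that the first stage returns.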

Case Studies

Case 1 – Blocked Short Paragraphs

PaddleOCR-VL's layout appears chaotic.

DeepSeek‑OCR shows minor issues with short‑paragraph recognition.

MinerU-VLM also struggles with short paragraphs.

Infinity‑Parser performs reasonably well.

Case 2 – Mixed Text and Short Paragraphs

PaddleOCR‑VL layout is messy.

DeepSeek‑OCR shows layout problems with short paragraphs.

MinerU‑VLM is acceptable.

Infinity‑Parser remains solid.

Case 3 – Flowchart with Mixed Font Sizes

PaddleOCR-VL's layout is confusing.

DeepSeek‑OCR works reasonably.

MinerU-VLM drops some small characters.

Infinity‑Parser remains acceptable.

Case 4 – Semi‑Open Table with Mixed Layout

PaddleOCR-VL still has layout issues but recognises tiny table text well.

DeepSeek-OCR has minor problems with short paragraphs.

MinerU-VLM also has problems with short paragraphs.

Infinity‑Parser’s short‑paragraph handling is slightly better, and it recognises small table text well.

Case 5 – Cross‑Row Semi‑Open Table

PaddleOCR-VL recognises tables well, but its layout is weak.

DeepSeek‑OCR table recognition degrades significantly.

MinerU‑VLM table results are average.

Infinity‑Parser provides the best semantic column separation.

Case 6 – Frame‑Less Distant Table

PaddleOCR-VL fails to recognise the table but preserves the overall reading order.

DeepSeek-OCR also fails to recognise the table, yet it too preserves the reading order.

MinerU-VLM does not recognise the table, and its reading order becomes chaotic.

Infinity‑Parser surprisingly succeeds.
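Case 6 separates two failure modes: missing the table structure, and scrambling the reading order. The second can be caught with a simple ordering check, similar in spirit to the unit-test style used in the olmOCR 2 paper but not its implementation. `normalize`, `appears_in_order`, and the sample figures are hypothetical, made up for illustration.

```python
import re
from typing import List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks and spacing do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def appears_in_order(output: str, snippets: List[str]) -> bool:
    """True if every snippet occurs in `output`, each one after the previous."""
    haystack = normalize(output)
    position = 0
    for snippet in snippets:
        needle = normalize(snippet)
        index = haystack.find(needle, position)
        if index == -1:
            return False  # missing, or only present before the previous snippet
        position = index + len(needle)
    return True


ocr_output = "Revenue 1,200\nCost of sales 800\nGross profit 400"
print(appears_in_order(ocr_output, ["Revenue", "Cost of sales", "Gross profit"]))  # True
print(appears_in_order(ocr_output, ["Gross profit", "Revenue"]))  # False
```

Under this check, PaddleOCR-VL and DeepSeek-OCR in Case 6 would pass ordering tests even though their table structure is wrong, while a scrambled output would fail them.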

Case 7 – Frame‑Less Multi‑Column Complex Table

PaddleOCR-VL handles the table well despite layout issues.

DeepSeek‑OCR makes small table errors.

MinerU‑VLM performs acceptably.

Infinity‑Parser suffers major table errors.

Case 8 – Cross‑Column Table

PaddleOCR-VL handles cross-column tables correctly.

DeepSeek‑OCR has minor cross‑column errors, misrecognising the last row.

MinerU‑VLM shows no cross‑column issues.

Infinity‑Parser collapses completely on cross‑column tables.
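Cases 5 and 8 turn on merged cells that span several rows or columns. Assuming a model emits its tables as HTML (an assumption here; check each model's actual output format), one way to compare its output with a hand-labelled reference is to expand `rowspan`/`colspan` into a plain grid and diff it cell by cell. `TableGridParser` and `to_grid` are hypothetical helpers built on Python's standard `html.parser`, and the sample table is invented.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class TableGridParser(HTMLParser):
    """Collect cell text plus rowspan/colspan from a single HTML <table>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[Tuple[str, int, int]]] = []  # (text, rowspan, colspan)
        self._cell: Optional[List[str]] = None  # text fragments of the open cell
        self._spans = (1, 1)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell before any <tr>
                self.rows.append([])
            attr = dict(attrs)
            self._cell = []
            self._spans = (int(attr.get("rowspan") or 1), int(attr.get("colspan") or 1))

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append(("".join(self._cell).strip(), *self._spans))
            self._cell = None


def to_grid(html: str) -> List[List[str]]:
    """Expand merged cells so every (row, column) slot holds its cell's text."""
    parser = TableGridParser()
    parser.feed(html)
    grid: Dict[Tuple[int, int], str] = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots already taken by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max((r for r, _ in grid), default=-1) + 1
    n_cols = max((c for _, c in grid), default=-1) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


html = """<table>
<tr><th rowspan="2">Item</th><th colspan="2">2024</th></tr>
<tr><th>Q1</th><th>Q2</th></tr>
<tr><td>Revenue</td><td>10</td><td>12</td></tr>
</table>"""
for row in to_grid(html):
    print(row)
# ['Item', '2024', '2024']
# ['Item', 'Q1', 'Q2']
# ['Revenue', '10', '12']
```

Comparing expanded grids focuses the check on which text lands in which row and column, which is what semantic column separation comes down to; a stricter check could also compare the span attributes themselves. The sketch handles only a single, well-formed table without nested tables or omitted closing tags.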

Summary

Overall, VLM‑based OCR for complex financial reports still lacks a perfect solution. Infinity‑Parser demonstrates strong semantic understanding but frequently makes critical OCR mistakes. PaddleOCR‑VL struggles with complex layouts yet excels at table, small‑font, and formula recognition. MinerU‑VLM is fast and performs well in many scenarios, though its semantic grasp lags slightly. DeepSeek‑OCR runs extremely fast with decent layout handling, but fine‑grained details are sometimes missing.

The author plans to release the test code. The models run well on an RTX 4060, although PaddleOCR-VL and MinerU-VLM need some care during configuration.
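Until the author's test code is available, per-page latency can be compared on one's own hardware with a minimal harness like the following. It is not the author's code: `time_per_page` and the `ocr_fn` wrapper it expects are hypothetical, and each model's real inference call would have to be wrapped separately.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def time_per_page(
    ocr_fn: Callable[[str], str],
    pages: Sequence[str],
    warmup: int = 1,
) -> Dict[str, float]:
    """Return mean, median, and max per-page latency in seconds.

    `ocr_fn` is a hypothetical wrapper that takes an image path and returns
    the model's text/markdown output.
    """
    if not pages:
        raise ValueError("need at least one page")
    for page in pages[:warmup]:
        ocr_fn(page)  # warm-up: keep one-off costs such as lazy model loading out of the timings
    latencies = []
    for page in pages:
        start = time.perf_counter()
        ocr_fn(page)
        latencies.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }
```

Because `ocr_fn` returns decoded text, any GPU work for a page has finished by the time its timer stops. Reporting the median alongside the mean keeps one unusually slow page from dominating the comparison.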


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

financial reports deepseek-ocr paddleocr-vl Infinity-Parser MinerU-VLM VLM OCR

Written by

AI2ML AI to Machine Learning

Original articles on artificial intelligence and machine learning, deep optimization. Less is more, life is simple! Shi Chunqi
