Tagged articles

OmniDocBench

7 articles · Page 1 of 1

Jun 5, 2026 · Artificial Intelligence

How PaddleOCR‑VL‑1.6’s 0.9B Model Achieved 96.33% SOTA on OmniDocBench v1.6

PaddleOCR‑VL‑1.6, a compact 0.9B visual‑language model, diagnoses three types of weak regions, enriches targeted data, and applies a three‑stage CPT‑SFT‑RL training pipeline to reach a 96.33% overall score on OmniDocBench v1.6, surpassing much larger models across all document‑parsing tasks.

OmniDocBenchPaddleOCR-VL-1.6SOTA

0 likes · 10 min read

How PaddleOCR‑VL‑1.6’s 0.9B Model Achieved 96.33% SOTA on OmniDocBench v1.6

Machine Learning Algorithms & Natural Language Processing

Mar 30, 2026 · Artificial Intelligence

Meituan’s Fully Discrete Multimodal Base (LongCat-Next) Shows All Physical Signals Can Converge to Tokens

LongCat-Next, a 3‑billion‑parameter multimodal model released by Meituan, adopts a pure discrete token‑based architecture (DiNA) and next‑token prediction, outperforming same‑size rivals on OmniDocBench‑EN, CharXivRQ, and matching QwenVL on visual tasks, while avoiding catastrophic forgetting and achieving a SWE‑Bench score of 43.0, as demonstrated through extensive benchmarks, receipt extraction, OCR, audio dialect reasoning, and image generation experiments.

DiNALongCat-NextOmniDocBench

0 likes · 10 min read

Meituan’s Fully Discrete Multimodal Base (LongCat-Next) Shows All Physical Signals Can Converge to Tokens

Old Zhang's AI Learning

Mar 10, 2026 · Artificial Intelligence

FireRed-OCR 2B: An Open‑Source VLM That Tackles Structural Hallucination

FireRed‑OCR‑2B, an open‑source 2‑billion‑parameter visual‑language model, addresses structural hallucination in document OCR through a geometry‑aware data factory and a three‑stage training pipeline, achieving a 92.94 OmniDocBench v1.5 score and leading end‑to‑end performance while remaining lightweight enough for consumer‑grade GPUs.

FireRed-OCROCROmniDocBench

0 likes · 11 min read

FireRed-OCR 2B: An Open‑Source VLM That Tackles Structural Hallucination

Old Zhang's AI Learning

Feb 3, 2026 · Artificial Intelligence

Why GLM-OCR Leads OCR Benchmarks: 0.9B Model Tops OmniDocBench

GLM-OCR, a 0.9B‑parameter multimodal OCR model from Zhipu, achieves the highest score (94.62) on OmniDocBench V1.5, offers lightweight deployment via vLLM, Ollama, API and SDK, and outperforms larger rivals like DeepSeek‑OCR and PaddleOCR in speed and accuracy.

GLM-OCROCROllama

0 likes · 10 min read

Why GLM-OCR Leads OCR Benchmarks: 0.9B Model Tops OmniDocBench

Old Zhang's AI Learning

Jan 30, 2026 · Artificial Intelligence

PaddleOCR‑VL‑1.5: 0.9B Model Beats Billion‑Parameter OCR Models with 94.5% Accuracy

PaddleOCR‑VL‑1.5, the latest Baidu release, uses only 0.9 B parameters to achieve 94.5% accuracy on OmniDocBench v1.5, surpassing larger open‑source and commercial OCR models, while offering multi‑task, multi‑language support, lightweight deployment, and detailed performance benchmarks.

DeepSeek-OCRGPU inferenceOCR

0 likes · 9 min read

PaddleOCR‑VL‑1.5: 0.9B Model Beats Billion‑Parameter OCR Models with 94.5% Accuracy

Old Zhang's AI Learning

Jan 27, 2026 · Artificial Intelligence

DeepSeek-OCR 2 Enables AI to Read Images with Human‑Like Logical Flow

DeepSeek-OCR 2 introduces Visual Causal Flow and a LLM‑based visual encoder, achieving 91.09% accuracy on OmniDocBench v1.5, while providing detailed installation, two inference modes (vLLM and Transformers), and an analysis of its strengths and limitations for complex document processing.

DeepEncoder V2DeepSeek-OCR 2LLM

0 likes · 9 min read

DeepSeek-OCR 2 Enables AI to Read Images with Human‑Like Logical Flow

HyperAI Super Neural

Nov 11, 2025 · Artificial Intelligence

How Deepseek-OCR Achieves SOTA Using Ultra‑Low Visual Token Counts

Deepseek-OCR leverages a visual‑compression approach, combining DeepEncoder and the DeepSeek3B‑MoE‑A570M decoder, to represent document text with far fewer visual tokens, achieving up to 97% OCR accuracy and surpassing GOT‑OCR2.0 and MinerU2.0 on OmniDocBench, while the article offers a one‑click deployment tutorial.

DeepEncoderDeepSeek-OCRLLM

0 likes · 6 min read

How Deepseek-OCR Achieves SOTA Using Ultra‑Low Visual Token Counts