Old Zhang's AI Learning
Mar 10, 2026 · Artificial Intelligence

FireRed-OCR 2B: An Open‑Source VLM That Tackles Structural Hallucination

FireRed‑OCR‑2B is an open‑source, 2‑billion‑parameter vision‑language model that addresses structural hallucination in document OCR with a geometry‑aware data factory and a three‑stage training pipeline. It scores 92.94 on OmniDocBench v1.5, leading in end‑to‑end performance while remaining lightweight enough to run on consumer‑grade GPUs.

FireRed-OCR · OCR · OmniDocBench