Redefining Next‑Gen OCR: IBM’s Open‑Source Granite‑Docling‑258M for Unified Structure and Content Understanding
IBM’s newly released open‑source model Granite‑Docling‑258M tackles the long‑standing challenge of converting diverse digital documents into machine‑readable, structured data. It preserves layout, tables, and formulas, supports multiple languages, and stays lightweight at 258M parameters while outperforming its predecessor, SmolDocling‑256M‑Preview.
Converting documents with varied formats, complex layouts, tables, images, and formulas into accurate, machine‑readable structured data has been a core technical challenge; traditional OCR pipelines often rely on multiple tightly coupled modules, making optimization and generalization difficult.
IBM recently open‑sourced a lightweight multimodal document‑processing model called Granite‑Docling‑258M. The model is designed for efficient end‑to‑end document conversion, outputting a machine‑readable format while fully preserving layout, tables, equations, and other visual elements.
Granite‑Docling‑258M uses the DocTags format to describe document structure and integrates with the Docling library, capturing each element’s content, spatial position, and place in the document hierarchy. At only 258M parameters, it surpasses the earlier SmolDocling‑256M‑Preview on chart recognition, full‑page OCR, and code extraction benchmarks.
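To make the idea of a structure‑describing markup concrete, here is a minimal sketch of reading a simplified DocTags‑style string into elements with a type, a bounding box, and text. It is not the Docling library’s own parser. The tag names in the sample, the four `<loc_…>` tokens per element, and the 0 to 500 coordinate grid are all assumptions made for illustration, and the names `parse_doctags`, `Element`, and `SAMPLE` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: location tokens lie on a 0-500 grid; adjust if your output differs.
GRID_SIZE = 500

# Matches one flat element of the assumed form:
#   <tag><loc_x1><loc_y1><loc_x2><loc_y2>content</tag>
ELEMENT_RE = re.compile(
    r"<(?P<tag>[a-z_0-9]+)>"
    r"<loc_(?P<x1>\d+)><loc_(?P<y1>\d+)><loc_(?P<x2>\d+)><loc_(?P<y2>\d+)>"
    r"(?P<content>.*?)"
    r"</(?P=tag)>",
    re.DOTALL,
)


@dataclass
class Element:
    tag: str        # element type, e.g. "text"
    bbox: tuple     # (x1, y1, x2, y2), normalized to the range [0, 1]
    content: str    # text carried by the element


def parse_doctags(markup: str) -> list[Element]:
    """Extract flat elements from a simplified DocTags-style string."""
    elements = []
    for match in ELEMENT_RE.finditer(markup):
        bbox = tuple(
            int(match.group(key)) / GRID_SIZE for key in ("x1", "y1", "x2", "y2")
        )
        elements.append(
            Element(match.group("tag"), bbox, match.group("content").strip())
        )
    return elements


# Hypothetical sample, hand-written for this sketch (not real model output).
SAMPLE = (
    "<doctag>\n"
    "<section_header_level_1><loc_50><loc_30><loc_450><loc_60>"
    "Quarterly Report</section_header_level_1>\n"
    "<text><loc_50><loc_80><loc_450><loc_140>"
    "Revenue grew in all regions.</text>\n"
    "</doctag>"
)

for element in parse_doctags(SAMPLE):
    print(element.tag, element.bbox, element.content)
```

The sketch handles only flat elements. Nested structures such as tables or lists would need a real parser, which is the role the Docling library plays in practice.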
The model also supports multiple languages, including Arabic, Chinese, and Japanese, providing a compact yet high‑performance solution for multilingual OCR scenarios.
The model can be deployed through a one‑click tutorial on the HyperAI platform. The steps are:
1. Navigate to the HyperAI homepage, open the “Tutorials” section, select “Granite‑Docling‑258M: Lightweight Multimodal Document Processing Model”, and click “Run this tutorial online”.
2. After the page loads, click the “Clone” button in the top‑right corner to copy the tutorial into your own container.
3. Choose the “NVIDIA GeForce RTX 4090” hardware option and the “PyTorch” image, then continue. The platform offers several billing modes; new users can obtain free RTX 4090 and CPU time via the invitation link.
4. Wait roughly three minutes for resources to be allocated. Once the status shows “Running”, click the arrow next to the API address to open the demo page (authentication required).
5. In the demo, upload an image with the “Upload Image” button and enter a query in the “Ask new question” box. The model extracts the document content and answers based on it.
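Outside the hosted demo, a comparable conversion can be sketched locally with the Docling library. The snippet below is a minimal sketch that routes PDFs through Docling’s vision‑language (VLM) pipeline. It assumes Docling is installed (`pip install docling`), and it does not pin a model, so which VLM is loaded depends on the defaults of the installed Docling version. Check the Docling documentation to confirm that it selects Granite‑Docling‑258M. The function name `convert_with_vlm_pipeline` is hypothetical.

```python
def convert_with_vlm_pipeline(source: str) -> str:
    """Convert one PDF (local path or URL) to Markdown via Docling's VLM pipeline."""
    # Imported lazily so this file can be loaded even when Docling is not installed.
    from docling.datamodel.base_models import InputFormat
    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.pipeline.vlm_pipeline import VlmPipeline

    # Route PDF inputs through the VLM pipeline instead of the standard pipeline.
    # Assumption: the installed Docling version's default VLM is the one you want.
    converter = DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_cls=VlmPipeline)}
    )
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

Calling `convert_with_vlm_pipeline("report.pdf")`, where `report.pdf` is a placeholder path, returns the document as a Markdown string. The first call downloads model weights, so expect it to be slow.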
Overall, Granite‑Docling‑258M provides a compact, high‑performance OCR solution that unifies structural and content understanding, with easy deployment and multilingual capabilities.
HyperAI Super Neural
Deconstructing the sophistication and universality of technology, covering cutting-edge AI for Science case studies.
