Tagged articles
228 articles
Su San Talks Tech
May 20, 2026 · Artificial Intelligence

Why Convert Docs to Markdown for LLMs? Meet the Open‑Source MarkItDown Tool

The article explains that LLMs process Markdown more effectively than raw PDFs, introduces Microsoft’s open‑source MarkItDown utility that converts a wide range of file types—including PDFs, Word, Excel, HTML, images with OCR, and YouTube videos—into clean Markdown, and provides installation, usage examples, recent feature updates, and a brief critique of its scope.

Azure Document Intelligence · CLI · LLM preprocessing
0 likes · 6 min read
DataFunTalk
May 15, 2026 · Artificial Intelligence

Exploring Multimodal GraphRAG: Combining Document Intelligence, Knowledge Graphs, and Large Models

This article provides a comprehensive technical overview of multimodal GraphRAG, detailing document‑intelligence parsing pipelines, layout analysis, OCR‑pipeline vs OCR‑free approaches, knowledge‑graph integration for chunk relationships, multimodal indexing, retrieval‑generation workflows, and a comparative analysis of RAG, GraphRAG, and KG‑QA solutions.

Document Intelligence · GraphRAG · Knowledge Graph
0 likes · 23 min read
DataFunTalk
May 10, 2026 · Artificial Intelligence

Exploring Multimodal GraphRAG: Combining Document Intelligence, Knowledge Graphs, and Large Models

This article presents a detailed technical walkthrough of multimodal GraphRAG, covering document‑intelligence parsing pipelines, multimodal graph index construction, knowledge‑graph‑driven chunk linking, recent research progress, performance trade‑offs, and practical recommendations for deploying RAG solutions.

Document Intelligence · GraphRAG · Knowledge Graph
0 likes · 23 min read
AI Engineer Programming
May 9, 2026 · Artificial Intelligence

Why PDF Parsing Is Hard for RAG and Which Mainstream Solutions Work

The article examines the intrinsic challenges of extracting structured text from PDFs for Retrieval‑Augmented Generation—such as missing reading order, table reconstruction, font encoding, and scanned images—and compares lightweight libraries, AI‑enhanced frameworks, commercial APIs, and visual language models as practical solutions.

AI frameworks · OCR · PDF parsing
0 likes · 23 min read
SuanNi
Apr 30, 2026 · Artificial Intelligence

Deploy a 24/7 Document Recognition Toolbox with the PaddleOCR Image on the Cloud

This guide explains how to use Baidu's open‑source PaddleOCR engine—its full OCR and layout analysis pipeline, multi‑language support, and output formats—to set up a continuously running document recognition service on the 算网 GPU cloud platform, including environment preparation, model configuration, and inference execution.

Document Processing · GPU · MagicMind
0 likes · 6 min read
AI Architecture Path
Apr 29, 2026 · Artificial Intelligence

Fed up feeding AI with docs? Microsoft’s Open‑Source MarkItDown converts any format to Markdown in a few lines

MarkItDown, an open‑source Python tool from Microsoft’s AutoGen team, converts over 20 document and media formats—including Word, Excel, PDF, images, audio and YouTube links—into standardized Markdown, offering OCR, LLM integration, Docker deployment, Azure Document Intelligence support, and extensive command‑line examples for enterprise and research pipelines.

AutoGen · Azure Document Intelligence · Docker
0 likes · 13 min read
Java Architect Essentials
Apr 17, 2026 · Backend Development

How to Integrate Tess4J OCR into a Spring Boot Application

This article explains OCR fundamentals, introduces Tesseract and its Java wrapper Tess4J, guides you through downloading language data, shows step‑by‑step Spring Boot integration with Maven dependencies and configuration classes, and provides test code for Chinese, English, and mixed‑language image recognition.

Java · Language Data · OCR
0 likes · 9 min read
ShiZhen AI
Apr 12, 2026 · Artificial Intelligence

Convert Any File to Clean Markdown in One Click with Microsoft’s MarkItDown

MarkItDown, an open‑source tool from Microsoft’s AutoGen team, lets you feed PDFs, Office documents, web data, media, and even YouTube videos into large language models by converting them to clean Markdown in a single command, preserving structure for better AI understanding.

Azure Document Intelligence · LLM preprocessing · MarkItDown
0 likes · 6 min read
Java Architect Handbook
Apr 1, 2026 · Backend Development

Integrating Tess4j OCR into a Spring Boot 3 Project

This guide explains OCR fundamentals, introduces Tesseract and Tess4j, shows how to download the required language data files, and provides step‑by‑step instructions with Maven configuration, Spring Boot properties, Java code, and test examples for Chinese, English, and mixed‑language image recognition.

Java · OCR · Spring Boot
0 likes · 11 min read
AI Explorer
Mar 28, 2026 · Artificial Intelligence

How Chandra OCR 2 Accurately Parses Complex Tables and Handwritten Text

Chandra OCR 2, an open‑source model on GitHub, combines full‑layout understanding with multi‑format output to precisely digitize complex tables, handwritten notes, formulas and multilingual documents, outperforming other OCR solutions in benchmark tests and offering easy installation for developers.

Chandra OCR 2 · Document Intelligence · Layout Understanding
0 likes · 6 min read
Old Zhang's AI Learning
Mar 27, 2026 · Artificial Intelligence

Alibaba’s Logics-Parsing-v2 Sets New OCR Benchmark Records

Alibaba’s open‑source Logics-Parsing‑v2 achieves top scores on both LogicsDocBench (82.16) and OmniDocBench‑v1.5 (93.23), outperforms leading closed models, and introduces Parsing‑2.0 capabilities that handle flowcharts, music scores, code blocks, and chemical formulas with structured HTML output.

ABC notation · Benchmark · Logics-Parsing-v2
0 likes · 9 min read
Architecture Digest
Mar 26, 2026 · Artificial Intelligence

How to Integrate Tess4j OCR into a Spring Boot 3 Application

This guide explains the fundamentals of OCR, introduces Tesseract and its Java wrapper Tess4j, shows how to download language data files, configure a Spring Boot 3 project with Maven dependencies and YAML settings, and provides comprehensive test code for Chinese, English, and mixed‑language image recognition.

Java · OCR · Spring Boot
0 likes · 9 min read
Data STUDIO
Mar 26, 2026 · Operations

10 Open‑Source Python Tools That Replace Paid SaaS Apps

The article presents ten Python libraries—pikepdf, Playwright, pdf2image + pytesseract, moviepy, pydub + ffmpeg, reportlab, yt‑dlp, watchdog, pyvirtualcam, and rich + textual—each with code samples, runtime requirements, complexity analysis, practical tips, and common pitfalls, showing how they can substitute costly commercial software while offering greater control, privacy, and customization.

Audio Processing · Automation · File Monitoring
0 likes · 19 min read
SpringMeng
Mar 25, 2026 · Backend Development

How to Perform OCR in SpringBoot Using Tess4j

This tutorial explains OCR fundamentals, introduces Tesseract and its Java wrapper Tess4j, shows how to download language data, integrate Tess4j into a SpringBoot 3 project with Maven configuration, and provides test code for Chinese, English, and mixed‑language image recognition while highlighting performance considerations.

Configuration · Java · OCR
0 likes · 9 min read
java1234
Mar 24, 2026 · Backend Development

How to Elegantly Perform OCR in Spring Boot 3 Using Tess4J

This tutorial explains OCR fundamentals, introduces the open‑source Tesseract engine and its Java wrapper Tess4J, shows how to download the required traineddata files, and provides step‑by‑step Spring Boot 3 integration, configuration, and test code for Chinese, English, and mixed‑language image recognition, plus important usage notes.

Java · OCR · Spring Boot
0 likes · 8 min read
Java Companion
Mar 22, 2026 · Backend Development

How to Seamlessly Integrate Tess4j OCR into a SpringBoot Application

This tutorial walks through the fundamentals of OCR, explains how to download the required Tesseract traineddata files, shows how to add Tess4j as a Maven dependency, configure SpringBoot with custom properties, and provides complete Java test code for Chinese, English, and mixed‑language image recognition, highlighting performance considerations and file‑naming requirements.

Backend · Java · OCR
0 likes · 9 min read
Wu Shixiong's Large Model Academy
Mar 22, 2026 · Artificial Intelligence

How to Overcome MinerU’s Top 9 Limitations for Reliable Document Parsing

This article examines MinerU’s strengths and nine critical shortcomings—such as reading order errors, split tables, merged cells, OCR misrecognition, formula handling, heading hierarchy loss, output inconsistency, hardware limits, and licensing issues—and provides concrete improvement strategies and interview‑ready talking points for engineers.

Document Parsing · Interview Tips · MinerU
0 likes · 12 min read
Wu Shixiong's Large Model Academy
Mar 20, 2026 · Artificial Intelligence

Mastering MinerU: Overcoming Its Top 9 Limitations for Reliable Document Parsing

This article examines MinerU's strengths and nine critical shortcomings—such as layout order errors, cross‑page table splits, merged‑cell failures, OCR misrecognition, and licensing issues—and provides concrete improvement strategies, interview‑ready resume bullets, and practical response frameworks for engineers.

LLM · Layout Analysis · MinerU
0 likes · 13 min read
Old Zhang's AI Learning
Mar 10, 2026 · Artificial Intelligence

FireRed-OCR 2B: An Open‑Source VLM That Tackles Structural Hallucination

FireRed‑OCR‑2B, an open‑source 2‑billion‑parameter visual‑language model, addresses structural hallucination in document OCR through a geometry‑aware data factory and a three‑stage training pipeline, achieving a 92.94 OmniDocBench v1.5 score and leading end‑to‑end performance while remaining lightweight enough for consumer‑grade GPUs.

FireRed-OCR · OCR · OmniDocBench
0 likes · 11 min read
Huolala Tech
Mar 4, 2026 · Artificial Intelligence

How Lalamove Built an AI‑Powered Edge‑Cloud Review System for Global Driver Verification

Lalamove tackled the scalability and accuracy challenges of worldwide driver onboarding by designing a layered edge‑cloud AI architecture that combines lightweight mobile models, cloud‑based large‑language and computer‑vision models, OCR, and multimodal LLMs to filter low‑quality inputs, automate identity checks, and reduce manual effort while maintaining data compliance.

AI · Driver Verification · OCR
0 likes · 12 min read
SpringMeng
Mar 2, 2026 · Backend Development

Deep Dive into an Asynchronous Spring Boot + Tesseract OCR Pipeline for Invoice Recognition

This article presents a complete design and implementation of a high‑throughput, asynchronous OCR pipeline built with Spring Boot and Tesseract, covering distributed architecture, thread‑pool tuning, image‑preprocessing, multi‑engine recognition, data extraction strategies, Kubernetes deployment, security compliance, chaos testing, and future AI‑driven enhancements.

Asynchronous · GPU · Java
0 likes · 10 min read
Machine Learning Algorithms & Natural Language Processing
Feb 26, 2026 · Artificial Intelligence

Edit Banana Turns AI‑Generated Pixel Diagrams into Fully Editable PPT and Drawio Files

Edit Banana addresses the common pain of uneditable AI‑generated pixel diagrams by instantly converting them into fully editable Drawio (XML) or PPTX files, preserving text, shapes, and connections, and offering LaTeX extraction and a human‑in‑the‑loop mode for complex icons.

AIGC · Edit Banana · Multimodal AI
0 likes · 6 min read
Old Zhang's AI Learning
Feb 8, 2026 · Artificial Intelligence

Choosing the Best OCR Large Model: DeepSeek‑OCR‑2, HunyuanOCR, PaddleOCR‑VL‑1.5, and GLM‑OCR Compared

This article provides a detailed technical comparison of four OCR large models—DeepSeek‑OCR‑2, HunyuanOCR, PaddleOCR‑VL‑1.5, and GLM‑OCR—covering their architectures, parameter sizes, release dates, licensing, core features, strengths, weaknesses, benchmark scores, multilingual support, deployment requirements, and recommended use‑cases, helping readers select the most suitable model for their needs.

Benchmark · DeepSeek-OCR 2 · GLM-OCR
0 likes · 17 min read
Old Zhang's AI Learning
Feb 3, 2026 · Artificial Intelligence

Why GLM-OCR Leads OCR Benchmarks: 0.9B Model Tops OmniDocBench

GLM-OCR, a 0.9B‑parameter multimodal OCR model from Zhipu, achieves the highest score (94.62) on OmniDocBench V1.5, offers lightweight deployment via vLLM, Ollama, API and SDK, and outperforms larger rivals like DeepSeek‑OCR and PaddleOCR in speed and accuracy.

Deployment · GLM-OCR · OCR
0 likes · 10 min read
Old Zhang's AI Learning
Jan 31, 2026 · Artificial Intelligence

How a 0.1B‑Parameter OCR Model Beats Multi‑Billion‑Parameter Vision‑Language Models

UniRec‑0.1B, a lightweight OCR model with only 0.1 B parameters, achieves accuracy comparable to or better than multi‑billion‑parameter visual‑language models across text, formula, and mixed‑content tasks, thanks to hierarchical supervision training, a semantic‑decoupled tokenizer, and a large 40 M‑sample dataset, while delivering 2‑9× faster inference and full open‑source availability.

Hierarchical Supervision · OCR · Semantic Decoupled Tokenizer
0 likes · 12 min read
Old Zhang's AI Learning
Jan 28, 2026 · Artificial Intelligence

How to Deploy DeepSeek‑OCR‑2 Locally: A Hands‑On Walkthrough

The article details a step‑by‑step local deployment of DeepSeek‑OCR‑2, covering GPU memory requirements, accuracy on complex tables, long inference times, dependency hurdles like GCC, GLIBC and flash‑attn, and provides concrete solutions using conda environments and symlinks.

Conda · DeepSeek-OCR 2 · Deployment
0 likes · 7 min read
PaperAgent
Jan 27, 2026 · Artificial Intelligence

How DeepSeek-OCR 2’s Dual-Flow Attention Redefines Document Understanding

DeepSeek-OCR 2 introduces a novel dual‑stream (bidirectional + causal) attention architecture that replaces fixed raster scanning, leverages a Qwen2‑0.5B encoder, and achieves state‑of‑the‑art accuracy on OmniDocBench while reducing token budget and improving reading‑order consistency.

DeepEncoder · DeepSeek · Dual-Stream Attention
0 likes · 8 min read
Old Zhang's AI Learning
Jan 27, 2026 · Artificial Intelligence

DeepSeek-OCR 2 Enables AI to Read Images with Human‑Like Logical Flow

DeepSeek-OCR 2 introduces Visual Causal Flow and an LLM‑based visual encoder, achieving 91.09% accuracy on OmniDocBench v1.5, while providing detailed installation, two inference modes (vLLM and Transformers), and an analysis of its strengths and limitations for complex document processing.

DeepEncoder V2 · DeepSeek-OCR 2 · LLM
0 likes · 9 min read
Alibaba Cloud Native
Jan 22, 2026 · Cloud Native

Building a Cloud‑Native AI Glass Traffic Enforcement Prototype with AgentRun and Serverless Functions

This article details a cloud‑native architecture that combines Meta Ray‑Ban AI glasses, a custom iOS app, and Alibaba Cloud Function Compute (FC) with AgentRun to perform OCR‑based traffic rule enforcement, showcasing a three‑layer "client‑brain‑tools" design, prompt‑driven logic, and cost‑effective serverless deployment.

AI · Agent Architecture · Alibaba Cloud
0 likes · 14 min read
Wuming AI
Jan 3, 2026 · Artificial Intelligence

How to Remove Watermarks and Fix Chinese Text in NotebookLM‑Generated PPTs

This guide walks you through a two‑step process—first using SlideDeckCleaner to strip watermarks from NotebookLM‑generated PDF PPTs, then employing an AI‑powered PPT conversion service to resolve Chinese garbled text and improve image clarity, with detailed screenshots and tips for handling stubborn elements.

AI PPT conversion · NotebookLM · OCR
0 likes · 4 min read
Wuming AI
Dec 30, 2025 · Artificial Intelligence

Build an AI Agent that Turns arXiv Screenshot into Direct PDF Download

The article shows how to create a simple AI agent that receives a screenshot of an arXiv paper, automatically extracts the paper’s URL and PDF link using a custom prompt, and then lets users view the abstract, download the PDF, or save it to a knowledge base.

AI Agent · Knowledge Base · OCR
0 likes · 4 min read
Old Meng AI Explorer
Dec 26, 2025 · Artificial Intelligence

How PaddleOCR Boosts Text Extraction Efficiency 10×: A Hands‑On Review

PaddleOCR, Baidu’s open‑source OCR engine, delivers high‑accuracy multilingual text extraction from images, PDFs, and handwritten notes, offering offline operation, free commercial use, and specialized models for invoices, IDs, and tables, enabling users to automate document processing and increase productivity up to tenfold.

AI · Document Automation · OCR
0 likes · 9 min read
Su San Talks Tech
Dec 13, 2025 · Information Security

How to Use Apache Tika in Spring Boot for Sensitive Data Detection and DLP

This article explains Apache Tika's core features, architecture, and common use cases, then provides a step‑by‑step Spring Boot tutorial that integrates Tika to extract file content, detect personal identifiers with regex, and return results via a REST API for data‑loss‑prevention.

Apache Tika · DLP · File Parsing
0 likes · 24 min read
Sohu Tech Products
Dec 3, 2025 · Mobile Development

How to Build a Scalable Android Ad‑Monitoring System with Multi‑Device Automation

This article details the design and implementation of an Android ad‑monitoring platform that controls multiple devices concurrently, automates app interactions, uses OCR for ad detection, and provides real‑time status monitoring via a floating window, while covering architecture, core modules, communication strategies, and performance optimizations.

ADB · Ad Monitoring · Android
0 likes · 27 min read
AI Algorithm Path
Dec 1, 2025 · Artificial Intelligence

Getting Started with the Cutting‑Edge Vision‑Language Model Qwen3‑VL

This article introduces vision‑language models, explains why they outperform OCR‑plus‑LLM pipelines, and walks through practical OCR and information‑extraction tasks using Qwen3‑VL, complete with code snippets, example prompts, result analysis, and a discussion of the model's limitations and resource considerations.

Deep Learning · Information Extraction · OCR
0 likes · 13 min read
HyperAI Super Neural
Nov 28, 2025 · Artificial Intelligence

Weekly AI paper roundup: protein design, open‑source agent, HunyuanOCR, Olmo 3

This weekly roundup highlights five recent AI papers—including HumanSense for multimodal LLM evaluation, JAM‑2 for de novo antibody design, the open‑source Olmo 3 language models, the Lumine generalist 3D agent, and the lightweight HunyuanOCR vision‑language model—summarizing their core contributions, results, and links.

OCR · generalist agents · multimodal LLM
0 likes · 6 min read
HyperAI Super Neural
Nov 11, 2025 · Artificial Intelligence

How DeepSeek-OCR Achieves SOTA Using Ultra‑Low Visual Token Counts

DeepSeek-OCR leverages a visual‑compression approach, combining DeepEncoder and the DeepSeek3B‑MoE‑A570M decoder, to represent document text with far fewer visual tokens, achieving up to 97% OCR accuracy and surpassing GOT‑OCR2.0 and MinerU2.0 on OmniDocBench, while the article offers a one‑click deployment tutorial.

DeepEncoder · LLM · OCR
0 likes · 6 min read
Architect's Guide
Nov 10, 2025 · Artificial Intelligence

Build a Scalable, High‑Performance OCR Invoice Pipeline with Spring Boot & Tesseract

This article details a complete, production‑grade OCR invoice processing pipeline that combines a distributed Spring Boot microservice architecture, deep Tesseract optimizations, ML‑based data validation, GPU acceleration, Kubernetes deployment, and extensive performance and security strategies to achieve million‑scale daily throughput with high accuracy.

OCR · Performance Optimization · Spring Boot
0 likes · 16 min read
DataFunSummit
Oct 30, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Document Processing and OCR

This article explores how the explosion of unstructured data exposes the limits of traditional OCR and shows how emerging multimodal large language models provide end‑to‑end document understanding, reduce pipeline complexity, cut training costs, enable hybrid retrieval‑augmented generation, and drive real‑world industry deployments.

AI · Document Processing · OCR
0 likes · 28 min read
Old Meng AI Explorer
Oct 30, 2025 · Artificial Intelligence

How PaddleOCR Turns Handwritten Notes and PDFs into Editable Text in Seconds

This article explains how PaddleOCR, an open‑source OCR engine from Baidu, achieves high‑accuracy text extraction from handwritten notes, scanned PDFs, invoices, IDs and multilingual documents, offering offline cross‑platform support, free commercial use, and step‑by‑step guidance for rapid deployment.

Automation · Document Processing · OCR
0 likes · 10 min read
HyperAI Super Neural
Oct 27, 2025 · Artificial Intelligence

Weekly AI Paper Digest: New OCR Model, Multimodal LLM, Next‑Gen DNA Sequencing

This week’s AI roundup highlights five recent papers: DeepSeek‑OCR’s context‑compression model for large‑scale data generation, Rex‑Omni’s 3‑billion‑parameter multimodal LLM achieving state‑of‑the‑art object perception, Alpha‑Service’s proactive AI‑glass framework, a bias‑variance approach to narrowing cross‑lingual gaps, and GATK’s MapReduce‑based toolkit for next‑generation DNA sequencing.

AI Glasses · Cross-lingual NLP · DNA Sequencing
0 likes · 6 min read
Fun with Large Models
Oct 26, 2025 · Artificial Intelligence

From Deep Learning to Large‑Model OCR: Which Model Leads the Pack?

This article traces OCR's evolution from early CNN‑LSTM systems to modern multimodal VLMs, analyzes leading open‑source models such as DeepSeek‑OCR, PaddleOCR, and MonkeyOCR, and offers practical guidance for long‑document, academic, and edge‑computing scenarios.

MonkeyOCR · Multimodal AI · OCR
0 likes · 15 min read
DataFunTalk
Oct 20, 2025 · Artificial Intelligence

How DeepSeek-OCR Achieves 10× Context Compression with Vision Tokens

DeepSeek-OCR, a newly open‑sourced 3B‑parameter OCR model, uses a novel DeepEncoder and a 3B MoE decoder to compress long‑text contexts into visual tokens, achieving up to 10× compression with 97% accuracy and demonstrating strong practical performance on benchmarks and multilingual documents.

DeepSeek · Multimodal AI · OCR
0 likes · 11 min read
HyperAI Super Neural
Sep 26, 2025 · Artificial Intelligence

Redefining Next‑Gen OCR: IBM’s Open‑Source Granite‑Docling‑258M for Unified Structure and Content Understanding

IBM’s newly released open‑source model Granite‑Docling‑258M tackles the long‑standing challenge of converting diverse digital documents into machine‑readable, structured data by preserving layout, tables, formulas, and supporting multiple languages, while remaining lightweight at 258 M parameters and outperforming its predecessor SmolDocling‑256M‑Preview.

Docling · Document AI · IBM
0 likes · 5 min read
AndroidPub
Sep 26, 2025 · Mobile Development

How to Add On‑Device AI Scanning to Your Android App with ML Kit

This article walks through the practical steps of integrating Google ML Kit into an Android app, covering its privacy‑first, zero‑learning‑curve advantages and providing complete code examples for barcode scanning, OCR, error handling, CameraX setup, and performance tuning.

Android · Barcode Scanning · CameraX
0 likes · 14 min read
Code Ape Tech Column
Sep 23, 2025 · Backend Development

Integrate Tess4J OCR into Spring Boot: Step‑by‑Step Guide

This tutorial walks you through setting up a Spring Boot project with Tess4J, adding required dependencies, configuring language data, implementing an OCR service and REST controller, and testing both local file and remote URL image recognition, all with complete code examples.

Image Processing · Java · OCR
0 likes · 6 min read
Sohu Tech Products
Sep 17, 2025 · Artificial Intelligence

Choosing the Right Python OCR Library: pytesseract, cnocr, or PaddleOCR?

This article compares three popular Python OCR frameworks—pytesseract, cnocr, and PaddleOCR—examining their installation ease, Chinese recognition ability, model size, accuracy, and unique features, and provides practical code examples to help developers pick the best tool for their needs.

Image Processing · OCR · PaddleOCR
0 likes · 5 min read
DaTaobao Tech
Sep 17, 2025 · Artificial Intelligence

Boosting ID Card Photo Quality with Multimodal AI: A Practical Deployment Guide

This article details how a multimodal AI model was integrated to detect and improve ID card photo quality, covering common image issues, differences between OCR and multimodal extraction, deployment strategies, performance metrics, cost estimation, and the resulting business and technical benefits.

ID verification · Model Deployment · Multimodal AI
0 likes · 13 min read
Tencent Technical Engineering
Sep 12, 2025 · Artificial Intelligence

How POINTS-Reader Achieves State‑of‑the‑Art PDF Extraction Without Teacher Models

The POINTS-Reader paper, accepted at EMNLP 2025, introduces a two‑stage, fully automated data generation pipeline that enables a lightweight visual‑language model to extract text, tables, and LaTeX formulas from diverse PDF layouts with superior performance and high throughput, all without relying on costly teacher‑model distillation.

AI · Document Parsing · OCR
0 likes · 12 min read
Chen Tian Universe
Sep 8, 2025 · Operations

Unlocking the Power of Financial Shared Service Centers: A Complete Guide

This article explains the background, concept, suitable enterprises, involved departments, policies, processes, technical architecture, and common challenges of Financial Shared Service Centers (FSSC), offering a step‑by‑step roadmap for organizations seeking cost reduction, efficiency, and stronger financial control.

Financial Shared Services · OCR · RPA
0 likes · 17 min read
Architect
Aug 21, 2025 · Artificial Intelligence

Implement OCR in Java with Tess4j and SpringBoot in Just a Few Lines

This tutorial walks you through adding optical character recognition to a Java SpringBoot project using the Tess4j library, covering prerequisites, dependency setup, engine initialization, RESTful API creation, and tips for improving accuracy with custom training data or third‑party services.

Image Processing · Java · OCR
0 likes · 8 min read
Architect
Aug 16, 2025 · Artificial Intelligence

Build a Scalable High‑Performance OCR Invoice Pipeline with Spring Boot & Tesseract

This article presents a comprehensive, high‑throughput OCR invoice processing solution that combines distributed system design, Spring Boot asynchronous execution, Tesseract deep optimization, multi‑engine fusion, structured data extraction, performance tuning, Kubernetes deployment, and security compliance.

AI · Kubernetes · OCR
0 likes · 16 min read
Xiaohongshu Tech REDtech
Jul 31, 2025 · Artificial Intelligence

How dots.ocr Achieves SOTA Multilingual Document Parsing with a 1.7B VLM

dots.ocr is a 1.7 billion-parameter multilingual document-parsing model that unifies layout detection and content recognition within a single visual-language model, delivering state-of-the-art performance across text, tables, formulas and reading order while remaining efficient and extensible for future multimodal AI research.

AI · Benchmark · Document Parsing
0 likes · 10 min read
Java Tech Enthusiast
Jul 13, 2025 · Artificial Intelligence

Build a Java SpringBoot 3.x License Plate Recognition System with OCR

This article walks through creating a server‑side license‑plate recognition solution using Java SpringBoot 3.x, Tesseract OCR, and OpenCV, covering project goals, Maven dependencies, image‑processing services, special‑plate handling, and a REST API for real‑time plate detection.

Java · OCR · OpenCV
0 likes · 8 min read
Baidu Geek Talk
Jul 9, 2025 · Artificial Intelligence

PaddleOCR 3.1 Unveils Multilingual PP‑OCRv5, Document Translation, and MCP Server Integration

PaddleOCR 3.1 introduces three major upgrades: a multilingual PP‑OCRv5 model that supports 37 languages with an accuracy gain of over 30%, a PP‑DocTranslation pipeline for high‑quality multi‑language document translation, and MCP server support for flexible AI application integration. The article also covers detailed CLI usage, demo scenarios, and open‑source resources.

AI · Computer Vision · MCP
0 likes · 11 min read
Selected Java Interview Questions
Jun 3, 2025 · Artificial Intelligence

Implementing OCR in Java with SpringBoot and Tess4j

This article demonstrates how to build a lightweight OCR service in Java using SpringBoot and the Tess4j library, covering dependency setup, Tesseract engine initialization, RESTful API creation, training data options, and deployment considerations.

Image Processing · OCR · RESTful API
0 likes · 7 min read
Python Programming Learning Circle
May 6, 2025 · Artificial Intelligence

Automatic Math Equation Grading with Python: Data Generation, CNN Training, Image Segmentation, and Result Feedback

This tutorial explains how to build a Python-based automatic grading system for handwritten math equations by generating synthetic character images, training a convolutional neural network, segmenting input images using projection techniques, evaluating expressions with eval, and overlaying correctness indicators on the original image.

CNN · Image Processing · Math Grading
0 likes · 28 min read
Liangxu Linux
Apr 22, 2025 · Artificial Intelligence

Top 10 Open-Source OCR Projects on GitHub Ranked by Stars

This article compiles a ranked list of ten popular open-source OCR projects on GitHub, summarizing each tool’s key capabilities—such as multimodal text extraction, PDF linearization, layout analysis, and multilingual support—along with star counts and direct repository links for developers seeking ready-to-use OCR solutions.

Computer Vision · GitHub · OCR
0 likes · 9 min read
Python Programming Learning Circle
Apr 15, 2025 · Artificial Intelligence

Automatic Math Expression Grading with Python, CNN and Image Processing

This tutorial explains how to generate synthetic digit fonts, build a convolutional neural network to recognize handwritten arithmetic expressions, segment images using projection methods, evaluate the results with Python's eval function, and overlay feedback symbols on the original image, providing a complete end‑to‑end solution.

Automation · CNN · ImageProcessing
0 likes · 27 min read
58UXD
Mar 14, 2025 · Product Management

How 58租房 Accelerated Landlord Publishing with LBS, OCR, and AI Guidance

This case study details how 58租房 tackled cumbersome landlord publishing by redesigning the workflow with smart location (LBS), AI‑driven shooting assistance, OCR‑based document recognition, and digital‑human guidance, achieving up to 90% faster operations, higher accuracy, and stronger privacy protection.

AI guidance · Digital Human · LBS
0 likes · 7 min read
AI Frontier Lectures
Mar 7, 2025 · Artificial Intelligence

Can Mistral’s New OCR Model Really Beat the Competition? A Deep Dive

Mistral AI’s newly launched OCR API claims to deliver world‑class document understanding with multilingual support, high speed, and self‑hosting options, and benchmark tests show it outperforms Azure OCR and Google Doc AI, yet independent evaluations reveal limitations on complex tables and legal forms, prompting a balanced assessment of its readiness for enterprise use.

AI model · Benchmark · Mistral AI
0 likes · 7 min read
Sohu Tech Products
Jan 8, 2025 · Artificial Intelligence

Multimodal RAG: Implementation Paths and Development Prospects

The talk outlines Multimodal RAG implementation routes, comparing OCR‑based object recognition, transformer encoder‑decoder encoding, and Visual Language Model processing, explains the ColPali late‑interaction method for multi‑dimensional vector matching, addresses scaling tensors with binarization and reranking, and recommends a hybrid long‑term strategy where VLM excels on abstract imagery while traditional OCR remains valuable.

ColPali · Document Processing · Multimodal RAG
0 likes · 10 min read
Programmer DD
Dec 31, 2024 · Artificial Intelligence

Build an AI‑Powered Expense Tracker with GLM‑4V‑Flash and MaxKB

This article demonstrates how to create an AI‑driven personal expense‑tracking assistant by leveraging Zhipu's GLM‑4V‑Flash multimodal model for receipt OCR, generating SQL statements, and integrating them with MaxKB workflows and a MySQL database, complete with code snippets and deployment steps.

AI · GLM-4V-Flash · MaxKB
0 likes · 13 min read
Architecture Breakthrough
Dec 26, 2024 · Industry Insights

Understanding Chinese Invoices: Types, Lifecycle, and FinTech Applications

This article provides a comprehensive overview of Chinese invoices, covering legal definitions, paper and electronic forms, basic copies, content fields, lifecycle stages, classification of VAT and ordinary invoices, the distinction between full‑electronic and digital invoices, and their practical use in fintech solutions such as OCR and third‑party verification platforms.

China · OCR · VAT
0 likes · 18 min read
Test Development Learning Exchange
Dec 6, 2024 · Artificial Intelligence

Using pytesseract and Pillow for OCR: Installation, Configuration, and Accuracy Improvement Techniques

This guide explains how to install Tesseract OCR and the Python libraries pytesseract and Pillow, configure the engine path, perform image-to-text extraction with example code, and apply various preprocessing, detection, and post‑processing methods to significantly improve OCR accuracy.

Computer Vision · OCR · Python
0 likes · 8 min read
Huolala Tech
Nov 28, 2024 · Artificial Intelligence

How AI-Powered OCR Transforms Freight Document and Vehicle Verification

This article explains how AI-driven OCR combined with deep‑learning image classification streamlines ticket, document, and license‑plate verification in freight logistics, detailing system architecture, algorithmic components, and future prospects for unified large‑model OCR solutions.

Image Classification · OCR · artificial intelligence
0 likes · 12 min read
Full-Stack Cultivation Path
Nov 25, 2024 · Artificial Intelligence

Get High-Quality OCR with Ollama-OCR in Just a Few Lines of Code

This guide shows how to set up the open‑source Ollama‑OCR tool, which leverages the Llama 3.2‑Vision multimodal model to perform high‑quality OCR, covering installation of Ollama, the vision model, the OCR package, and example code for plain‑text and Markdown outputs.

Llama 3.2-Vision · Node.js · OCR
0 likes · 6 min read
Bilibili Tech
Nov 8, 2024 · Artificial Intelligence

AI-Powered Game Recognition for League of Legends Live Streaming on Bilibili

Bilibili’s AI‑driven game‑recognition system extracts real‑time LoL events through OCR, hero detection and hot‑spot tagging, generating high‑energy timestamps and interactive overlays that let viewers jump to key moments and view detailed statistics, enhancing spectator engagement and analytical capabilities across major esports tournaments.

AI · Computer Vision · Game Recognition
0 likes · 14 min read
Architect
Nov 2, 2024 · Frontend Development

How to Build Robust Dark Watermarks and Boost OCR Accuracy in Web Apps

This article walks through the evolution of watermark techniques, demonstrates how to harden a front‑end watermark against deletion, invisibility, and covering using MutationObserver and canvas, introduces a low‑visibility dark watermark with decode logic, and details OCR integration and optimization to improve recognition accuracy in screenshot‑search scenarios.

Canvas · Image Processing · MutationObserver
0 likes · 21 min read
DeWu Technology
Sep 11, 2024 · Frontend Development

Advanced Watermark Techniques and OCR Integration for Front-End Applications

The article details progressive front‑end watermark schemes—from a basic canvas overlay to mutation‑observer‑protected, hide‑ and cover‑resistant, and low‑opacity dark watermarks—and explains how adaptive tone handling, contrast tuning, region cropping, and a hybrid OCR pipeline (internal service with tesseract.js fallback) ensure robust, invisible data protection and accurate screenshot analysis.

Canvas · Front-end · Image Processing
0 likes · 20 min read
Java Architect Essentials
Sep 6, 2024 · Artificial Intelligence

Integrating Tess4J OCR into a Spring Boot Application

This guide explains how to set up a Spring Boot project, add the Tess4J dependency, configure language data, implement an OCR service and REST controller, and test both local file uploads and remote image URLs for text recognition.

Image Processing · Java · OCR
0 likes · 6 min read
Python Programming Learning Circle
Sep 4, 2024 · Artificial Intelligence

Building an Automatic Math Grading System with Python: Data Generation, CNN Training, Image Segmentation, and Result Feedback

This tutorial explains how to create an automatic math‑grading tool in Python by generating synthetic digit images, training a small CNN on the data, segmenting handwritten equations with projection techniques, recognizing characters, evaluating the expressions, and overlaying the results back onto the original image.

Automation · CNN · Image Processing
0 likes · 30 min read
Full-Stack Cultivation Path
Aug 8, 2024 · Artificial Intelligence

MegaParse: A Precision Document Parser Built for LLMs

MegaParse is an open‑source document parser that transforms PDFs, Word, PPT, Excel and CSV files into LLM‑friendly formats, preserving full information, boosting processing efficiency, and enabling deeper semantic analysis, with quick‑start installation steps and a roadmap for future features.

AI tools · Document Parsing · LLM
0 likes · 4 min read
Full-Stack Cultivation Path
Jul 17, 2024 · Artificial Intelligence

Open-Source PDF Toolkit Delivers High-Accuracy Layout and Formula Detection

PDF‑Extract‑Kit is an open‑source toolkit that combines high‑accuracy layout detection, formula detection, formula recognition, and OCR for PDFs, and the article details its model comparisons, evaluation on academic and textbook datasets, and step‑by‑step instructions for running it on Windows or macOS, including Apple Silicon.

Computer Vision · OCR · PDF-Extract-Kit
0 likes · 6 min read
Meituan Technology Team
Jun 13, 2024 · Artificial Intelligence

Overview of Meituan's Selected CVPR 2024 Papers and Online Sharing Event

Meituan's tech team highlights seven CVPR 2024 papers—spanning OCR pre‑training, long‑tail semi‑supervised learning, visual AIGC, audio‑visual segmentation and synthetic‑data detection—provides detailed abstracts and experimental results, and announces an online author‑talk session on June 27.

Audio-Visual Segmentation · CVPR 2024 · Computer Vision
0 likes · 18 min read
Python Programming Learning Circle
Apr 18, 2024 · Artificial Intelligence

Implementing an Automatic Math Expression Grading System with Python and Convolutional Neural Networks

This tutorial walks through building a self‑trained OCR pipeline that generates synthetic digit images, trains a CNN model, segments handwritten math expressions, predicts each character, evaluates the arithmetic result, and overlays checkmarks, crosses or answers onto the original image.

Automation · CNN · Image Processing
0 likes · 28 min read
The Dominant Programmer
Mar 30, 2024 · Backend Development

Implement OCR in Spring Boot with Tess4J for Image Text Recognition

This guide shows how to integrate the open‑source Tesseract OCR engine into a Spring Boot application using the Tess4J Java wrapper, covering Chinese language data setup, Maven dependency configuration, bean creation, service implementation, and a unit test to verify image text extraction.

OCR · Spring Boot · image recognition
0 likes · 6 min read
Top Architect
Mar 13, 2024 · Backend Development

Integrating Tess4J OCR into a Spring Boot Backend Service

This tutorial walks through setting up a Spring Boot backend, adding the Tess4J OCR library, creating a service and REST controller to recognize text from both local files and remote image URLs, and provides testing steps and deployment tips.

Java · OCR · REST API
0 likes · 8 min read
Top Architect
Mar 6, 2024 · Backend Development

Integrating Tess4J OCR into a Spring Boot Backend Service

This guide demonstrates how to integrate Tess4J OCR into a Spring Boot application, covering environment setup, Maven dependencies, adding language data, creating an OCR service class, building REST endpoints for local and remote image processing, and testing the solution.

Java · OCR · Spring Boot
0 likes · 8 min read
Code Ape Tech Column
Feb 2, 2024 · Artificial Intelligence

Integrating Tess4J OCR into a Spring Boot Application

This guide walks through setting up a Spring Boot project, adding Tess4J dependencies, configuring language data, implementing an OCR service class, exposing REST endpoints for local and remote image recognition, and testing the OCR functionality end‑to‑end.

Java · OCR · REST API
0 likes · 6 min read