Tagged articles

OCR

241 articles · Page 1 of 3

Jun 29, 2026 · Artificial Intelligence

How an AI-Powered Question Recording System Supercharges Efficiency for Middle School Teachers

This article details the design and implementation of a locally deployed AI system that automatically extracts, structures, and manages exam questions from scanned papers, supporting multiple subjects, reducing manual effort, and enabling flexible test generation for teachers.

AI · DeepSeek · Education Technology

0 likes · 15 min read

Machine Heart

Jun 23, 2026 · Artificial Intelligence

Unlimited OCR Achieves SOTA Long-Document Parsing in a Single Forward Pass

Unlimited OCR, Baidu's open‑source model built on DeepSeek OCR, uses a novel Reference Sliding Window Attention to compress visual tokens and keep KV cache size constant, enabling end‑to‑end parsing of whole books with a 93.23% score on OmniDocBench v1.5 and stable latency across dozens of pages.

DeepSeek · Large Language Model · Long Document

0 likes · 14 min read

Machine Learning Algorithms & Natural Language Processing

Jun 15, 2026 · Artificial Intelligence

Blurry Images Create a ‘Comfort Zone’ for Jailbreaking Multimodal LLMs

A new study from Westlake University shows that when harmful text is rendered as low‑resolution, blurry, or noisy images, multimodal large language models become significantly easier to jailbreak despite still recognizing the text, revealing a U‑shaped risk curve and a simple mitigation that decouples OCR from safety checks.

OCR · jailbreak · multimodal LLM

0 likes · 10 min read

Machine Heart

Jun 14, 2026 · Artificial Intelligence

When Blurry Images Create an Attack Comfort Zone for Multimodal LLMs

Westlake University's AGI Lab shows that when harmful text is rendered as low‑resolution, blurry or noisy images, multimodal large language models can still read the content but their safety filters fail, creating an 'attack comfort zone' that dramatically raises jailbreak success rates across several models.

OCR · jailbreak · multimodal LLM

0 likes · 9 min read

Python Crawling & Data Mining

Jun 10, 2026 · Artificial Intelligence

Automating Validation of 300,000 Records with Python + AI to Detect Errors and Dirty Data

Even with 99% accuracy, tens of thousands of errors remain in a 300k‑row dataset, so the author builds a Python‑AI pipeline that preprocesses images, performs high‑precision OCR, merges data, applies custom validation rules, and automatically generates an error report, dramatically reducing manual effort.

AI · Automation · Data Validation

0 likes · 6 min read

Python Crawling & Data Mining

Jun 7, 2026 · Artificial Intelligence

Python OCR Table Extraction: Boost Accuracy from 95% to 99% with Batch Processing

The article explains why generic OCR struggles with structured tables, proposes a partition‑based fixed‑region recognition method using PaddleOCR, provides a complete Python script for batch processing, and demonstrates how this approach consistently achieves over 99% accuracy.

Batch Processing · OCR · PaddleOCR

0 likes · 4 min read

Python Crawling & Data Mining

Jun 5, 2026 · Artificial Intelligence

Why I Dropped WPS for Python: Handling 30,000 Images and 300,000 Records with Batch Pre‑processing

Faced with 30,000 report images containing 300,000 rows of tabular data, the author explains why WPS failed at scale, analyzes the OCR error sources, and shares a Python script that batch‑crops images to boost recognition accuracy before exporting everything to Excel.

Batch Processing · OCR · data extraction

0 likes · 6 min read

Python Crawling & Data Mining

Jun 4, 2026 · Artificial Intelligence

Batch Convert 30,000 Images to Excel with Python: Automating 300,000 Data Entries

The author details how they used AI‑assisted Python scripts to batch‑process over 30,000 report images, extract tables via OCR, crop noisy regions, merge results into a single Excel sheet with 99% accuracy, and automate validation, eliminating manual data entry.

AI · Automation · Excel

0 likes · 7 min read

Old Zhang's AI Learning

Jun 2, 2026 · Fundamentals

Lightning‑Fast Open‑Source Local PDF Parser: LiteParse Processes 400‑Page PDFs in 1 Second

LiteParse, an open‑source Rust‑based local PDF parser from the LlamaIndex team, extracts text from a 400‑page PDF in about one second and offers multi‑language bindings, flexible OCR, bounding‑box output, and Agent Skill integration; its limitations include only basic table handling and limited support for complex layouts.

Agent Skill · LiteParse · Local processing

0 likes · 9 min read

Su San Talks Tech

May 20, 2026 · Artificial Intelligence

Why Convert Docs to Markdown for LLMs? Meet the Open‑Source MarkItDown Tool

The article explains that LLMs process Markdown more effectively than raw PDFs, introduces Microsoft’s open‑source MarkItDown utility that converts a wide range of file types—including PDFs, Word, Excel, HTML, images with OCR, and YouTube videos—into clean Markdown, and provides installation, usage examples, recent feature updates, and a brief critique of its scope.

Azure Document Intelligence · CLI · LLM preprocessing

0 likes · 6 min read

DataFunTalk

May 15, 2026 · Artificial Intelligence

Exploring Multimodal GraphRAG: Combining Document Intelligence, Knowledge Graphs, and Large Models

This article provides a comprehensive technical overview of multimodal GraphRAG, detailing document‑intelligence parsing pipelines, layout analysis, OCR‑pipeline vs OCR‑free approaches, knowledge‑graph integration for chunk relationships, multimodal indexing, retrieval‑generation workflows, and a comparative analysis of RAG, GraphRAG, and KG‑QA solutions.

GraphRAG · Knowledge Graph · Layout Analysis

0 likes · 23 min read

DataFunTalk

May 10, 2026 · Artificial Intelligence

Exploring Multimodal GraphRAG: Combining Document Intelligence, Knowledge Graphs, and Large Models

This article presents a detailed technical walkthrough of multimodal GraphRAG, covering document‑intelligence parsing pipelines, multimodal graph index construction, knowledge‑graph‑driven chunk linking, recent research progress, performance trade‑offs, and practical recommendations for deploying RAG solutions.

GraphRAG · Knowledge Graph · OCR

0 likes · 23 min read

AI Engineer Programming

May 9, 2026 · Artificial Intelligence

Why PDF Parsing Is Hard for RAG and Which Mainstream Solutions Work

The article examines the intrinsic challenges of extracting structured text from PDFs for Retrieval‑Augmented Generation—such as missing reading order, table reconstruction, font encoding, and scanned images—and compares lightweight libraries, AI‑enhanced frameworks, commercial APIs, and visual language models as practical solutions.

AI frameworks · OCR · PDF parsing

0 likes · 23 min read

SuanNi

Apr 30, 2026 · Artificial Intelligence

Deploy a 24/7 Document Recognition Toolbox with the PaddleOCR Image on the Cloud

This guide explains how to use Baidu's open‑source PaddleOCR engine—its full OCR and layout analysis pipeline, multi‑language support, and output formats—to set up a continuously running document recognition service on the 算网 GPU cloud platform, including environment preparation, model configuration, and inference execution.

Document processing · GPU · MagicMind

0 likes · 6 min read

Kuaishou Tech

Apr 29, 2026 · Operations

Boosting Oncall Interception from 15% to 55%: KOncall’s AI‑Driven Evolution at Kuaishou

Kuaishou’s R&D efficiency team built the KOncall intelligent on‑call platform, integrating LLM‑based retrieval‑augmented generation, Redis Pub/Sub streaming, OCR multimodal parsing, FAQ knowledge ops, and custom reranking, which raised automated query interception from 15% to 55% and processed over 116,000 requests, turning on‑call from a bottleneck into a starting point for new capabilities.

AI Operations · Incident Management · Knowledge Management

0 likes · 26 min read

AI Architecture Path

Apr 29, 2026 · Artificial Intelligence

Fed up feeding AI with docs? Microsoft’s Open‑Source MarkItDown converts any format to Markdown in a few lines

MarkItDown, an open‑source Python tool from Microsoft’s AutoGen team, converts over 20 document and media formats—including Word, Excel, PDF, images, audio and YouTube links—into standardized Markdown, offering OCR, LLM integration, Docker deployment, Azure Document Intelligence support, and extensive command‑line examples for enterprise and research pipelines.

AutoGen · Azure Document Intelligence · Docker

0 likes · 13 min read

Java Architect Essentials

Apr 17, 2026 · Backend Development

How to Integrate Tess4J OCR into a Spring Boot Application

This article explains OCR fundamentals, introduces Tesseract and its Java wrapper Tess4J, guides you through downloading language data, shows step‑by‑step Spring Boot integration with Maven dependencies and configuration classes, and provides test code for Chinese, English, and mixed‑language image recognition.

Java · Language Data · OCR

0 likes · 9 min read

Old Zhang's AI Learning

Apr 15, 2026 · Artificial Intelligence

A New Era of OCR: Introducing the Powerful xParse Skills for Seamless Document Parsing

This article introduces TextIn's xParse Skills, a zero‑code, high‑accuracy OCR and document‑parsing solution that handles PDFs, images and over 20 other formats with a free daily quota, integrates with LLM agents, and provides detailed installation, command‑line usage, and pros‑cons analysis.

Agent · CLI · Document Parsing

0 likes · 10 min read

ShiZhen AI

Apr 12, 2026 · Artificial Intelligence

Convert Any File to Clean Markdown in One Click with Microsoft’s MarkItDown

MarkItDown, an open‑source tool from Microsoft’s AutoGen team, lets you feed PDFs, Office documents, web data, media, and even YouTube videos into large language models by converting them to clean Markdown in a single command, preserving structure for better AI understanding.

Azure Document Intelligence · LLM preprocessing · MarkItDown

0 likes · 6 min read

Java Architect Handbook

Apr 1, 2026 · Backend Development

Integrating Tess4j OCR into a Spring Boot 3 Project

This guide explains OCR fundamentals, introduces Tesseract and Tess4j, shows how to download the required language data files, and provides step‑by‑step instructions with Maven configuration, Spring Boot properties, Java code, and test examples for Chinese, English, and mixed‑language image recognition.

Java · OCR · Spring Boot

0 likes · 11 min read

AI Explorer

Mar 28, 2026 · Artificial Intelligence

How Chandra OCR 2 Accurately Parses Complex Tables and Handwritten Text

Chandra OCR 2, an open‑source model on GitHub, combines full‑layout understanding with multi‑format output to precisely digitize complex tables, handwritten notes, formulas and multilingual documents, outperforming other OCR solutions in benchmark tests and offering easy installation for developers.

Chandra OCR 2 · Layout Understanding · OCR

0 likes · 6 min read

Old Zhang's AI Learning

Mar 27, 2026 · Artificial Intelligence

Alibaba’s Logics-Parsing-v2 Sets New OCR Benchmark Records

Alibaba’s open‑source Logics-Parsing‑v2 achieves top scores on both LogicsDocBench (82.16) and OmniDocBench‑v1.5 (93.23), outperforms leading closed models, and introduces Parsing‑2.0 capabilities that handle flowcharts, music scores, code blocks, and chemical formulas with structured HTML output.

ABC notation · Logics-Parsing-v2 · Mermaid

0 likes · 9 min read

Architecture Digest

Mar 26, 2026 · Artificial Intelligence

How to Integrate Tess4j OCR into a Spring Boot 3 Application

This guide explains the fundamentals of OCR, introduces Tesseract and its Java wrapper Tess4j, shows how to download language data files, configure a Spring Boot 3 project with Maven dependencies and YAML settings, and provides comprehensive test code for Chinese, English, and mixed‑language image recognition.

Java · OCR · Spring Boot

0 likes · 9 min read

Data STUDIO

Mar 26, 2026 · Operations

10 Open‑Source Python Tools That Replace Paid SaaS Apps

The article presents ten Python libraries—pikepdf, Playwright, pdf2image + pytesseract, moviepy, pydub + ffmpeg, reportlab, yt‑dlp, watchdog, pyvirtualcam, and rich + textual—each with code samples, runtime requirements, complexity analysis, practical tips, and common pitfalls, showing how they can substitute costly commercial software while offering greater control, privacy, and customization.

Audio Processing · Automation · File Monitoring

0 likes · 19 min read

SpringMeng

Mar 25, 2026 · Backend Development

How to Perform OCR in SpringBoot Using Tess4j

This tutorial explains OCR fundamentals, introduces Tesseract and its Java wrapper Tess4j, shows how to download language data, integrate Tess4j into a SpringBoot 3 project with Maven configuration, and provides test code for Chinese, English, and mixed‑language image recognition while highlighting performance considerations.

Configuration · Java · OCR

0 likes · 9 min read

java1234

Mar 24, 2026 · Backend Development

How to Elegantly Perform OCR in Spring Boot 3 Using Tess4J

This tutorial explains OCR fundamentals, introduces the open‑source Tesseract engine and its Java wrapper Tess4J, shows how to download the required traineddata files, and provides step‑by‑step Spring Boot 3 integration, configuration, and test code for Chinese, English, and mixed‑language image recognition, plus important usage notes.

Java · OCR · Spring Boot

0 likes · 8 min read

Java Companion

Mar 22, 2026 · Backend Development

How to Seamlessly Integrate Tess4j OCR into a SpringBoot Application

This tutorial walks through the fundamentals of OCR, explains how to download the required Tesseract traineddata files, shows how to add Tess4j as a Maven dependency, configure SpringBoot with custom properties, and provides complete Java test code for Chinese, English, and mixed‑language image recognition, highlighting performance considerations and file‑naming requirements.

Java · OCR · backend

0 likes · 9 min read

Wu Shixiong's Large Model Academy

Mar 22, 2026 · Artificial Intelligence

How to Overcome MinerU’s Top 9 Limitations for Reliable Document Parsing

This article examines MinerU’s strengths and nine critical shortcomings—such as reading order errors, split tables, merged cells, OCR misrecognition, formula handling, heading hierarchy loss, output inconsistency, hardware limits, and licensing issues—and provides concrete improvement strategies and interview‑ready talking points for engineers.

Document Parsing · Interview Tips · MinerU

0 likes · 12 min read

Wu Shixiong's Large Model Academy

Mar 20, 2026 · Artificial Intelligence

Mastering MinerU: Overcoming Its Top 9 Limitations for Reliable Document Parsing

This article examines MinerU's strengths and nine critical shortcomings—such as layout order errors, cross‑page table splits, merged‑cell failures, OCR misrecognition, and licensing issues—and provides concrete improvement strategies, interview‑ready resume bullets, and practical response frameworks for engineers.

LLM · Layout Analysis · MinerU

0 likes · 13 min read

Old Zhang's AI Learning

Mar 10, 2026 · Artificial Intelligence

FireRed-OCR 2B: An Open‑Source VLM That Tackles Structural Hallucination

FireRed‑OCR‑2B, an open‑source 2‑billion‑parameter visual‑language model, addresses structural hallucination in document OCR through a geometry‑aware data factory and a three‑stage training pipeline, achieving a 92.94 OmniDocBench v1.5 score and leading end‑to‑end performance while remaining lightweight enough for consumer‑grade GPUs.

FireRed-OCR · OCR · OmniDocBench

0 likes · 11 min read

Huolala Tech

Mar 4, 2026 · Artificial Intelligence

How Lalamove Built an AI‑Powered Edge‑Cloud Review System for Global Driver Verification

Lalamove tackled the scalability and accuracy challenges of worldwide driver onboarding by designing a layered edge‑cloud AI architecture that combines lightweight mobile models, cloud‑based large‑language and computer‑vision models, OCR, and multimodal LLMs to filter low‑quality inputs, automate identity checks, and reduce manual effort while maintaining data compliance.

AI · Driver Verification · OCR

0 likes · 12 min read

SpringMeng

Mar 2, 2026 · Backend Development

Deep Dive into an Asynchronous Spring Boot + Tesseract OCR Pipeline for Invoice Recognition

This article presents a complete design and implementation of a high‑throughput, asynchronous OCR pipeline built with Spring Boot and Tesseract, covering distributed architecture, thread‑pool tuning, image‑preprocessing, multi‑engine recognition, data extraction strategies, Kubernetes deployment, security compliance, chaos testing, and future AI‑driven enhancements.

Asynchronous · GPU · Java

0 likes · 10 min read

Machine Learning Algorithms & Natural Language Processing

Feb 26, 2026 · Artificial Intelligence

Edit Banana Turns AI‑Generated Pixel Diagrams into Fully Editable PPT and Drawio Files

Edit Banana addresses the common pain of uneditable AI‑generated pixel diagrams by instantly converting them into fully editable Drawio (XML) or PPTX files, preserving text, shapes, and connections, and offering LaTeX extraction and a human‑in‑the‑loop mode for complex icons.

AIGC · Drawio · Edit Banana

0 likes · 6 min read

HyperAI Super Neural

Feb 22, 2026 · Artificial Intelligence

OCR Models Guide: DeepSeek, PaddlePaddle, Others for High Accuracy & Local Deployment

This article surveys the latest open‑source OCR models—including GLM‑OCR, PaddleOCR‑VL‑1.5, LightOnOCR‑2‑1B, DeepSeek‑OCR 2, and MonkeyOCR—detailing their architectures, benchmark scores on OmniDocBench, hardware requirements, and how to run them via online demos.

Model Benchmark · OCR · computer vision

0 likes · 8 min read

Old Zhang's AI Learning

Feb 8, 2026 · Artificial Intelligence

Choosing the Best OCR Large Model: DeepSeek‑OCR‑2, HunyuanOCR, PaddleOCR‑VL‑1.5, and GLM‑OCR Compared

This article provides a detailed technical comparison of four OCR large models—DeepSeek‑OCR‑2, HunyuanOCR, PaddleOCR‑VL‑1.5, and GLM‑OCR—covering their architectures, parameter sizes, release dates, licensing, core features, strengths, weaknesses, benchmark scores, multilingual support, deployment requirements, and recommended use‑cases, helping readers select the most suitable model for their needs.

DeepSeek-OCR 2 · GLM-OCR · HunyuanOCR

0 likes · 17 min read

Old Zhang's AI Learning

Feb 3, 2026 · Artificial Intelligence

Why GLM-OCR Leads OCR Benchmarks: 0.9B Model Tops OmniDocBench

GLM-OCR, a 0.9B‑parameter multimodal OCR model from Zhipu, achieves the highest score (94.62) on OmniDocBench V1.5, offers lightweight deployment via vLLM, Ollama, API and SDK, and outperforms larger rivals like DeepSeek‑OCR and PaddleOCR in speed and accuracy.

GLM-OCR · OCR · Ollama

0 likes · 10 min read

Old Zhang's AI Learning

Jan 31, 2026 · Artificial Intelligence

How a 0.1B‑Parameter OCR Model Beats Multi‑Billion‑Parameter Vision‑Language Models

UniRec‑0.1B, a lightweight OCR model with only 0.1 B parameters, achieves accuracy comparable to or better than multi‑billion‑parameter visual‑language models across text, formula, and mixed‑content tasks, thanks to hierarchical supervision training, a semantic‑decoupled tokenizer, and a large 40 M‑sample dataset, while delivering 2‑9× faster inference and full open‑source availability.

Hierarchical Supervision · OCR · Semantic Decoupled Tokenizer

0 likes · 12 min read

Old Zhang's AI Learning

Jan 30, 2026 · Artificial Intelligence

PaddleOCR‑VL‑1.5: 0.9B Model Beats Billion‑Parameter OCR Models with 94.5% Accuracy

PaddleOCR‑VL‑1.5, the latest Baidu release, uses only 0.9 B parameters to achieve 94.5% accuracy on OmniDocBench v1.5, surpassing larger open‑source and commercial OCR models, while offering multi‑task, multi‑language support, lightweight deployment, and detailed performance benchmarks.

DeepSeek-OCR · GPU inference · OCR

0 likes · 9 min read

HyperAI Super Neural

Jan 30, 2026 · Artificial Intelligence

Frontier OCR Advances: DeepSeek, Tencent, and Baidu Push From Text Recognition to Structured Document Understanding

This weekly AI paper roundup reviews five cutting‑edge OCR studies—DeepSeek‑OCR 2, LightOnOCR‑2‑1B, HunyuanOCR, PaddleOCR‑VL, and GOT—detailing their novel visual‑language architectures, training data, benchmark evaluations, and performance gains over previous models.

DeepSeek · GoT · LightOnOCR

0 likes · 9 min read

Old Zhang's AI Learning

Jan 28, 2026 · Artificial Intelligence

How to Deploy DeepSeek‑OCR‑2 Locally: A Hands‑On Walkthrough

The article details a step‑by‑step local deployment of DeepSeek‑OCR‑2, covering GPU memory requirements, accuracy on complex tables, long inference times, dependency hurdles like GCC, GLIBC and flash‑attn, and provides concrete solutions using conda environments and symlinks.

Conda · DeepSeek-OCR 2 · GPU

0 likes · 7 min read

PaperAgent

Jan 27, 2026 · Artificial Intelligence

How DeepSeek-OCR 2’s Dual-Flow Attention Redefines Document Understanding

DeepSeek-OCR 2 introduces a novel dual‑stream (bidirectional + causal) attention architecture that replaces fixed raster scanning, leverages a Qwen2‑0.5B encoder, and achieves state‑of‑the‑art accuracy on OmniDocBench while reducing token budget and improving reading‑order consistency.

DeepEncoder · DeepSeek · Dual-Stream Attention

0 likes · 8 min read

Old Zhang's AI Learning

Jan 27, 2026 · Artificial Intelligence

DeepSeek-OCR 2 Enables AI to Read Images with Human‑Like Logical Flow

DeepSeek-OCR 2 introduces Visual Causal Flow and an LLM‑based visual encoder, achieving 91.09% accuracy on OmniDocBench v1.5, while providing detailed installation, two inference modes (vLLM and Transformers), and an analysis of its strengths and limitations for complex document processing.

DeepEncoder V2 · DeepSeek-OCR 2 · LLM

0 likes · 9 min read

Alibaba Cloud Native

Jan 22, 2026 · Cloud Native

Building a Cloud‑Native AI Glass Traffic Enforcement Prototype with AgentRun and Serverless Functions

This article details a cloud‑native architecture that combines Meta Ray‑Ban AI glasses, a custom iOS app, and Alibaba Cloud Function Compute (FC) with AgentRun to perform OCR‑based traffic rule enforcement, showcasing a three‑layer "client‑brain‑tools" design, prompt‑driven logic, and cost‑effective serverless deployment.

AI · Alibaba Cloud · Cloud Native

0 likes · 14 min read

php Courses

Jan 13, 2026 · Artificial Intelligence

Boosting Document Barcode Extraction with PHP and AI: A Step‑by‑Step Guide

This article explains how to combine PHP with AI services to reliably locate, decode, and batch‑process barcodes from scanned documents and PDFs, covering tool setup, code examples, performance tips, and security considerations.

AI · Barcode Extraction · Batch Automation

0 likes · 11 min read

Wuming AI

Jan 3, 2026 · Artificial Intelligence

How to Remove Watermarks and Fix Chinese Text in NotebookLM‑Generated PPTs

This guide walks you through a two‑step process—first using SlideDeckCleaner to strip watermarks from NotebookLM‑generated PDF PPTs, then employing an AI‑powered PPT conversion service to resolve Chinese garbled text and improve image clarity, with detailed screenshots and tips for handling stubborn elements.

AI PPT conversion · NotebookLM · OCR

0 likes · 4 min read

Wuming AI

Dec 30, 2025 · Artificial Intelligence

Build an AI Agent that Turns arXiv Screenshot into Direct PDF Download

The article shows how to create a simple AI agent that receives a screenshot of an arXiv paper, automatically extracts the paper’s URL and PDF link using a custom prompt, and then lets users view the abstract, download the PDF, or save it to a knowledge base.

AI Agent · Knowledge Base · OCR

0 likes · 4 min read

Old Meng AI Explorer

Dec 26, 2025 · Artificial Intelligence

How PaddleOCR Boosts Text Extraction Efficiency 10×: A Hands‑On Review

PaddleOCR, Baidu’s open‑source OCR engine, delivers high‑accuracy multilingual text extraction from images, PDFs, and handwritten notes, offering offline operation, free commercial use, and specialized models for invoices, IDs, and tables, enabling users to automate document processing and increase productivity up to tenfold.

AI · Document Automation · OCR

0 likes · 9 min read

Su San Talks Tech

Dec 13, 2025 · Information Security

How to Use Apache Tika in Spring Boot for Sensitive Data Detection and DLP

This article explains Apache Tika's core features, architecture, and common use cases, then provides a step‑by‑step Spring Boot tutorial that integrates Tika to extract file content, detect personal identifiers with regex, and return results via a REST API for data‑loss‑prevention.

Apache Tika · DLP · File Parsing

0 likes · 24 min read

Sohu Tech Products

Dec 3, 2025 · Mobile Development

How to Build a Scalable Android Ad‑Monitoring System with Multi‑Device Automation

This article details the design and implementation of an Android ad‑monitoring platform that controls multiple devices concurrently, automates app interactions, uses OCR for ad detection, and provides real‑time status monitoring via a floating window, while covering architecture, core modules, communication strategies, and performance optimizations.

ADB · Ad Monitoring · Android

0 likes · 27 min read

AI Algorithm Path

Dec 1, 2025 · Artificial Intelligence

Getting Started with the Cutting‑Edge Vision‑Language Model Qwen3‑VL

This article introduces vision‑language models, explains why they outperform OCR‑plus‑LLM pipelines, and walks through practical OCR and information‑extraction tasks using Qwen3‑VL, complete with code snippets, example prompts, result analysis, and a discussion of the model's limitations and resource considerations.

OCR · Python · Qwen3-VL

0 likes · 13 min read

HyperAI Super Neural

Nov 28, 2025 · Artificial Intelligence

Weekly AI paper roundup: protein design, open‑source agent, HunyuanOCR, Olmo 3

This weekly roundup highlights five recent AI papers—including HumanSense for multimodal LLM evaluation, JAM‑2 for de novo antibody design, the open‑source Olmo 3 language models, the Lumine generalist 3D agent, and the lightweight HunyuanOCR vision‑language model—summarizing their core contributions, results, and links.

OCR · Protein design · generalist agents

0 likes · 6 min read

SpringMeng

Nov 15, 2025 · Backend Development

Step‑by‑Step SpringBoot + Tess4j Guide to Implement OCR for PDF Images

This article walks through extracting images from PDF files in a SpringBoot application and using Tess4j to perform OCR, comparing popular OCR libraries, showing configuration details, code snippets, and tips for improving accuracy and performance.

Java · OCR · springboot

0 likes · 8 min read

HyperAI Super Neural

Nov 11, 2025 · Artificial Intelligence

How DeepSeek-OCR Achieves SOTA Using Ultra‑Low Visual Token Counts

DeepSeek-OCR leverages a visual‑compression approach, combining DeepEncoder and the DeepSeek3B‑MoE‑A570M decoder, to represent document text with far fewer visual tokens, achieving up to 97% OCR accuracy and surpassing GOT‑OCR2.0 and MinerU2.0 on OmniDocBench, while the article offers a one‑click deployment tutorial.

DeepEncoder · DeepSeek-OCR · LLM

0 likes · 6 min read

Architect's Guide

Nov 10, 2025 · Artificial Intelligence

Build a Scalable, High‑Performance OCR Invoice Pipeline with Spring Boot & Tesseract

This article details a complete, production‑grade OCR invoice processing pipeline that combines a distributed Spring Boot microservice architecture, deep Tesseract optimizations, ML‑based data validation, GPU acceleration, Kubernetes deployment, and extensive performance and security strategies to achieve million‑scale daily throughput with high accuracy.

OCR · Performance Optimization · Spring Boot

0 likes · 16 min read

Wu Shixiong's Large Model Academy

Nov 2, 2025 · Artificial Intelligence

Why Document Parsing Is the Real Bottleneck in RAG Projects (And How to Fix It)

The article explains that in Retrieval‑Augmented Generation projects the hardest challenge lies in robust document parsing—handling PDFs, PPTs, scanned contracts, OCR errors, and preserving structure—to ensure high‑quality retrieval and avoid hallucinations.

AI · OCR · RAG

0 likes · 10 min read

DataFunSummit

Oct 30, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Document Processing and OCR

This article explores how the explosion of unstructured data exposes the limits of traditional OCR and shows how emerging multimodal large language models provide end‑to‑end document understanding, reduce pipeline complexity, cut training costs, enable hybrid retrieval‑augmented generation, and drive real‑world industry deployments.

AI · Document processing · Large Language Model

0 likes · 28 min read

Code Wrench

Oct 30, 2025 · Artificial Intelligence

Build a Fast Multilingual OCR Service with Go, OpenCV, Tesseract, Vue3 & Docker

This step‑by‑step guide shows how to create a high‑performance OCR service that recognizes Chinese and English, using a Go Gin backend with OpenCV preprocessing and Tesseract, a Vue3 frontend, Docker multi‑stage builds, and Swagger UI for API testing.

Docker · Go · OCR

0 likes · 10 min read

Old Meng AI Explorer

Oct 30, 2025 · Artificial Intelligence

How PaddleOCR Turns Handwritten Notes and PDFs into Editable Text in Seconds

This article explains how PaddleOCR, an open‑source OCR engine from Baidu, achieves high‑accuracy text extraction from handwritten notes, scanned PDFs, invoices, IDs and multilingual documents, offering offline cross‑platform support, free commercial use, and step‑by‑step guidance for rapid deployment.

Automation · Document processing · OCR

0 likes · 10 min read

HyperAI Super Neural

Oct 27, 2025 · Artificial Intelligence

Weekly AI Paper Digest: New OCR Model, Multimodal LLM, Next‑Gen DNA Sequencing

This week’s AI roundup highlights five recent papers: DeepSeek‑OCR’s context‑compression model for large‑scale data generation, Rex‑Omni’s 3‑billion‑parameter multimodal LLM achieving state‑of‑the‑art object perception, Alpha‑Service’s proactive AI‑glass framework, a bias‑variance approach to narrowing cross‑lingual gaps, and GATK’s MapReduce‑based toolkit for next‑generation DNA sequencing.

AI Glasses · Cross-lingual NLP · DNA Sequencing

0 likes · 6 min read

Fun with Large Models

Oct 26, 2025 · Artificial Intelligence

From Deep Learning to Large‑Model OCR: Which Model Leads the Pack?

This article traces OCR's evolution from early CNN‑LSTM systems to modern multimodal VLMs, analyzes leading open‑source models such as DeepSeek‑OCR, PaddleOCR, and MonkeyOCR, and offers practical guidance for long‑document, academic, and edge‑computing scenarios.

DeepSeek-OCR · MonkeyOCR · Multimodal AI

0 likes · 15 min read

Baobao Algorithm Notes

Oct 20, 2025 · Artificial Intelligence

Can Visual Tokens Compress Text? Inside DeepSeek-OCR’s Optical Compression

DeepSeek‑OCR introduces a novel visual encoder that transforms text into images, achieving up to 10‑20× token compression while maintaining OCR accuracy, and demonstrates strong performance on OmniDocBench with a 3B‑parameter model across multilingual and multimodal tasks.

AI · DeepSeek · OCR

0 likes · 10 min read

DataFunTalk

Oct 20, 2025 · Artificial Intelligence

How DeepSeek-OCR Achieves 10× Context Compression with Vision Tokens

DeepSeek-OCR, a newly open‑sourced 3B‑parameter OCR model, uses a novel DeepEncoder and a 3B MoE decoder to compress long‑text contexts into visual tokens, achieving up to 10× compression with 97% accuracy and demonstrating strong practical performance on benchmarks and multilingual documents.

DeepSeek · Multimodal AI · OCR

0 likes · 11 min read

HyperAI Super Neural

Oct 14, 2025 · Artificial Intelligence

NeurIPS 2025: OCRBench v2 Shows Gemini Leads the Chinese OCR Ranking Yet Barely Earns a Passing Score

OCRBench v2, introduced at NeurIPS 2025, evaluates 58 multimodal models on 23 OCR‑related tasks in Chinese and English, revealing that even top models like Gemini‑2.5‑Pro barely exceed the passing threshold and that most models struggle with fine‑grained text localization and multilingual performance.

Evaluation · Gemini · NeurIPS 2025

0 likes · 8 min read

HyperAI Super Neural

Sep 26, 2025 · Artificial Intelligence

Redefining Next‑Gen OCR: IBM’s Open‑Source Granite‑Docling‑258M for Unified Structure and Content Understanding

IBM’s newly released open‑source model Granite‑Docling‑258M tackles the long‑standing challenge of converting diverse digital documents into machine‑readable, structured data by preserving layout, tables, formulas, and supporting multiple languages, while remaining lightweight at 258 M parameters and outperforming its predecessor SmolDocling‑256M‑Preview.

Docling · IBM · OCR

0 likes · 5 min read

AndroidPub

Sep 26, 2025 · Mobile Development

How to Add On‑Device AI Scanning to Your Android App with ML Kit

This article walks through the practical steps of integrating Google ML Kit into an Android app, covering its privacy‑first, zero‑learning‑curve advantages and providing complete code examples for barcode scanning, OCR, error handling, CameraX setup, and performance tuning.

Android · Barcode Scanning · CameraX

0 likes · 14 min read

Code Ape Tech Column

Sep 23, 2025 · Backend Development

Integrate Tess4J OCR into Spring Boot: Step‑by‑Step Guide

This tutorial walks you through setting up a Spring Boot project with Tess4J, adding required dependencies, configuring language data, implementing an OCR service and REST controller, and testing both local file and remote URL image recognition, all with complete code examples.

Image processing · Java · OCR

0 likes · 6 min read

Python Programming Learning Circle

Sep 20, 2025 · Mobile Development

How to Automate Onmyoji Gameplay on Android with Python, ADB, and OCR

This tutorial explains how to set up a Python 3.8 environment, use an Android emulator, and employ ADB commands together with OpenCV image matching and Tencent OCR to automatically complete daily tasks in the mobile game Onmyoji, including boss battles, barrier breakthroughs, and resource collection.

ADB · Android · OCR

0 likes · 9 min read

Sohu Tech Products

Sep 17, 2025 · Artificial Intelligence

Choosing the Right Python OCR Library: pytesseract, cnocr, or PaddleOCR?

This article compares three popular Python OCR frameworks—pytesseract, cnocr, and PaddleOCR—examining their installation ease, Chinese recognition ability, model size, accuracy, and unique features, and provides practical code examples to help developers pick the best tool for their needs.

Image processing · OCR · PaddleOCR

0 likes · 5 min read

DaTaobao Tech

Sep 17, 2025 · Artificial Intelligence

Boosting ID Card Photo Quality with Multimodal AI: A Practical Deployment Guide

This article details how a multimodal AI model was integrated to detect and improve ID card photo quality, covering common image issues, differences between OCR and multimodal extraction, deployment strategies, performance metrics, cost estimation, and the resulting business and technical benefits.

ID verification · Model Deployment · Multimodal AI

0 likes · 13 min read

Tencent Technical Engineering

Sep 12, 2025 · Artificial Intelligence

How POINTS-Reader Achieves State‑of‑the‑Art PDF Extraction Without Teacher Models

The POINTS-Reader paper, accepted at EMNLP 2025, introduces a two‑stage, fully automated data generation pipeline that enables a lightweight visual‑language model to extract text, tables, and LaTeX formulas from diverse PDF layouts with superior performance and high throughput, all without relying on costly teacher‑model distillation.

AI · Document Parsing · OCR

0 likes · 12 min read

Chen Tian Universe

Sep 8, 2025 · Operations

Unlocking the Power of Financial Shared Service Centers: A Complete Guide

This article explains the background, concept, suitable enterprises, involved departments, policies, processes, technical architecture, and common challenges of Financial Shared Service Centers (FSSC), offering a step‑by‑step roadmap for organizations seeking cost reduction, efficiency, and stronger financial control.

Enterprise Architecture · Financial Shared Services · OCR

0 likes · 17 min read

Architect

Aug 21, 2025 · Artificial Intelligence

Implement OCR in Java with Tess4j and SpringBoot in Just a Few Lines

This tutorial walks you through adding optical character recognition to a Java SpringBoot project using the Tess4j library, covering prerequisites, dependency setup, engine initialization, RESTful API creation, and tips for improving accuracy with custom training data or third‑party services.

Image processing · Java · OCR

0 likes · 8 min read

Architect

Aug 16, 2025 · Artificial Intelligence

Build a Scalable High‑Performance OCR Invoice Pipeline with Spring Boot & Tesseract

This article presents a comprehensive, high‑throughput OCR invoice processing solution that combines distributed system design, Spring Boot asynchronous execution, Tesseract deep optimization, multi‑engine fusion, structured data extraction, performance tuning, Kubernetes deployment, and security compliance.

AI · OCR · Spring Boot

0 likes · 16 min read

Programmer XiaoFu

Aug 12, 2025 · Backend Development

Deep Dive into an Asynchronous Spring Boot + Tesseract OCR Pipeline for Invoice Recognition

This article presents a comprehensive, step‑by‑step analysis of a high‑throughput, asynchronous OCR pipeline built with Spring Boot and Tesseract, covering system architecture, thread‑pool tuning, custom invoice‑specific model training, multi‑engine fusion, structured data extraction, performance optimizations, GPU acceleration, Kubernetes deployment, monitoring, security compliance, chaos testing, and future evolution plans.

Asynchronous · GPU · OCR

0 likes · 12 min read

Xiaohongshu Tech REDtech

Jul 31, 2025 · Artificial Intelligence

How dots.ocr Achieves SOTA Multilingual Document Parsing with a 1.7B VLM

dots.ocr is a 1.7 billion-parameter multilingual document-parsing model that unifies layout detection and content recognition within a single visual-language model, delivering state-of-the-art performance across text, tables, formulas and reading order while remaining efficient and extensible for future multimodal AI research.

AI · Document Parsing · OCR

0 likes · 10 min read

Python Crawling & Data Mining

Jul 22, 2025 · Artificial Intelligence

How to Fix Python Image Processing Issues with OCR and OpenCV – Step-by-Step Guide

This article walks through a Python image‑processing problem, shows a working OCR solution using ddddocr, suggests an OpenCV binarization alternative, and provides complete code snippets and results to help readers resolve similar issues efficiently.

Image processing · OCR · ddddocr

0 likes · 3 min read

Java Tech Enthusiast

Jul 13, 2025 · Artificial Intelligence

Build a Java SpringBoot 3.x License Plate Recognition System with OCR

This article walks through creating a server‑side license‑plate recognition solution using Java SpringBoot 3.x, Tesseract OCR, and OpenCV, covering project goals, Maven dependencies, image‑processing services, special‑plate handling, and a REST API for real‑time plate detection.

Java · OCR · license-plate-recognition

0 likes · 8 min read

Baidu Geek Talk

Jul 9, 2025 · Artificial Intelligence

PaddleOCR 3.1 Unveils Multilingual PP‑OCRv5, Document Translation, and MCP Server Integration

PaddleOCR 3.1 introduces three major upgrades—a multilingual PP‑OCRv5 model supporting 37 languages with over 30% accuracy gain, a PP‑DocTranslation pipeline for high‑quality multi‑language document translation, and MCP server support for flexible AI application integration—accompanied by detailed CLI usage, demo scenarios, and open‑source resources.

AI · MCP · OCR

0 likes · 11 min read

Architect's Alchemy Furnace

Jul 6, 2025 · Fundamentals

Which Open‑Source PDF‑to‑Markdown Tool Is Right for You? A Deep Dive into 6 Solutions

This article compares six leading open‑source PDF‑to‑Markdown converters, detailing their architectures, core features, suitable use cases, and pros and cons to help developers quickly choose the most appropriate tool for their documentation workflows.

Markdown · OCR · Open-source

0 likes · 10 min read

Programmer XiaoFu

Jun 10, 2025 · Backend Development

Integrating Tess4j with SpringBoot: Low‑Cost OCR Image Recognition

This tutorial shows how to add OCR capabilities to a SpringBoot application using the Tess4j library, covering dependency setup, Tesseract engine initialization, RESTful endpoint implementation, training data choices, and practical tips for handling resources and deployment.

Java · OCR · restapi

0 likes · 7 min read

Java Captain

Jun 7, 2025 · Artificial Intelligence

How to Perform OCR in Java with Spire.OCR: Step‑by‑Step Guide

This tutorial shows how to set up the Spire.OCR library in Java, configure dependencies, and write code that scans images to extract their text, complete with Maven setup, project configuration, and sample output.

Image processing · OCR · spire-ocr

0 likes · 4 min read

Selected Java Interview Questions

Jun 3, 2025 · Artificial Intelligence

Implementing OCR in Java with SpringBoot and Tess4j

This article demonstrates how to build a lightweight OCR service in Java using SpringBoot and the Tess4j library, covering dependency setup, Tesseract engine initialization, RESTful API creation, training data options, and deployment considerations.

Image processing · OCR · RESTful API

0 likes · 7 min read

Python Programming Learning Circle

May 6, 2025 · Artificial Intelligence

Automatic Math Equation Grading with Python: Data Generation, CNN Training, Image Segmentation, and Result Feedback

This tutorial explains how to build a Python-based automatic grading system for handwritten math equations by generating synthetic character images, training a convolutional neural network, segmenting input images using projection techniques, evaluating expressions with eval, and overlaying correctness indicators on the original image.

CNN · Image processing · Math Grading

0 likes · 28 min read

Liangxu Linux

Apr 22, 2025 · Artificial Intelligence

Top 10 Open-Source OCR Projects on GitHub Ranked by Stars

This article compiles a ranked list of ten popular open-source OCR projects on GitHub, summarizing each tool’s key capabilities—such as multimodal text extraction, PDF linearization, layout analysis, and multilingual support—along with star counts and direct repository links for developers seeking ready-to-use OCR solutions.

GitHub · Multimodal · OCR

0 likes · 9 min read

Python Programming Learning Circle

Apr 15, 2025 · Artificial Intelligence

Automatic Math Expression Grading with Python, CNN and Image Processing

This tutorial explains how to generate synthetic digit fonts, build a convolutional neural network to recognize handwritten arithmetic expressions, segment images using projection methods, evaluate the results with Python's eval function, and overlay feedback symbols on the original image, providing a complete end‑to‑end solution.

Automation · CNN · ImageProcessing

0 likes · 27 min read

Architect's Guide

Apr 1, 2025 · Artificial Intelligence

Implementing OCR for ID Card and Business License Recognition in Spring Boot with Tesseract and OpenCV

This article explains how to build a Spring Boot service that uses OpenCV for image preprocessing and Tesseract OCR to extract ID numbers and business license information from photos, providing step‑by‑step guidance, required dependencies, and complete Java code examples.

ID Card Recognition · Image processing · Java

0 likes · 8 min read

58UXD

Mar 14, 2025 · Product Management

How 58 Rentals (58租房) Accelerated Landlord Publishing with LBS, OCR, and AI Guidance

This case study details how 58 Rentals (58租房) tackled cumbersome landlord publishing by redesigning the workflow with smart location (LBS), AI‑driven shooting assistance, OCR‑based document recognition, and digital‑human guidance, achieving up to 90% faster operations, higher accuracy, and stronger privacy protection.

AI guidance · LBS · OCR

0 likes · 7 min read

Full-Stack Cultivation Path

Mar 9, 2025 · Artificial Intelligence

Why Computer Use Agents Like Manus Signal a New Era for AI Automation

The article examines the emerging Computer Use paradigm—LLMs that can see and control a computer screen—detailing its technical foundations, three implementation approaches, performance trade‑offs, and why it could become a dominant design pattern for future AI agents.

AI Agent · Computer Use · Multimodal AI

0 likes · 9 min read

AI Frontier Lectures

Mar 7, 2025 · Artificial Intelligence

Can Mistral’s New OCR Model Really Beat the Competition? A Deep Dive

Mistral AI’s newly launched OCR API claims to deliver world‑class document understanding with multilingual support, high speed, and self‑hosting options, and benchmark tests show it outperforms Azure OCR and Google Doc AI, yet independent evaluations reveal limitations on complex tables and legal forms, prompting a balanced assessment of its readiness for enterprise use.

AI model · Mistral AI · OCR

0 likes · 7 min read

Top Architect

Jan 21, 2025 · Artificial Intelligence

Implementing ID Card and Business License Recognition in Spring Boot Using OpenCV and Tesseract OCR

This tutorial demonstrates how to build a Spring Boot application that extracts ID numbers and business license information from images by preprocessing them with OpenCV and recognizing text with Tesseract OCR, covering the full workflow from image handling to regex-based data extraction.

Image processing · Java · OCR

0 likes · 10 min read

Sohu Tech Products

Jan 8, 2025 · Artificial Intelligence

Multimodal RAG: Implementation Paths and Development Prospects

The talk outlines Multimodal RAG implementation routes, comparing OCR‑based object recognition, transformer encoder‑decoder encoding, and Visual Language Model processing, explains the ColPali late‑interaction method for multi‑dimensional vector matching, addresses scaling tensors with binarization and reranking, and recommends a hybrid long‑term strategy where VLM excels on abstract imagery while traditional OCR remains valuable.

ColPali · Document processing · Multimodal RAG

0 likes · 10 min read

Architecture Digest

Jan 8, 2025 · Artificial Intelligence

Implementing ID Card and Business License Recognition in Spring Boot Using Tesseract OCR and OpenCV

This article explains how to build a Spring Boot service that preprocesses images with OpenCV, extracts text using Tesseract OCR, and then parses identification numbers such as ID cards and business licenses via regular expressions, providing complete code examples and dependency details.

Image processing · Java · OCR

0 likes · 8 min read

Programmer DD

Dec 31, 2024 · Artificial Intelligence

Build an AI‑Powered Expense Tracker with GLM‑4V‑Flash and MaxKB

This article demonstrates how to create an AI‑driven personal expense‑tracking assistant by leveraging Zhipu's GLM‑4V‑Flash multimodal model for receipt OCR, generating SQL statements, and integrating them with MaxKB workflows and a MySQL database, complete with code snippets and deployment steps.

AI · GLM-4V-Flash · MaxKB

0 likes · 13 min read

Architecture Breakthrough

Dec 26, 2024 · Industry Insights

Understanding Chinese Invoices: Types, Lifecycle, and FinTech Applications

This article provides a comprehensive overview of Chinese invoices, covering legal definitions, paper and electronic forms, basic copies, content fields, lifecycle stages, classification of VAT and ordinary invoices, the distinction between full‑electronic and digital invoices, and their practical use in fintech solutions such as OCR and third‑party verification platforms.

China · OCR · VAT

0 likes · 18 min read

Python Programming Learning Circle

Dec 13, 2024 · Artificial Intelligence

Batch Image Translation Demo Using Youdao OCR API with Python

This article demonstrates how to build a Python desktop application that batch‑processes cosmetic product images, sends them to Youdao's OCR translation service, and displays the translated text, covering API preparation, request parameters, signature generation, and full source code.

Batch Processing · OCR · Tkinter

0 likes · 12 min read

Test Development Learning Exchange

Dec 6, 2024 · Artificial Intelligence

Using pytesseract and Pillow for OCR: Installation, Configuration, and Accuracy Improvement Techniques

This guide explains how to install Tesseract OCR and the Python libraries pytesseract and Pillow, configure the engine path, perform image-to-text extraction with example code, and apply various preprocessing, detection, and post‑processing methods to significantly improve OCR accuracy.

OCR · Python · computer vision

0 likes · 8 min read

Huolala Tech

Nov 28, 2024 · Artificial Intelligence

How AI-Powered OCR Transforms Freight Document and Vehicle Verification

This article explains how AI-driven OCR combined with deep‑learning image classification streamlines ticket, document, and license‑plate verification in freight logistics, detailing system architecture, algorithmic components, and future prospects for unified large‑model OCR solutions.

OCR · artificial-intelligence · image classification

0 likes · 12 min read

Full-Stack Cultivation Path

Nov 25, 2024 · Artificial Intelligence

Get High-Quality OCR with Ollama-OCR in Just a Few Lines of Code

This guide shows how to set up the open‑source Ollama‑OCR tool, which leverages the Llama 3.2‑Vision multimodal model to perform high‑quality OCR, covering installation of Ollama, the vision model, the OCR package, and example code for plain‑text and Markdown outputs.

Llama 3.2-Vision · Node.js · OCR

0 likes · 6 min read

Bilibili Tech

Nov 8, 2024 · Artificial Intelligence

AI-Powered Game Recognition for League of Legends Live Streaming on Bilibili

Bilibili’s AI‑driven game‑recognition system extracts real‑time LoL events through OCR, hero detection and hot‑spot tagging, generating high‑energy timestamps and interactive overlays that let viewers jump to key moments and view detailed statistics, enhancing spectator engagement and analytical capabilities across major esports tournaments.

AI · Game Recognition · Multimodal

0 likes · 14 min read

Architect

Nov 2, 2024 · Frontend Development

How to Build Robust Dark Watermarks and Boost OCR Accuracy in Web Apps

This article walks through the evolution of watermark techniques, demonstrates how to harden a front‑end watermark against deletion, invisibility, and covering using MutationObserver and canvas, introduces a low‑visibility dark watermark with decode logic, and details OCR integration and optimization to improve recognition accuracy in screenshot‑search scenarios.

Canvas · Image processing · MutationObserver

0 likes · 21 min read
