Tagged articles
22 articles
SuanNi
Apr 30, 2026 · Artificial Intelligence

Deploy a 24/7 Document Recognition Toolbox with the PaddleOCR Image on the Cloud

This guide explains how to use Baidu's open‑source PaddleOCR engine—its full OCR and layout analysis pipeline, multi‑language support, and output formats—to set up a continuously running document recognition service on the 算网 GPU cloud platform, including environment preparation, model configuration, and inference execution.

Document Processing · GPU · MagicMind
0 likes · 6 min read
Java Tech Enthusiast
Mar 7, 2026 · Artificial Intelligence

Explore Cutting‑Edge Open‑Source AI Skills for Video, Docs, and Social Media Automation

This article introduces several open‑source AI Skills—including Remotion, YouTube‑clipper, skill‑from‑masters, NotebookLM, Markdown‑to‑X publisher, and Anthropic's Agent Skills—detailing their purpose, core features, installation commands, and repository links for developers seeking automation solutions.

Claude · Document Processing · Video Generation
0 likes · 7 min read
Old Zhang's AI Learning
Jan 28, 2026 · Artificial Intelligence

RAG-Anything: A Universal RAG Framework for PDFs, Office Docs, and Images

RAG-Anything is an open-source, end-to-end multimodal RAG framework that ingests PDFs, Office files, images, and scientific papers, parses them with high fidelity using MinerU, builds a multimodal knowledge graph, and enables hybrid retrieval, while noting resource and dependency considerations.

AI · Document Processing · Knowledge Base
0 likes · 7 min read
Old Meng AI Explorer
Jan 18, 2026 · Artificial Intelligence

How BabelDOC Preserves PDF Layout While Translating & OneAIFW Shields Your Data

Two open‑source projects—BabelDOC, a Python‑based PDF translator that retains original formatting using AI models, and OneAIFW, a Zig‑and‑Rust local AI firewall that anonymizes sensitive data before LLM queries—offer practical, privacy‑preserving solutions for researchers and developers.

AI privacy · Data Protection · Document Processing
0 likes · 8 min read
DataFunSummit
Oct 30, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Document Processing and OCR

This article explores how the explosion of unstructured data exposes the limits of traditional OCR and shows how emerging multimodal large language models provide end‑to‑end document understanding, reduce pipeline complexity, cut training costs, enable hybrid retrieval‑augmented generation, and drive real‑world industry deployments.

AI · Document Processing · OCR
0 likes · 28 min read
Old Meng AI Explorer
Oct 30, 2025 · Artificial Intelligence

How PaddleOCR Turns Handwritten Notes and PDFs into Editable Text in Seconds

This article explains how PaddleOCR, an open‑source OCR engine from Baidu, achieves high‑accuracy text extraction from handwritten notes, scanned PDFs, invoices, IDs and multilingual documents, offering offline cross‑platform support, free commercial use, and step‑by‑step guidance for rapid deployment.

Automation · Document Processing · OCR
0 likes · 10 min read
DataFunSummit
Jul 23, 2025 · Artificial Intelligence

Multimodal RAG: Techniques, Challenges, and Scaling the Future of AI

This article presents a comprehensive overview of multimodal Retrieval‑Augmented Generation (RAG), detailing three implementation paths—semantic extraction, Transformer‑based, and Visual Language Model approaches—along with scaling strategies using tensor indexing, performance comparisons, and guidance on selecting the most suitable technical route.

AI Retrieval · Document Processing · Multimodal RAG
0 likes · 12 min read
Sohu Tech Products
Jan 8, 2025 · Artificial Intelligence

Multimodal RAG: Implementation Paths and Development Prospects

The talk outlines Multimodal RAG implementation routes, comparing OCR‑based object recognition, transformer encoder‑decoder encoding, and Visual Language Model processing, explains the ColPali late‑interaction method for multi‑dimensional vector matching, addresses scaling tensors with binarization and reranking, and recommends a hybrid long‑term strategy where VLM excels on abstract imagery while traditional OCR remains valuable.

ColPali · Document Processing · Multimodal RAG
0 likes · 10 min read
Alibaba Cloud Developer
Dec 11, 2024 · Artificial Intelligence

How to Extract Multimodal File Information with AI on Alibaba Cloud

This tutorial walks you through using Alibaba Cloud's Bailian AI service to deploy a web service that extracts text, images, audio, and video information from multimodal documents, covering resource setup, application deployment, and step‑by‑step extraction examples.

AI · Alibaba Cloud · Document Processing
0 likes · 5 min read
Lobster Programming
Nov 1, 2024 · Backend Development

How to Parse PDFs and Extract Metadata with Apache Tika and Spring Boot

This guide explains Apache Tika's document parsing capabilities, shows how to download and run the Tika app, demonstrates extracting text and metadata from a PDF, and provides step‑by‑step instructions for integrating Tika into a Spring Boot project with full code examples.

Apache Tika · Document Processing · Java
0 likes · 7 min read
Python Programming Learning Circle
Feb 18, 2024 · Backend Development

Introduction, Installation, and Usage of PyMuPDF (Python Bindings for MuPDF)

This article provides a comprehensive overview of PyMuPDF, covering its purpose as Python bindings for the lightweight MuPDF viewer, detailed installation instructions, essential dependencies, naming conventions, and extensive usage examples for opening documents, accessing pages, extracting text and images, manipulating PDFs, and saving changes.

Document Processing · Library · MuPDF
0 likes · 12 min read
Python Programming Learning Circle
Nov 30, 2023 · Fundamentals

Introduction and Usage Guide for PyMuPDF (Python Bindings for MuPDF)

This article provides a comprehensive overview of PyMuPDF, covering its relationship to MuPDF, core features, installation methods, import conventions, and detailed usage examples for opening documents, handling pages, extracting text and images, and performing PDF-specific operations such as merging, splitting, and saving.

Document Processing · Library · MuPDF
0 likes · 12 min read
DataFunTalk
Nov 10, 2022 · Artificial Intelligence

A Comprehensive Overview of OCR Technology Development and Engineering Practices

This article reviews the 40‑year evolution of Optical Character Recognition, discusses its integration with Intelligent Document Processing, outlines recent research hotspots such as scene text recognition and domain‑specific symbol detection, and shares practical engineering experiences and future directions from Datagrand.

Document Processing · Intelligent Document Processing · OCR
0 likes · 24 min read
Laiye Technology Team
Jul 16, 2022 · Artificial Intelligence

Seal (Stamp) Recognition in Intelligent Document Processing: Challenges, Methods, and Experiments

This article explains how intelligent document processing uses deep‑learning‑based seal detection and OCR techniques—enhanced YOLOv5, multi‑label loss, combined NMS, and end‑to‑end models such as Mask‑TextSpotter, ABCNet, PGNet, and TrOCR—to overcome diverse stamp styles, background interference, and image quality issues, presenting experimental results that surpass commercial OCR vendors.

AI · Document Processing · OCR
0 likes · 13 min read
Python Programming Learning Circle
Jan 7, 2022 · Fundamentals

Using python-docx: Document Structure and Basic Operations

This article introduces the python‑docx library, explains its document model—including Document, Paragraph, Run, and Table objects—and provides practical Python code examples for creating, modifying, and styling Word documents, inserting headings, page breaks, tables, and images.

Code Example · Document Processing · Word Automation
0 likes · 6 min read
Programmer DD
Jul 10, 2020 · Fundamentals

How Search Engines Work: Inside Document and Query Processing

This article explains the core components of a search engine—document processing, query processing, and matching—detailing each step from indexing to ranking, and discusses the document features that influence relevance, providing a comprehensive overview of information retrieval fundamentals.

Document Processing · Query Processing · information retrieval
0 likes · 20 min read
Architect
Jun 22, 2020 · Fundamentals

Fundamentals of Search Engine Architecture: Document Processing, Query Processing, Indexing, and Matching

This article explains the core components and processing steps of a search engine—document processor, query processor, indexing, and matching—detailing how documents are normalized, tokenized, filtered, weighted, and stored in an inverted index to support effective information retrieval.

Document Processing · Query Processing · information retrieval
0 likes · 20 min read
Python Programming Learning Circle
Oct 25, 2019 · Backend Development

Automate Word with Python: Master win32com for Document Manipulation

This tutorial explains how to use Python's win32com library to control Microsoft Word, covering installation, creating and displaying documents, working with Selection, Range, Font, ParagraphFormat, PageSetup and Styles objects, and providing a complete example that formats a document to meet national standards.

COM · Document Processing · Python automation
0 likes · 14 min read