Tagged articles

Table Extraction

7 articles

Jun 7, 2026 · Artificial Intelligence

Python OCR Table Extraction: Boost Accuracy from 95% to 99% with Batch Processing

The article explains why generic OCR struggles with structured tables, proposes a partition‑based fixed‑region recognition method using PaddleOCR, provides a complete Python script for batch processing, and demonstrates how this approach consistently achieves over 99% accuracy.

Batch Processing · OCR · PaddleOCR

0 likes · 4 min read

James' Growth Diary

May 13, 2026 · Artificial Intelligence

Multimodal RAG: A Complete Guide to Ingesting Images, Tables, and PDFs

This article examines the blind spot of pure‑text RAG for visual content, compares three multimodal ingestion strategies—CLIP embeddings, image‑to‑text captioning with a MultiVectorRetriever, and ColPali visual retrieval—covers table‑specific handling, presents end‑to‑end TypeScript implementations, and lists common pitfalls to avoid when deploying production‑grade multimodal RAG pipelines.

CLIP · ColPali · Image Captioning

0 likes · 22 min read

Wu Shixiong's Large Model Academy

Mar 22, 2026 · Artificial Intelligence

How to Overcome MinerU’s Top 9 Limitations for Reliable Document Parsing

This article examines MinerU’s strengths and nine critical shortcomings—such as reading order errors, split tables, merged cells, OCR misrecognition, formula handling, heading hierarchy loss, output inconsistency, hardware limits, and licensing issues—and provides concrete improvement strategies and interview‑ready talking points for engineers.

Document Parsing · Interview Tips · MinerU

0 likes · 12 min read

Continuous Delivery 2.0

Sep 11, 2025 · Artificial Intelligence

Building Scalable Enterprise RAG: Lessons, Pitfalls, and Proven Solutions

This article shares practical lessons from building a large‑scale enterprise RAG system, covering imperfect data, document quality scoring, hierarchical chunking, metadata design, semantic‑search failures, open‑source model choices, and table handling to achieve reliable AI‑driven search.

Enterprise AI · Metadata · RAG

0 likes · 13 min read

Full-Stack Cultivation Path

Jul 15, 2024 · Fundamentals

Open-Source PDF Table Extraction with Camelot: Quick‑Start Guide

This article explains why extracting tables from PDFs is a common bottleneck, introduces the open‑source Camelot library, walks through installing Ghostscript and Camelot, shows a minimal Python script to convert PDFs to CSV, handles a typical runtime error, and demonstrates the companion Excalibur web UI for interactive extraction.

Camelot · Excalibur · PDF extraction

0 likes · 5 min read

Open Source Linux

Jan 10, 2022 · Fundamentals

Extract PDF Tables in 3 Lines with Camelot: A Python Guide

Camelot is a Python library that lets you pull tables from PDF files into Pandas DataFrames with just a few lines of code, offering a fast and reliable solution for researchers and developers who need to convert PDF‑embedded tables into usable data.

CLI · Camelot · PDF extraction

0 likes · 4 min read

Python Programming Learning Circle

Oct 10, 2019 · Fundamentals

Extract PDF Tables in Minutes with Camelot: A Simple Python Guide

This article explains how the Python library Camelot can quickly extract tables from PDF files, convert them into pandas DataFrames, and export the data to various formats, while also covering installation options and providing a concise code example.

Camelot · PDF · Pandas

0 likes · 4 min read