A New Era of OCR: Introducing the Powerful xParse Skills for Seamless Document Parsing
This article introduces TextIn's xParse Skills, a zero-code, high-accuracy OCR and document-parsing solution that plugs into LLM agents. It parses PDFs and images under a free daily quota, and over 20 other formats, such as Word, Excel and PPT, once credentials are configured. The article covers installation, command-line usage, and the tool's pros and cons.
The Missing Piece in Knowledge Management
Karpathy's knowledge-management method keeps all raw material in a raw/ folder and has an LLM gradually compile it into a structured wiki. In practice, that raw material includes PDFs, PPTs, Excel spreadsheets, and Word documents, which often come out broken when fed directly to a large model.
Why Parsing Matters
Poor parsing hands the model garbled input, and its answers suffer: garbage in, garbage out. The author compared open-source OCR solutions (DeepSeek-OCR, HunyuanOCR, PaddleOCR, GLM-OCR, MinerU) and, based on extensive testing, recommends the commercial tool TextIn xParse.
Format Support
Free tier: 1,000 pages per day of PDF and image formats (JPG, PNG, BMP, TIFF, WebP), up to 10 MB per file, at 1 request per second.
After configuring credentials, the full suite unlocks Word, Excel, PPT, HTML, OFD, RTF and over 20 other formats, with a per‑file limit of 500 MB and no daily page cap.
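Before sending a file without credentials, a script can check it against the free-tier limits listed above. This is a minimal Python sketch. It assumes the 10 MB limit means 10 × 1024 × 1024 bytes and that the file extension identifies the format. The function name `fits_free_tier` and the alternate spellings `.jpeg` and `.tif` are hypothetical choices, not part of xparse-cli.

```python
from pathlib import Path
from typing import Tuple

# Free-tier formats from the list above; the .jpeg/.tif spellings are assumed.
FREE_TIER_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}
# Assumption: "10 MB" is read as 10 MiB (10 * 1024 * 1024 bytes).
FREE_TIER_MAX_BYTES = 10 * 1024 * 1024


def fits_free_tier(name: str, size_bytes: int) -> Tuple[bool, str]:
    """Check a file name and size against the free-tier format and size limits."""
    suffix = Path(name).suffix.lower()
    if suffix not in FREE_TIER_EXTENSIONS:
        return False, f"{suffix or 'no extension'} is not a free-tier format"
    if size_bytes > FREE_TIER_MAX_BYTES:
        return False, "file exceeds the 10 MB free-tier limit"
    return True, "ok"
```

For a file on disk, pass `os.path.getsize(path)` as the size.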
Skills repository: github.com/intsig-textin/xparse-skills
Core Components
SKILL.md – tells the Agent when to trigger document parsing and how to route the request.
xparse-cli – a cross‑platform Go binary that calls the TextIn xParser API.
Workflow:
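The user makes a request in plain language → the Agent recognizes a document task → it triggers the xparse-parse Skill → the Skill calls xparse-cli → xparse-cli uses the free or paid API depending on whether credentials are configured → the result comes back as Markdown or JSON. No code needs to be written.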
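The "free or paid based on credentials" step can be pictured roughly as follows. This is a hypothetical sketch of the routing decision, not xparse-cli's actual logic. It treats credentials as present in either of two cases: both XPARSE_APP_ID and XPARSE_SECRET_CODE are set in the environment, or ~/.xparse-cli/config.yaml exists. Those are the two places the installation instructions mention. The precedence between them is an assumption, and the file's contents are not validated.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def select_tier(env: Mapping[str, str] = os.environ,
                config_path: Optional[Path] = None) -> str:
    """Return "paid" if credentials appear to be configured, else "free" (hypothetical)."""
    # Environment variables (the CI/CD route) are checked first; both must be set.
    if env.get("XPARSE_APP_ID") and env.get("XPARSE_SECRET_CODE"):
        return "paid"
    # Otherwise look for the file that `xparse-cli auth` writes.
    if config_path is None:
        config_path = Path.home() / ".xparse-cli" / "config.yaml"
    if config_path.is_file():
        return "paid"
    return "free"
```

Passing `env` and `config_path` explicitly makes the decision easy to test without touching the real environment.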
Installation Methods
Method 1: One-line install in chat
Tell the Agent: "Install intsig-textin/xparse-parser from the skill marketplace for me."
Method 2: npx command (recommended)
npx skills add intsig-textin/xparse-skills
Credential configuration
Run xparse-cli auth and enter your App ID and Secret Code; they are saved to ~/.xparse-cli/config.yaml. For CI/CD, set environment variables instead:
export XPARSE_APP_ID=your_app_id
export XPARSE_SECRET_CODE=your_secret_code
Usage
After installing the Skill on platforms such as OpenClaw or Claude Code, natural‑language commands trigger the full parsing pipeline. Example commands:
"Help me read this PDF contract and extract key clauses"
"Convert this report to Markdown and save it to the desktop"
"The encrypted PDF password is 123456, parse the first 10 pages"
"Extract the table image content and output JSON"
Core Commands
# Basic PDF parsing, output Markdown to terminal
xparse-cli parse report.pdf
# Output structured JSON
xparse-cli parse report.pdf --view json
# Save to directory (auto‑named report.md / report.json)
xparse-cli parse report.pdf --output ./result/
# Save to a specific file
xparse-cli parse report.pdf --output parsed.md
# Parse specific page ranges (supports multiple segments)
xparse-cli parse report.pdf --page-range 1-5
xparse-cli parse report.pdf --page-range 1-2,5-10
# Parse encrypted PDF
xparse-cli parse secret.pdf --password mypassword
# Include character‑level coordinates and confidence (useful for manual verification)
xparse-cli parse report.pdf --view json --include-char-details --output ./parsed.json
All parsing capabilities are enabled out of the box. Character-level details are the only opt-in, via --include-char-details, because they significantly increase the response size.
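The same commands can also be driven from a Python script by building the argument list and handing it to `subprocess`. This sketch uses only the subcommand and flags shown above (`parse`, `--view json`, `--output`, `--page-range`, `--password`, `--include-char-details`) and assumes xparse-cli is on PATH. The helper names `build_parse_args` and `parse_to_text` are hypothetical, as is the page-range pattern. In particular, accepting single pages such as "3" is an assumption.

```python
import re
import subprocess
from typing import List, Optional

# Page ranges like "1-5" or "1-2,5-10"; single pages ("3") are an assumption.
PAGE_RANGE = re.compile(r"\d+(-\d+)?(,\d+(-\d+)?)*")


def build_parse_args(path: str, *, json_view: bool = False,
                     output: Optional[str] = None,
                     page_range: Optional[str] = None,
                     password: Optional[str] = None,
                     char_details: bool = False) -> List[str]:
    """Assemble an `xparse-cli parse` command from the documented flags."""
    args = ["xparse-cli", "parse", path]
    if json_view:
        args += ["--view", "json"]
    if output:
        args += ["--output", output]
    if page_range:
        if not PAGE_RANGE.fullmatch(page_range):
            raise ValueError(f"bad page range: {page_range!r}")
        args += ["--page-range", page_range]
    if password:
        args += ["--password", password]
    if char_details:
        args.append("--include-char-details")
    return args


def parse_to_text(path: str, **kwargs) -> str:
    """Run xparse-cli and return what it prints (Markdown unless json_view=True)."""
    result = subprocess.run(build_parse_args(path, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `parse_to_text("report.pdf", page_range="1-5")` runs `xparse-cli parse report.pdf --page-range 1-5`. Passing a list rather than a shell string avoids quoting problems with passwords and file names that contain spaces.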
Capabilities
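Heading hierarchy: automatically recognizes document structure, up to 5 heading levels
Table structure: output as HTML, preserving the cell hierarchy
Image extraction: recognizes and extracts embedded images
Table of contents: automatically generates a document TOC
Per-page results: page-level metadata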
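Because tables arrive as HTML inside the parsed output, downstream code needs an HTML parser to get at the cells. This is a minimal sketch using Python's standard `html.parser`. It assumes each table is a plain `<table>` of `<tr>` rows with `<td>`/`<th>` cells and does not handle nested tables. The class name `TableExtractor` is hypothetical. rowspan/colspan values are recorded per cell but not expanded into a full grid.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Cell = Tuple[str, int, int]  # (text, rowspan, colspan)


class TableExtractor(HTMLParser):
    """Collect every <table> in a document as a list of rows of cells."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[List[List[Cell]]] = []
        self._row: Optional[List[Cell]] = None
        self._cell: Optional[List[str]] = None
        self._span = (1, 1)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            # Missing or valueless span attributes default to 1.
            self._span = (int(attrs.get("rowspan") or 1), int(attrs.get("colspan") or 1))

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(("".join(self._cell).strip(), *self._span))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside cells (e.g. surrounding Markdown) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html: str) -> List[List[List[Cell]]]:
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Keeping the span values per cell leaves the choice of how to expand merged cells to the consumer, for example when flattening a table into CSV or chunking it for RAG.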
Advanced Playbooks
1. Pipe output directly to an LLM
# Search for a keyword after parsing
xparse-cli parse report.pdf | grep "revenue"
# Feed parsed content to an LLM for summarisation
xparse-cli parse paper.pdf | llm "summarize this paper"
2. Batch processing
# Prepare a file list (one path per line)
xparse-cli parse --list files.txt --output ./results/
3. Download images from parsed JSON
# Parse to JSON
xparse-cli parse report.pdf --view json --output result.json
# Batch‑download images
xparse-cli download --from result.json --output ./images/
4. Private deployment
# Use a private TextIn server
xparse-cli parse report.pdf --base-url https://your-private-server.com
Evaluation
✅ Zero-code, no barrier to entry: just talk to an Agent and get results, suitable for users at any skill level.
✅ Strong handling of complex tables: cross-page stitching, merged cells, and borderless tables.
✅ Free quota sufficient for light use: 1,000 pages of PDFs and images per day.
✅ Pipeline and batch support: integrates with LLMs and scripts for automation.
⚠️ Word/PPT/Excel require a paid plan; free tier only supports PDF and images.
⚠️ Free tier limits files to 10 MB; larger PDFs need a paid account.
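When scripting calls on the free tier yourself, rather than relying on the `--list` batch mode, requests must also stay within the 1 request per second limit stated under Format Support. This is a minimal sketch. It assumes one request per file and treats the limit as a minimum interval between call start times. `run_throttled` and its parameters are hypothetical. `parse` stands for whatever function actually submits a file, for example one that shells out to xparse-cli.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def run_throttled(paths: Iterable[str], parse: Callable[[str], T],
                  min_interval: float = 1.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Call parse(path) for each path, starting calls at least min_interval seconds apart."""
    results: List[T] = []
    last_start: Optional[float] = None
    for path in paths:
        if last_start is not None:
            # Sleep only for whatever part of the interval has not already elapsed.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(parse(path))
    return results
```

Measuring from the start of the previous call means slow parses do not add extra waiting. The injectable `clock` and `sleep` let the pacing be tested without real delays.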
Suitable Scenarios
Personal knowledge management involving many PDFs, Word documents, and PPT decks.
Building high‑precision RAG knowledge bases that need accurate document structure.
Daily work that involves parsing contracts, financial reports, or research papers.
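For the knowledge-management scenario, the raw/ folder from the opening section can be turned into the one-path-per-line list that `xparse-cli parse --list` reads. This is a minimal sketch. It assumes the list accepts the paths exactly as written (relative or absolute) and that you choose which extensions to include. `write_file_list` and the default extension set are hypothetical. Formats beyond PDF and images need credentials.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical default; add .docx/.pptx/.xlsx once credentials are configured.
DEFAULT_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def write_file_list(root: str, list_path: str,
                    extensions: Iterable[str] = DEFAULT_EXTENSIONS) -> List[str]:
    """Write every matching file under root to list_path, one path per line, sorted."""
    wanted = {e.lower() for e in extensions}
    paths = sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )
    Path(list_path).write_text("".join(line + "\n" for line in paths), encoding="utf-8")
    return paths
```

For example, `write_file_list("raw", "files.txt")` produces a list for `xparse-cli parse --list files.txt --output ./results/`.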
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.