
A New Era of OCR: Introducing the Powerful xParse Skills for Seamless Document Parsing

This article introduces TextIn's xParse Skills, a zero-code, high-accuracy OCR and document-parsing solution. It handles PDFs, images, and more than 20 other formats, offers a free daily quota for PDFs and images, and integrates with LLM agents. The article covers installation, command-line usage, and a pros-and-cons analysis.

Old Zhang's AI Learning

Apr 15, 2026


The Missing Piece in Knowledge Management

Karpathy’s knowledge-management method keeps all raw material in a raw/ folder and has an LLM gradually compile it into a structured wiki. In practice, that raw material also includes PDFs, PPTs, Excel workbooks, and Word documents, which a large model often cannot read correctly when they are fed to it directly.

Why Parsing Matters

Incorrect parsing produces garbled output: garbage in, garbage out. The author compared several open-source OCR solutions (DeepSeek-OCR, HunyuanOCR, PaddleOCR, GLM-OCR, MinerU) and, based on extensive testing, recommends the commercial tool TextIn xParse.

Format Support

The free tier (1,000 pages per day) supports PDF and image formats (JPG, PNG, BMP, TIFF, WebP), with limits of 10 MB per file and 1 request per second.

Once credentials are configured, the full version unlocks Word, Excel, PPT, HTML, OFD, RTF, and over 20 other formats, with a per-file limit of 500 MB and no daily page cap.
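A quick pre-check can tell you whether a file stays within the free-tier limits before you send it. The Python sketch below uses a hypothetical helper, `fits_free_tier`, which is not part of xparse-cli. The extension list and the 10 MB limit come from the description above. Two details are assumptions: 10 MB is treated as 10 * 1024 * 1024 bytes, and the `.jpeg`/`.tif` spellings are accepted alongside JPG and TIFF.

```python
from pathlib import Path

# Free-tier formats as listed in the article; ".jpeg" and ".tif" are assumed
# to count as JPG and TIFF.
FREE_TIER_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}
# Assumption: "10 MB" is interpreted as 10 MiB; TextIn may count it differently.
FREE_TIER_MAX_BYTES = 10 * 1024 * 1024


def fits_free_tier(path: str, size_bytes: int) -> bool:
    """Return True if a file with this name and size looks eligible for the free tier."""
    suffix = Path(path).suffix.lower()
    return suffix in FREE_TIER_EXTENSIONS and size_bytes <= FREE_TIER_MAX_BYTES
```

For a file on disk, pass `os.path.getsize(path)` as `size_bytes`. Files that fail the check, such as Word or PowerPoint documents or PDFs over the size limit, are the ones that need configured credentials.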

Skills repository:

github.com/intsig-textin/xparse-skills

Core Components

SKILL.md: tells the Agent when to trigger document parsing and how to route the request.

xparse-cli: a cross-platform Go binary that calls the TextIn xParse API.

Workflow:

The user makes a request in natural language → the Agent detects a document task → it triggers the xparse-parse Skill → the Skill calls xparse-cli → xparse-cli uses the free or paid API depending on the configured credentials → the result comes back as Markdown or JSON. No code needs to be written.

Installation Methods

Method 1: One-line install through the chat

"Help me install intsig-textin/xparse-parser from the skill marketplace"

Method 2: npx command (recommended)

npx skills add intsig-textin/xparse-skills

Credential configuration

xparse-cli auth

Enter the App ID and Secret Code when prompted; they are saved to ~/.xparse-cli/config.yaml. For CI/CD, set environment variables instead:

export XPARSE_APP_ID=your_app_id
export XPARSE_SECRET_CODE=your_secret_code
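The workflow described earlier uses the free or the paid API depending on whether credentials are configured. The Python sketch below illustrates that decision. `choose_tier` is a hypothetical helper. The precedence it applies (environment variables first, then ~/.xparse-cli/config.yaml) is an assumption, not documented xparse-cli behavior.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# File that `xparse-cli auth` writes to, according to the article.
DEFAULT_CONFIG = Path.home() / ".xparse-cli" / "config.yaml"


def choose_tier(env: Optional[Mapping[str, str]] = None,
                config_path: Optional[Path] = None) -> str:
    """Guess whether xparse-cli will use the paid or the free API.

    Assumption: non-empty XPARSE_APP_ID and XPARSE_SECRET_CODE take priority;
    otherwise the mere presence of the config file counts as "configured".
    The file's contents are not validated, and the real CLI may decide differently.
    """
    env = os.environ if env is None else env
    if env.get("XPARSE_APP_ID") and env.get("XPARSE_SECRET_CODE"):
        return "paid"
    config = DEFAULT_CONFIG if config_path is None else config_path
    return "paid" if config.is_file() else "free"
```

In a CI job, calling `choose_tier()` before a large batch run is a cheap way to catch missing secrets. Without them, the run would silently fall back to free-tier limits, assuming the CLI behaves as sketched.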

Usage

Once the Skill is installed on a platform such as OpenClaw or Claude Code, natural-language requests trigger the full parsing pipeline. Example requests:

"Help me read this PDF contract and extract key clauses"

"Convert this report to Markdown and save it to the desktop"

"The encrypted PDF password is 123456, parse the first 10 pages"

"Extract the table image content and output JSON"

Core Commands

# Basic PDF parsing, output Markdown to terminal
xparse-cli parse report.pdf

# Output structured JSON
xparse-cli parse report.pdf --view json

# Save to directory (auto‑named report.md / report.json)
xparse-cli parse report.pdf --output ./result/

# Save to a specific file
xparse-cli parse report.pdf --output parsed.md

# Parse specific page ranges (supports multiple segments)
xparse-cli parse report.pdf --page-range 1-5
xparse-cli parse report.pdf --page-range 1-2,5-10

# Parse encrypted PDF
xparse-cli parse secret.pdf --password mypassword

# Include character‑level coordinates and confidence (useful for manual verification)
xparse-cli parse report.pdf --view json --include-char-details --output ./parsed.json

The CLI enables all parsing features by default. --include-char-details is the only feature you must switch on explicitly, because it significantly increases the response size.
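If you drive xparse-cli from a Python script rather than through an Agent, a small wrapper can assemble the arguments. The sketch uses only the flags shown in the commands above. `build_parse_command` and `check_page_range` are hypothetical helpers. The page-range check is an assumption about what the CLI accepts: comma-separated segments, each either N or N-M with N <= M.

```python
import re
from typing import List, Optional

# Assumed page-range syntax, matching the examples "1-5" and "1-2,5-10".
_PAGE_RANGE = re.compile(r"\d+(?:-\d+)?(?:,\d+(?:-\d+)?)*")


def check_page_range(page_range: str) -> None:
    """Raise ValueError if page_range is malformed or a segment runs backwards."""
    if not _PAGE_RANGE.fullmatch(page_range):
        raise ValueError(f"malformed page range: {page_range!r}")
    for segment in page_range.split(","):
        start, _, end = segment.partition("-")
        if end and int(start) > int(end):
            raise ValueError(f"start after end in segment {segment!r}")


def build_parse_command(path: str,
                        view: Optional[str] = None,
                        output: Optional[str] = None,
                        page_range: Optional[str] = None,
                        password: Optional[str] = None,
                        include_char_details: bool = False) -> List[str]:
    """Assemble an xparse-cli argument list from the flags shown in the article."""
    cmd = ["xparse-cli", "parse", path]
    if view is not None:
        cmd += ["--view", view]              # e.g. "json"
    if output is not None:
        cmd += ["--output", output]          # a directory or a file path
    if page_range is not None:
        check_page_range(page_range)
        cmd += ["--page-range", page_range]
    if password is not None:
        cmd += ["--password", password]      # for encrypted PDFs
    if include_char_details:
        cmd.append("--include-char-details")
    return cmd
```

Run the resulting list with `subprocess.run(cmd, capture_output=True, text=True, check=True)` and read `.stdout`. Building a list rather than a shell string avoids quoting problems with spaces in file names. Note that a password passed on the command line can be visible to other users in the process list, so take care on shared machines.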

Capabilities

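Heading hierarchy: automatically recognizes document structure, up to 5 heading levels

Table structure: output as HTML, preserving the cell hierarchy

Image extraction: recognizes and extracts embedded images

Table of contents: automatically generates a document TOC

Per-page results: page-level metadata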
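If you keep only the Markdown output, you can still recover a heading outline from it, for example to sanity-check the detected hierarchy. The Python sketch below assumes headings are emitted as ATX "#" lines, which the article does not confirm. `build_toc` and `render_toc` are hypothetical helpers, and the TOC that xParse generates itself may differ.

```python
import re
from typing import List, Tuple

# Assumption: headings use ATX "#" prefixes; only levels 1-5 are kept,
# the depth the article says xParse recognizes.
_HEADING = re.compile(r"(#{1,5})\s+(.*\S)\s*")


def build_toc(markdown: str) -> List[Tuple[int, str]]:
    """Return (level, title) pairs for each level 1-5 ATX heading."""
    toc = []
    for line in markdown.splitlines():
        match = _HEADING.fullmatch(line)
        if match:
            toc.append((len(match.group(1)), match.group(2)))
    return toc


def render_toc(toc: List[Tuple[int, str]]) -> str:
    """Render the outline as an indented Markdown bullet list."""
    return "\n".join("  " * (level - 1) + "- " + title for level, title in toc)
```

Lines starting with six or more "#" characters, or with "#" not followed by whitespace, are ignored.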

Advanced Playbooks

1. Pipe output to other tools or an LLM

# Search for a keyword after parsing
xparse-cli parse report.pdf | grep "revenue"

# Feed parsed content to an LLM for summarisation
xparse-cli parse paper.pdf | llm "summarize this paper"
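When the rest of a workflow lives in Python, you can reproduce the same pipe without a shell: capture xparse-cli's Markdown from stdout, then filter it. This is a minimal sketch. `parse_to_markdown` and `grep_lines` are hypothetical helpers. The `run` parameter exists only so a fake runner can be substituted in tests. Passing the text to an LLM is left to whatever client you already use.

```python
import subprocess
from typing import Callable, List


def parse_to_markdown(path: str, run: Callable = subprocess.run) -> str:
    """Run `xparse-cli parse <path>` and return the Markdown it prints to stdout."""
    result = run(["xparse-cli", "parse", path],
                 capture_output=True, text=True, check=True)
    return result.stdout


def grep_lines(text: str, keyword: str) -> List[str]:
    """Keep only the lines that contain keyword, like `grep keyword`."""
    return [line for line in text.splitlines() if keyword in line]
```

Usage: `grep_lines(parse_to_markdown("report.pdf"), "revenue")`. With `check=True`, a non-zero exit from xparse-cli raises `subprocess.CalledProcessError` instead of silently returning empty text.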

2. Batch processing

# Prepare a file list (one path per line)
xparse-cli parse --list files.txt --output ./results/
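One way to prepare files.txt is a short Python script that lists every matching file under a folder, one path per line. `write_file_list` is a hypothetical helper. It writes absolute paths so the list does not depend on how xparse-cli resolves relative paths, which the article does not specify.

```python
from pathlib import Path
from typing import Iterable


def write_file_list(root: str, list_path: str,
                    suffixes: Iterable[str] = (".pdf",)) -> int:
    """Write the absolute path of every matching file under root, one per line.

    Suffix matching is case-insensitive. Returns the number of paths written.
    """
    wanted = {s.lower() for s in suffixes}
    paths = sorted(p for p in Path(root).resolve().rglob("*")
                   if p.is_file() and p.suffix.lower() in wanted)
    Path(list_path).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return len(paths)
```

For example, `write_file_list("./contracts", "files.txt", suffixes=(".pdf", ".png"))` collects PDFs and PNG scans. Sorting the paths keeps the batch order stable between runs.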

3. Download images from parsed JSON

# Parse to JSON
xparse-cli parse report.pdf --view json --output result.json

# Batch‑download images
xparse-cli download --from result.json --output ./images/
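You may want to see which image references result.json actually contains, for example to estimate how much the download step will fetch. The article does not describe the JSON schema, so this Python sketch assumes no particular key names. It walks the whole document and collects any string ending in a common image extension. That heuristic can produce false positives, such as a text block that happens to end in ".png". `find_image_refs` and `image_refs_in_file` are hypothetical helpers, not part of xparse-cli.

```python
import json
from pathlib import Path
from typing import Any, List

# Common image extensions; matching on them is a heuristic, not a schema.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".webp")


def find_image_refs(node: Any) -> List[str]:
    """Recursively collect string values that end in an image extension."""
    found: List[str] = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(find_image_refs(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_image_refs(item))
    elif isinstance(node, str):
        # Ignore any query string (e.g. "?token=...") before checking the suffix.
        if node.split("?", 1)[0].lower().endswith(IMAGE_SUFFIXES):
            found.append(node)
    return found


def image_refs_in_file(json_path: str) -> List[str]:
    """Load a parsed-result JSON file and list the image references in it."""
    return find_image_refs(json.loads(Path(json_path).read_text(encoding="utf-8")))
```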

4. Private deployment

# Use a private TextIn server
xparse-cli parse report.pdf --base-url https://your-private-server.com
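When the same script has to switch between the public service and a private TextIn server, you can append --base-url only when a private URL is configured. This Python sketch first checks that the value is an absolute http(s) URL. `with_base_url` is a hypothetical helper. Whether a private server should be reached over plain http is a deployment detail the article does not cover.

```python
from typing import List
from urllib.parse import urlparse


def with_base_url(cmd: List[str], base_url: str) -> List[str]:
    """Return a copy of cmd with --base-url appended, after a basic URL check."""
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {base_url!r}")
    return [*cmd, "--base-url", base_url]
```

Rejecting a bare host name such as "your-private-server.com" early produces a clear local error instead of a less obvious failure from the CLI.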

Evaluation

✅ Zero code, no barrier to entry: just tell an Agent what you need and get results, whatever your skill level.

✅ Strong handling of complex tables: cross-page stitching, merged cells, and borderless tables.

✅ Free quota sufficient for light use: 1,000 pages of PDFs and images per day.

✅ Pipeline and batch support: integrates with LLMs and scripts for automation.

⚠️ Word/PPT/Excel require a paid plan; the free tier only supports PDFs and images.

⚠️ The free tier limits files to 10 MB; larger PDFs need a paid account.

Suitable Scenarios

Personal knowledge management involving large numbers of PDF, Word, and PPT files.

Building high‑precision RAG knowledge bases that need accurate document structure.

Daily work that involves parsing contracts, financial reports, or research papers.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


