Tagged articles

pdf-processing

8 articles · Page 1 of 1

Jul 27, 2026 · Backend Development

Why More Developers Choose Apache PDFBox for PDF Processing

This article provides a comprehensive guide to Apache PDFBox, covering its licensing advantages, architecture, installation, core operations (creating, editing, extracting, merging, and encrypting PDFs), migration tips from 2.x to 3.x, performance considerations, and recommended use cases for Java backend development.

apache · java · open-source

0 likes · 17 min read

Su San Talks Tech

Jul 21, 2026 · Backend Development

Why Apache PDFBox Is Becoming the Go-To Java PDF Library

This article provides a comprehensive guide to Apache PDFBox, covering its licensing advantages, latest version, installation steps, core APIs for creating, editing, extracting, merging, encrypting and signing PDFs, underlying architecture, migration tips, performance considerations, and real‑world use cases.

Apache PDFBox · Open-source libraries · PDF encryption

0 likes · 18 min read

Old Zhang's AI Learning

Jul 3, 2026 · Artificial Intelligence

Why Codex’s Office Skills Are Seriously Underrated: Word, Excel, PPT, and PDF All Integrated into Workflows

The author demonstrates how OpenAI Codex can act as a full‑featured office assistant, using plugins to read PDFs, extract data into spreadsheets, draft Word documents, design PowerPoint presentations, and combine everything via Sites and annotations into a seamless, end‑to‑end workflow.

AI agents · OpenAI Codex · PowerPoint Generation

0 likes · 8 min read

Architect's Tech Stack

Jun 5, 2026 · Artificial Intelligence

Open-Source ‘Book-to-Skill’ Turns Technical Books into Claude Code Queries

This article introduces the open‑source “book-to-skill” tool that compiles technical books into a Claude Code skill, enabling terminal queries that return answers from the actual book content while keeping token usage low and avoiding AI hallucinations.

AI assistant · CLI tool · Claude Code

0 likes · 3 min read

AI Insight Log

Dec 17, 2025 · Artificial Intelligence

Inside ChatGPT’s New ‘Skills’: PDF & Spreadsheet Tools and Adding Them to Cursor

The author demonstrates that OpenAI has quietly integrated Anthropic‑style “Skills” into ChatGPT, exposing a /home/oai/skills directory with PDF and spreadsheet modules, explains how the PDF skill converts files to PNGs for vision‑based reading, and shows how to mount these skills in Cursor for local tool invocation.

Anthropic · ChatGPT · Cursor IDE

0 likes · 6 min read

Ops Development & AI Practice

Apr 13, 2025 · Industry Insights

MarkItDown vs Docling: Which Open‑Source Tool Wins for LLM‑Ready Markdown?

This article provides an in‑depth comparison of Microsoft’s MarkItDown and IBM‑backed Docling, evaluating their supported formats, output options, performance, community backing, and ideal use cases to help developers choose the right tool for AI‑driven document processing.

LLM · Markdown · document-conversion

0 likes · 8 min read

AI Large Model Application Practice

Oct 18, 2023 · Artificial Intelligence

How to Extract and Embed Tables and Images from PDFs for Multimodal RAG

This article explains a practical approach to parsing PDFs that contain text, tables, and images using the open‑source Unstructured library and the LLaVA model, then embedding each modality into a vector store with multi‑vector retrieval to enable accurate semantic search in private‑knowledge RAG pipelines, with optional LangChain integration.

LLM · LangChain · RAG

0 likes · 12 min read

Liangxu Linux

Mar 29, 2022 · Fundamentals

Why the Classic Windows PDF Patcher Is Now Open‑Source: Features, Tech Stack, and License

The long‑standing Chinese Windows utility PDF Patcher, known for its extensive PDF editing capabilities, has been open‑sourced on GitHub, revealing its C# and C codebase, .NET Framework foundation, iText and MuPDF dependencies, and a unique “Conscience License” that adds moral obligations for users and developers.

.NET · C#

0 likes · 5 min read
