Introduction and Usage Guide for PyMuPDF (Python Bindings for MuPDF)

This article provides a comprehensive overview of PyMuPDF, covering its relationship to MuPDF, core features, installation methods, import conventions, and detailed usage examples for opening documents, handling pages, extracting text and images, and performing PDF-specific operations such as merging, splitting, and saving.

Python Programming Learning Circle

Nov 30, 2023

PyMuPDF is the Python binding for MuPDF, a lightweight PDF, XPS, and e‑book viewer. Supported formats include PDF, XPS, OpenXPS, CBZ, EPUB, and FB2.

Key features include decryption, metadata access, rendering pages as raster (PNG) or vector (SVG), text search, extraction of text and images, conversion to other formats, and extensive PDF manipulation such as creating, merging, splitting, inserting, deleting, and re‑ordering pages.

Installation is usually done with a pre‑built wheel from PyPI (pip install PyMuPDF); building from source is also possible. The release described here works on Windows, Linux, and macOS for Python 3.6‑3.9 (64‑bit and 32‑bit where available); newer releases target newer Python versions. Optional dependencies like Pillow, fontTools, and pymupdf-fonts enable additional functionality.

Typical usage starts with importing the library using import fitz, then opening a document with doc = fitz.open(filename). The Document object provides properties such as page_count and metadata, and methods such as load_page(), get_toc(), and save().

Page handling is central: load a page via page = doc.load_page(pno) or page = doc[pno], then use methods like page.get_pixmap() for raster images, page.get_svg_image() for vector output, page.get_text(opt) for various text extraction formats, and page.search_for("text") to locate strings.

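PDF‑specific operations include Document.insert_pdf() to merge documents (or, with its from_page and to_page arguments, to copy a page range into a new document when splitting), Document.delete_page() and Document.delete_pages() to remove pages, Document.insert_page() or Document.new_page() to add pages, and Document.save(incremental=True) for fast incremental updates. An incremental save writes only the changes and must target the file the document was opened from.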

Additional utilities cover extracting links (page.get_links()), annotations (page.annots()), and form fields (page.widgets()), as well as converting any supported document to PDF via Document.convert_to_pdf().
