
PyMuPDF (Python bindings for MuPDF) – Introduction, Features, Installation and Usage Guide

This article provides a comprehensive overview of PyMuPDF, the Python binding for the lightweight MuPDF library, covering its purpose, supported document formats, key features such as rendering, text extraction and PDF manipulation, installation methods, and detailed code examples for common operations.

Sohu Tech Products

1. Introduction to PyMuPDF

PyMuPDF is the Python binding for MuPDF, a lightweight viewer and toolkit library for PDF, XPS and e‑book documents. MuPDF offers high‑quality anti‑aliased rendering and precise text layout, and supports formats such as PDF, XPS, OpenXPS, CBZ, EPUB and FictionBook 2. The Python binding (version 1.18.17 at the time of writing) exposes these capabilities to Python.

2. Core Features

Decrypt files

Access metadata, links and bookmarks

Render pages as raster images (PNG, etc.) or vector SVG

Search text

Extract text and images

Convert documents to PDF, (X)HTML, XML, JSON, plain text and more

For PDFs: create, merge or split documents; insert, delete or rearrange pages; modify annotations and form fields

Extract or insert images and fonts

Full support for embedded files

Reformat PDFs for duplex printing, color separation, watermarks, etc.

Comprehensive password protection handling

Command‑line utility (python -m fitz …) with encryption, decryption, optimization, sub‑document creation, document concatenation, and more

3. Installation

Install PyMuPDF from PyPI with pip install PyMuPDF. Wheels are provided for Windows, Linux and macOS (64‑bit, Python 3.6 to 3.9 for the version described here; 32‑bit wheels are also available for Windows). Optional packages such as Pillow, fontTools and pymupdf‑fonts add functionality.

4. Basic Usage

Import the library:

import fitz

Check the version:

print(fitz.__doc__)

Open a document (from file or memory):

doc = fitz.open('example.pdf')  # or doc = fitz.open(stream=data, filetype='pdf')

5. Document Methods and Properties

Method/Property         Description
Document.page_count     Number of pages (int)
Document.metadata       Metadata dictionary
Document.get_toc()      Retrieve table of contents (list)
Document.load_page()    Load a specific page

6. Page Handling

Iterate over the pages of a document and access each page's links (annotations and widgets are reached in a similar way through page.annots() and page.widgets()):

for page in doc:
    # process each page
    links = page.get_links()
    for link in links:
        # handle link
        pass

Render a page to a raster image:

pix = page.get_pixmap()
pix.save('page-%i.png' % page.number)

Render a page to SVG:

svg = page.get_svg_image()

Extract text in various formats ("text", "blocks", "words", "html", "dict", "json", "rawdict", "rawjson", "xhtml", "xml"):

text = page.get_text('text')

Search for a string on a page:

areas = page.search_for('mupdf')

7. PDF Operations

Modify PDFs (create, merge, split, reorder or delete pages) with methods such as Document.delete_page(), Document.copy_page(), Document.move_page(), Document.insert_page() and Document.new_page(). Save changes with Document.save(); passing incremental=True writes only the changes back to the original file, which is fast but requires saving to the same file the document was opened from.

Combine PDFs:

doc1.insert_pdf(doc2)  # append doc2 to doc1

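Split a PDF (example: copy the first 10 and the last 10 pages into a new document; this assumes doc1 has at least 20 pages, otherwise the two ranges overlap):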

doc2 = fitz.open()
doc2.insert_pdf(doc1, to_page=9)          # first 10 pages
doc2.insert_pdf(doc1, from_page=len(doc1)-10)  # last 10 pages
doc2.save('first-and-last-10.pdf')

Close a document when finished:

doc.close()