Convert Any PDF to Clean Markdown with a Local LLM (Gemma 3)

Learn how to transform any PDF—including scanned documents—into well‑structured Markdown using a local LLM (Gemma 3 via Ollama), Python, PyMuPDF and Pillow, without cloud APIs or API keys, by converting pages to images, prompting the model, and saving the output.

Code Mala Tang

Jul 22, 2025

Ever struggled with a PDF that is full of embedded images, random formatting, or scanned pages that produce garbled text when copied? This guide shows how to use a local LLM (Gemma 3 via Ollama) to convert any PDF into clean, structured Markdown without cloud APIs, API keys, or privacy concerns.

What We Are Doing

Convert each PDF page to an image.

Send the images to a local LLM (gemma3:12b or gemma3:4b) via Ollama.

Ask the model to extract readable content and format it as Markdown.

Save the result to a .md file you can use directly.

This works for scanned PDFs as well because the input is image‑based, giving you OCR, layout detection, and formatting in one step.

Tools We Use

PyMuPDF (fitz) : renders PDF pages to images.

Pillow : converts raw image data to PNG bytes.

ollama : interacts with the local model (no OpenAI key needed).

gemma3:12b (or gemma3:4b) : a privacy‑respecting multimodal model that can process images.

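Install uv first, using whichever one of the following commands matches your system.

macOS and Linux (curl):

curl -LsSf https://astral.sh/uv/install.sh | sh

macOS and Linux (wget):

wget -qO- https://astral.sh/uv/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Then create a project and add the dependencies to it:

uv init pdftomd
cd pdftomd

uv add pymupdf pillow ollama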

Make sure Ollama is installed and running, then pull whichever Gemma model you plan to use:

ollama pull gemma3:12b

ollama pull gemma3:4b
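Before running the script, it can help to confirm from Python that one of the Gemma models is actually available. The sketch below is a convenience, not part of the original workflow: `pick_model` and `installed_models` are hypothetical helper names, and the shape of the `ollama.list()` response (a `models` list whose entries expose a `model` field) is an assumption that has varied between versions of the `ollama` package, so adjust the field access if your version differs.

```python
def pick_model(installed, preferred=("gemma3:12b", "gemma3:4b")):
    """Return the first preferred model name that is installed, else None."""
    installed = set(installed)
    for name in preferred:
        if name in installed:
            return name
    return None


def installed_models():
    """List locally available model names (assumed response shape, see above)."""
    import ollama  # imported lazily so pick_model works without the package

    response = ollama.list()
    return [entry["model"] for entry in response["models"]]


if __name__ == "__main__":
    model = pick_model(installed_models())
    if model is None:
        print("No Gemma model found; pull gemma3:12b or gemma3:4b first.")
    else:
        print(f"Using {model}")
```

The preference order puts the larger model first, so the smaller one is only chosen when the larger one is missing.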

Code

Run the code inside the uv environment. For an interactive session, use: uv run python. If you have a script, execute: uv run myscript.py. Below is the core Python code.

import fitz
import ollama
import io
from PIL import Image

def convert_pdf_to_images(pdf_path):
    images = []
    doc = fitz.open(pdf_path)
    for page_num in range(len(doc)):
        pix = doc[page_num].get_pixmap()
        img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
        img_buffer = io.BytesIO()
        img.save(img_buffer, format="PNG")
        images.append(img_buffer.getvalue())
    return images

prompt = "Extract all readable text from these images and format it as structured Markdown."

def query_llm_with_images(image_bytes_list, model="gemma3:12b", prompt=prompt):
    response = ollama.chat(
        model=model,
        messages=[
            {
                "role": "user",
                "content": prompt,
                "images": image_bytes_list
            }
        ]
    )
    return response["message"]["content"]

pdf_path = "mypdf.pdf"
images = convert_pdf_to_images(pdf_path)
if images:
    print(f"Converted {len(images)} pages to images.")
    extracted_text = query_llm_with_images(images)
    with open("output.md", "w", encoding="utf-8") as md_file:
        md_file.write(extracted_text)
    print("\nMarkdown Conversion Complete! Check `output.md`.")
else:
    print("No images found in the PDF.")

Why use raw bytes? Ollama accepts them directly, avoiding disk I/O for faster, cleaner processing.
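As an aside, PyMuPDF can also encode a rendered page to PNG on its own, which produces the same kind of raw bytes without the Pillow round trip. The following is a minimal sketch of that alternative, not the method used in this article. `render_page_png` and `pdf_to_png_pages` are hypothetical names, and the 150 DPI default is an assumed starting value (the default render is 72 DPI, which may be too coarse for small print).

```python
def render_page_png(page, dpi=150):
    """Render one PyMuPDF page straight to PNG bytes, with no Pillow step."""
    pix = page.get_pixmap(dpi=dpi)  # dpi keyword needs a reasonably recent PyMuPDF
    return pix.tobytes("png")


def pdf_to_png_pages(pdf_path, dpi=150):
    """Yield PNG bytes for each page of the PDF at pdf_path."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(pdf_path) as doc:
        for page in doc:
            yield render_page_png(page, dpi=dpi)
```

Higher DPI values produce larger images, so there is a trade-off between legibility and how much work the model has to do per page.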

Benefits You Get

Markdown output – ready for LLM pipelines, knowledge bases, or human reading.

Supports scanned PDFs – thanks to the image‑based approach.

Privacy by default – all inference runs locally.

Elegant simplicity – no bulky OCR tools or fragile PDF parsers.

Other Use Cases

Convert old scanned textbooks to Markdown for fine‑tuning models.

Build an offline document Q&A system with local embeddings.

Feed converted documents into a retrieval‑augmented chatbot.

Summarize meeting notes, scientific papers, or financial reports.

Create a Markdown knowledge base from PDFs, Word files, etc.

Final Thoughts

Complexity isn’t always power; often the key is removing friction. This workflow is local and simple: it can turn most PDFs into Markdown with minimal effort and without relying on the cloud. How fast it runs depends on the page count, the model size, and your hardware.

For more advanced scenarios you can tweak prompts, request JSON output, or even ask the model to translate the extracted text.
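A rough sketch of what such prompt tweaks might look like follows. The helper names `build_prompt` and `query_page_as_json`, the wording of the extra instructions, and the JSON shape with `title` and `markdown` keys are all illustrative assumptions; the only library feature relied on is the `format="json"` argument of `ollama.chat`.

```python
import json

BASE_PROMPT = (
    "Extract all readable text from this image and format it as structured Markdown."
)


def build_prompt(target_language=None, as_json=False):
    """Compose a prompt variant from the base instruction (hypothetical helper)."""
    parts = [BASE_PROMPT]
    if target_language:
        parts.append(f"Translate the extracted text into {target_language}.")
    if as_json:
        # The key names here are an assumed, illustrative schema.
        parts.append(
            'Respond with a JSON object of the form {"title": string, "markdown": string}.'
        )
    return " ".join(parts)


def query_page_as_json(image_bytes, model="gemma3:12b"):
    """Ask for JSON output for one page image and parse the reply."""
    import ollama  # imported lazily so build_prompt works without the package

    response = ollama.chat(
        model=model,
        messages=[
            {
                "role": "user",
                "content": build_prompt(as_json=True),
                "images": [image_bytes],
            }
        ],
        format="json",  # asks Ollama to constrain the reply to valid JSON
    )
    return json.loads(response["message"]["content"])
```

For translation, something like `build_prompt(target_language="French")` could be passed as the prompt to the existing query function.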

When processing many pages at once you may exceed the model’s context window, causing truncation or failure. Processing page‑by‑page avoids this problem.

Here is an improved, memory‑efficient version that yields one page at a time:

import fitz
import ollama
import io
from PIL import Image

def convert_pdf_to_images(pdf_path):
    """Yield PNG bytes for each page to avoid large lists."""
    doc = fitz.open(pdf_path)
    for page_num in range(len(doc)):
        pix = doc[page_num].get_pixmap()
        img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        yield page_num + 1, buf.getvalue()

def query_llm_with_image(image_bytes, model="gemma3:12b", prompt=None):
    if prompt is None:
        prompt = "Extract all readable text from this image and format it as structured Markdown."
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "user", "content": prompt, "images": [image_bytes]}
        ]
    )
    return response["message"]["content"]

def extract_pdf_to_markdown(pdf_path, output_file="output.md", prompt=None):
    with open(output_file, "w", encoding="utf-8") as md_file:
        for page_number, image_bytes in convert_pdf_to_images(pdf_path):
            print(f"Processing page {page_number}...")
            try:
                markdown = query_llm_with_image(image_bytes, prompt=prompt)
                md_file.write(f"\n\n## Page {page_number}\n\n")
                md_file.write(markdown)
            except Exception as e:
                print(f"Error processing page {page_number}: {e}")
                md_file.write(f"\n\n## Page {page_number} (Error)\n\n_Error extracting content from this page._")
    print(f"\nDone! Markdown saved to {output_file}")

if __name__ == "__main__":
    pdf_path = "mypdf.pdf"
    out_path = "output.md"
    prompt = ("Extract all readable text and text chunks from this image"
              " and format it as structured Markdown."
              " Look in the entire image always and try to retrieve all text!")
    extract_pdf_to_markdown(pdf_path, output_file=out_path, prompt=prompt)
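One practical wrinkle, stated here as an assumption about model behavior rather than something this article reports: chat models sometimes wrap their whole answer in a fenced code block, which would then appear literally in `output.md`. A small hypothetical helper such as `strip_outer_fence` could be applied to each page's result before it is written.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly so the example embeds cleanly

# Matches text that consists of exactly one outer fenced block, optionally
# tagged with a language name such as "markdown" after the opening fence.
_OUTER_FENCE = re.compile(
    r"^\s*" + re.escape(FENCE) + r"[A-Za-z]*[ \t]*\n(.*?)\n?" + re.escape(FENCE) + r"\s*$",
    re.DOTALL,
)


def strip_outer_fence(text):
    """Remove a single fence wrapping the whole reply; otherwise just trim it."""
    match = _OUTER_FENCE.match(text)
    if match:
        return match.group(1).strip()
    return text.strip()
```

In the page loop this would amount to writing `strip_outer_fence(markdown)` instead of `markdown`. Fenced blocks that appear only inside the reply are left untouched.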

The choice of gemma3:12b or gemma3:4b is illustrative: both support image input via Ollama and are small enough to run on many laptops and desktop PCs, which makes them a good starting point for experiments.

In practice, pick the largest locally‑supported model for the best results.
