Translate Full PDFs While Preserving Layout Using LLMs – Core Code Included
This article presents a two-stage, cache-enabled pipeline that extracts text blocks from a PDF with PyMuPDF, translates them via a large-language-model API, and re-renders each page as an image with the Chinese text overlaid to keep the original layout. The core Python code and usage instructions are included.
The author compares the PDF-translation approaches taken by three recent models. Gemini-3-Pro extracts the text, translates it, rebuilds the document as Markdown, and renders it to PDF via HTML; Claude-Opus-4.5 converts PDF → DOCX, translates the DOCX, and converts it back to PDF. Both are simple but lose the original styling. The third, the GPT-5.2-Codex solution presented here, aims to preserve the original layout by following a "two-stage + cache" strategy.
Two‑Stage + Cache Strategy
Text extraction and translation stage
Use PyMuPDF to open the PDF and extract each text block’s bounding box, font size, and content.
Send the block texts in batches (8 blocks per request by default) to an OpenAI-compatible chat-completions API (e.g., MiniMax-M2) with the system prompt “Translate English to Simplified Chinese. Keep line breaks. Return JSON.”
Store translations in a JSON cache keyed by block ID to allow interruption and resume.
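The core code below keeps its translations only in memory. The resumable cache described above could look roughly like this minimal sketch, which assumes a JSON file mapping block IDs (stored as strings, since JSON object keys are strings) to translated text, and takes a hypothetical `translate_batch` callable as a stand-in for the API call. The file is rewritten atomically after every batch, so an interrupted run loses at most one batch.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def load_cache(path: str) -> Dict[int, str]:
    """Load {block_id: translation}; a missing file means nothing is translated yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {int(k): v for k, v in raw.items()}  # JSON keys come back as strings


def save_cache(path: str, cache: Dict[int, str]) -> None:
    """Write atomically: dump to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({str(k): v for k, v in sorted(cache.items())}, f, ensure_ascii=False, indent=2)
    os.replace(tmp, path)


def translate_with_cache(
    texts: List[str],
    cache_path: str,
    translate_batch: Callable[[List[dict]], List[dict]],  # hypothetical stand-in for call_translate
    batch_size: int = 8,
) -> Dict[int, str]:
    cache = load_cache(cache_path)
    # Only blocks without a cached translation are sent to the model
    pending = [i for i in range(len(texts)) if i not in cache]
    for start in range(0, len(pending), batch_size):
        ids = pending[start:start + batch_size]
        items = [{"id": i, "text": texts[i]} for i in ids]
        for it in translate_batch(items):
            if int(it["id"]) in ids:  # ignore IDs the model did not receive
                cache[int(it["id"])] = it["text"]
        save_cache(cache_path, cache)  # persist after every batch
    return cache
```

A second run with the same cache file sends nothing to the model for blocks that were already translated.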
Re‑layout and export stage
Render each original page to a bitmap (avoiding font‑embedding issues that cause question‑mark glyphs).
Cover the original English block area with a white rectangle.
Scale the Chinese text to fit the original block dimensions and draw it with a suitable CJK font.
Save all processed pages as a new PDF.
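The core code below draws each translation at the block's original font size, without wrapping. One way to implement the "scale to fit" step is shown in this sketch: convert the bounding box from PDF points to pixels (`dpi / 72`, as in `build_pdf`), then binary-search the largest font size whose character-wrapped text fits the box. To stay self-contained it takes a hypothetical `measure(text, size)` width function, and the 1.2 line-height factor is an assumption. With Pillow, `measure` could be backed by `ImageFont.truetype(fontfile, size).getlength(text)`, ideally with the font objects cached.

```python
from typing import Callable, List, Tuple

Measure = Callable[[str, int], float]  # (text, font size in px) -> rendered width in px


def bbox_to_pixels(bbox: Tuple[float, float, float, float], dpi: int) -> Tuple[float, ...]:
    """PDF user space has 72 units per inch, so pixels = points * dpi / 72."""
    s = dpi / 72.0
    return tuple(v * s for v in bbox)


def wrap_chars(text: str, size: int, max_w: float, measure: Measure) -> List[str]:
    """Greedy per-character wrap (Chinese has no spaces); existing line breaks are kept."""
    lines: List[str] = []
    for para in text.split("\n"):
        cur = ""
        for ch in para:
            if cur and measure(cur + ch, size) > max_w:
                lines.append(cur)
                cur = ch
            else:
                cur += ch
        lines.append(cur)
    return lines


def fit_font_size(text: str, box_w: float, box_h: float, measure: Measure,
                  min_px: int = 6, max_px: int = 200, line_gap: float = 1.2) -> int:
    """Largest size (px) whose wrapped text fits box_w x box_h; falls back to min_px."""
    best, lo, hi = min_px, min_px, max_px
    while lo <= hi:
        mid = (lo + hi) // 2
        lines = wrap_chars(text, mid, box_w, measure)
        # A single glyph wider than the box also counts as "does not fit"
        too_wide = any(measure(line, mid) > box_w for line in lines)
        if not too_wide and len(lines) * mid * line_gap <= box_h:
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best
```

Binary search works here because a larger font never produces fewer wrapped lines or shorter lines, so "fits" only flips once from true to false as the size grows.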
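The only remaining imperfection is a persistent white background behind the Chinese text, a side effect of the white masking rectangles that is visible wherever the page background is not white. The author could not eliminate it after multiple attempts.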
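The white background comes from the `draw.rectangle(..., fill=(255, 255, 255))` mask in `build_pdf`. One possible mitigation, not taken from the article and untested against real pages, is to fill each rectangle with a local background color estimated as the per-channel median of the pixels on a slightly padded border around the block. This sketch works on a plain row-major list of RGB tuples so it runs without Pillow; it will not help where text sits on gradients or images.

```python
from statistics import median_low
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def border_pixels(pixels: List[List[RGB]], x0: int, y0: int, x1: int, y1: int) -> List[RGB]:
    """Pixels on the rectangle's outline (inclusive bounds, clamped to the image)."""
    h, w = len(pixels), len(pixels[0])
    x0, x1 = max(0, x0), min(w - 1, x1)
    y0, y1 = max(0, y0), min(h - 1, y1)
    ring = [pixels[y0][x] for x in range(x0, x1 + 1)] + [pixels[y1][x] for x in range(x0, x1 + 1)]
    ring += [pixels[y][x0] for y in range(y0 + 1, y1)] + [pixels[y][x1] for y in range(y0 + 1, y1)]
    return ring


def estimate_background(pixels: List[List[RGB]], bbox: Sequence[float], pad: int = 2) -> RGB:
    """Per-channel median of the padded border; stray text pixels on the border are outvoted."""
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    ring = border_pixels(pixels, x0 - pad, y0 - pad, x1 + pad, y1 + pad)
    return tuple(median_low(px[c] for px in ring) for c in range(3))
```

In `build_pdf`, the returned tuple could replace `(255, 255, 255)` as the rectangle's `fill`; with Pillow the border pixels would come from `img.getpixel` rather than a nested list.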
Core Python Implementation
```python
import json
import re

import fitz  # PyMuPDF
import requests
from PIL import Image, ImageDraw, ImageFont


def extract_blocks(input_pdf):
    """Return one dict per text block: page index, bbox (PDF points), text, median font size."""
    doc = fitz.open(str(input_pdf))
    blocks = []
    for p in range(len(doc)):
        d = doc[p].get_text("dict")
        for b in d.get("blocks", []):
            if b.get("type") != 0:  # 0 = text block; skip image blocks
                continue
            lines, sizes = [], []
            for line in b.get("lines", []):
                spans = line.get("spans", [])
                t = "".join(s.get("text", "") for s in spans)
                if t.strip():
                    lines.append(t)
                for s in spans:
                    if "size" in s:
                        sizes.append(s["size"])
            text = "\n".join(lines).strip()
            if not text:
                continue
            blocks.append({
                "page": p,
                "bbox": b.get("bbox"),
                "text": text,
                # Median span size, falling back to 10 pt
                "font_size": sorted(sizes)[len(sizes) // 2] if sizes else 10,
            })
    doc.close()
    return blocks


def call_translate(api_base, model, api_key, items):
    """Send one batch of {"id", "text"} items to an OpenAI-compatible chat-completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "Translate English to Simplified Chinese. Keep line breaks. Return JSON."},
            {"role": "user", "content": json.dumps({"items": items}, ensure_ascii=False)},
        ],
        "temperature": 0.2,
        "response_format": {"type": "json_object"},
    }
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    r = requests.post(f"{api_base.rstrip('/')}/chat/completions", json=payload, headers=headers, timeout=120)
    r.raise_for_status()
    content = r.json()["choices"][0]["message"]["content"]
    # Keep only the outermost {...}, in case the model wraps the JSON in prose or Markdown fences
    m = re.search(r"\{.*\}", content, re.S)
    if m:
        content = m.group(0)
    return json.loads(content)["items"]


def translate_blocks(blocks, api_base, model, api_key, batch_size=8):
    """Translate all blocks in batches; returns {block_index: translated_text}."""
    # In-memory only in this excerpt; per the usage notes, the full script
    # persists translations to <stem>.translations.json so runs can resume.
    cache = {}
    pending = [(i, b["text"]) for i, b in enumerate(blocks)]
    for i in range(0, len(pending), batch_size):
        batch = pending[i:i + batch_size]
        items = [{"id": idx, "text": txt} for idx, txt in batch]
        translated = call_translate(api_base, model, api_key, items)
        for it in translated:
            cache[int(it["id"])] = it["text"]
    return cache


def build_pdf(input_pdf, output_pdf, blocks, translations, fontfile, dpi=200, min_font_size=3):
    """Rasterize each page, mask the English blocks, draw the Chinese text, save as an image PDF."""
    doc = fitz.open(str(input_pdf))
    scale = dpi / 72.0  # PDF points -> pixels
    font_cache = {}

    def get_font(size):
        # fontfile must be a CJK-capable TrueType/OpenType font (.ttf/.otf/.ttc)
        key = int(size * scale)
        if key not in font_cache:
            font_cache[key] = ImageFont.truetype(fontfile, key)
        return font_cache[key]

    pages = []
    for p in range(len(doc)):
        pix = doc[p].get_pixmap(dpi=dpi, alpha=False)
        img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
        draw = ImageDraw.Draw(img)
        for idx, b in enumerate(blocks):
            if b["page"] != p:
                continue
            rect = fitz.Rect(b["bbox"])
            x0, y0, x1, y1 = rect.x0 * scale, rect.y0 * scale, rect.x1 * scale, rect.y1 * scale
            # Mask the original English text with a white rectangle
            draw.rectangle([x0, y0, x1, y1], fill=(255, 255, 255))
            # Fall back to the English text if a block has no translation
            text = translations.get(idx, b["text"]).replace("\t", " ")
            size = max(float(b.get("font_size") or 10), min_font_size)
            # Drawn at the block's original size; this excerpt does not wrap or shrink to fit
            draw.text((x0, y0), text, font=get_font(size), fill=(0, 0, 0))
        pages.append(img)
    doc.close()
    if not pages:
        raise ValueError("input PDF has no pages")
    first, rest = pages[0], pages[1:]
    first.save(str(output_pdf), "PDF", save_all=True, append_images=rest, resolution=dpi)
```
Usage Instructions
Install dependencies:
```bash
python3 -m pip install pymupdf pillow requests
```
Set the API key (example):
```bash
export SILICONFLOW_API_KEY="YOUR_API_KEY"
```
Optional configuration:
```bash
export SILICONFLOW_API_BASE="https://api.siliconflow.cn/v1"
export SILICONFLOW_MODEL="MiniMaxAI/MiniMax-M2"
```
Run the script with the minimal command:
```bash
python3 -B /path/to/translate_pdf.py \
  --input-pdf /absolute/path/to/source.pdf
```
Or specify output and working directories:
```bash
python3 -B /path/to/translate_pdf.py \
  --input-pdf /absolute/path/to/source.pdf \
  --output-pdf /absolute/path/to/source-zh.pdf \
  --work-dir /absolute/path/to/tmp/pdfs
```
Specify a CJK font to improve rendering (recommended):
```bash
python3 -B /path/to/translate_pdf.py \
  --input-pdf /absolute/path/to/source.pdf \
  --font /System/Library/Fonts/Supplemental/Songti.ttc
```
Common Issues
Question marks or garbled characters – caused by missing Chinese font support; fix by providing a suitable CJK font via --font.
Very small or cramped Chinese text – the translation may not fit in the original block at the original font size; increase --dpi for sharper rendering of small glyphs, or adjust --min-font-size.
Slow re-runs – without a cache the script re-translates all text; keep the <stem>.translations.json cache file so a re-run can resume instead of starting over.
English text not fully hidden – the script forces a white background over detected text blocks; any remaining English is likely inside images and not recognized as text.
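The core code does not include the command-line entry point that the usage instructions invoke. The following is a minimal sketch of how those flags and environment variables might be parsed. The flag and variable names come from the usage examples; the defaults (output `<stem>-zh.pdf` next to the input, the cache kept in the work directory or else beside the input, 200 DPI and minimum size 3 from `build_pdf`'s signature, and the API base and model from the optional configuration) are assumptions about the full script, not its confirmed behavior.

```python
import argparse
import os
from pathlib import Path
from typing import Dict, Mapping, Optional, Sequence

# Assumed defaults, mirroring the "Optional configuration" example
DEFAULT_API_BASE = "https://api.siliconflow.cn/v1"
DEFAULT_MODEL = "MiniMaxAI/MiniMax-M2"


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Translate a PDF to Simplified Chinese, keeping layout.")
    p.add_argument("--input-pdf", required=True, type=Path)
    p.add_argument("--output-pdf", type=Path, help="assumed default: <stem>-zh.pdf next to the input")
    p.add_argument("--work-dir", type=Path, help="assumed: where <stem>.translations.json is kept")
    p.add_argument("--font", help="CJK font file, e.g. Songti.ttc")
    p.add_argument("--dpi", type=int, default=200)
    p.add_argument("--min-font-size", type=float, default=3)
    return p


def resolve_config(argv: Optional[Sequence[str]] = None,
                   env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Combine CLI flags with SILICONFLOW_* environment variables into one settings dict."""
    args = build_parser().parse_args(argv)
    src: Path = args.input_pdf
    work = args.work_dir or src.parent
    api_key = env.get("SILICONFLOW_API_KEY")
    if not api_key:
        raise SystemExit("SILICONFLOW_API_KEY is not set")
    return {
        "input_pdf": src,
        "output_pdf": args.output_pdf or src.with_name(f"{src.stem}-zh.pdf"),
        "cache_path": work / f"{src.stem}.translations.json",
        "font": args.font,
        "dpi": args.dpi,
        "min_font_size": args.min_font_size,
        "api_base": env.get("SILICONFLOW_API_BASE", DEFAULT_API_BASE),
        "model": env.get("SILICONFLOW_MODEL", DEFAULT_MODEL),
        "api_key": api_key,
    }
```

Passing `env` explicitly keeps the function testable; a real entry point would call `resolve_config()` with no arguments and hand the values to `extract_blocks`, `translate_blocks`, and `build_pdf`.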
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
