How BabelDOC Preserves PDF Layout While Translating & OneAIFW Shields Your Data
Two open‑source projects—BabelDOC, a Python‑based PDF translator that retains original formatting using AI models, and OneAIFW, a Zig‑and‑Rust local AI firewall that anonymizes sensitive data before LLM queries—offer practical, privacy‑preserving solutions for researchers and developers.
BabelDOC: PDF translation with layout preservation
BabelDOC is a Python‑based open‑source utility that parses the structural elements of a PDF (titles, body text, figures, captions, formulas and tables), translates the extracted text with large language models, and reinserts the translated strings into their original positions. The process preserves the original pagination, column layout and visual formatting, making it suitable for academic papers and technical reports.
Key technical features
Concurrent pipeline: structural parsing and LLM‑based translation run in parallel rather than sequentially.
Bilingual side‑by‑side view aligns the source language on the left with the target language on the right.
Supports OpenAI‑compatible APIs (e.g., GPT‑4o, DeepSeek, Qwen) for high‑quality, domain‑aware translation.
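The parse → translate → reinsert flow described above can be sketched in plain Python. This is a minimal illustration only: the block types, the translator callback, and the bounding-box representation are assumptions for the sketch, not BabelDOC's actual internal API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Block:
    """A positioned text element extracted from a PDF page."""
    kind: str    # e.g. "title", "body", "caption"
    bbox: tuple  # (x0, y0, x1, y1): original position on the page
    text: str

def translate_blocks(blocks: list[Block],
                     translate: Callable[[str], str]) -> list[Block]:
    """Translate each block's text while leaving kind and bbox untouched,
    so translated strings can be reinserted at their original positions."""
    return [Block(b.kind, b.bbox, translate(b.text)) for b in blocks]

# Toy stand-in for an LLM translator, for demonstration only.
fake_llm = lambda s: {"Introduction": "引言"}.get(s, s)

page = [Block("title", (72, 720, 540, 750), "Introduction")]
translated = translate_blocks(page, fake_llm)
print(translated[0].text, translated[0].bbox)  # 引言 (72, 720, 540, 750)
```

Keeping the geometry separate from the text is what lets the translated output preserve pagination and column layout.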
Installation
Clone the repository and inspect the CLI help:
git clone https://github.com/funstory-ai/BabelDOC
cd BabelDOC
uv run babeldoc --help

Install the package with either uv or pip:

uv tool install babeldoc
pip install babeldoc

Translation command example (DeepSeek model):
babeldoc \
--files paper.pdf \
--openai \
--openai-model "deepseek-chat" \
--openai-base-url "https://api.deepseek.com" \
--openai-api-key "sk-YOUR_KEY" \
--lang-out zh-CN

Repository: https://github.com/funstory-ai/BabelDOC (latest release v0.5.22, AGPL‑3.0 license, >480 k PyPI downloads)
OneAIFW: Local AI firewall for zero data leakage
OneAIFW is a lightweight open‑source AI firewall written in Zig and Rust. It intercepts outgoing LLM requests, replaces detected personally identifiable information (PII) with unique placeholders, forwards the anonymized prompt, and restores the original values in the model’s response. All processing occurs locally, preventing raw sensitive data from leaving the host.
Core principle
Before a prompt is sent, the engine scans for entities such as email addresses, phone numbers, bank card numbers and cryptographic keys. Each match is substituted with a token like __PII_EMAIL_ADDRESS_00000001__. After the LLM returns a reply, the firewall post‑processes the text, replacing each token with the original value.
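The substitution round trip can be sketched as follows. This is an illustrative Python version under stated assumptions: OneAIFW's real engine is Zig/Rust, covers many more entity types, and does not use this API; only the placeholder format is taken from the description above.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask(prompt: str):
    """Replace each email with a numbered placeholder; return the masked
    text plus the placeholder -> original mapping needed for restoration."""
    mapping = {}
    def sub(m):
        token = f"__PII_EMAIL_ADDRESS_{len(mapping) + 1:08d}__"
        mapping[token] = m.group(0)
        return token
    return EMAIL_RE.sub(sub, prompt), mapping

def restore(reply: str, mapping: dict) -> str:
    """Put the original values back into the model's reply."""
    for token, original in mapping.items():
        reply = reply.replace(token, original)
    return reply

masked, table = mask("Contact alice@example.com about the invoice.")
print(masked)  # Contact __PII_EMAIL_ADDRESS_00000001__ about the invoice.

reply = f"I emailed {list(table)[0]} as requested."
print(restore(reply, table))  # I emailed alice@example.com as requested.
```

Because the mapping never leaves the host, the remote model only ever sees opaque tokens.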
Sensitive data detection
The detector recognizes multiple PII categories, assigning confidence scores of up to 90%. In a test string containing an email address, a phone number and a bank card number, all three entities were correctly identified and mapped to placeholders.
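A toy detector for those three entity types might look like this. It is regex-based and illustrative only; the patterns below are simplifications I am assuming for the sketch, and real-world detection (as in OneAIFW) needs far more robust patterns, validation such as Luhn checks for card numbers, and per-match confidence scoring.

```python
import re

# Deliberately simple patterns; do not use these for production PII detection.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER":  re.compile(r"\+\d[\d -]{7,12}\d"),
    "BANK_CARD":     re.compile(r"\b(?:\d[ -]?){13,18}\d\b"),
}

def detect(text: str):
    """Return (category, matched_text) pairs for every recognized entity."""
    hits = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((category, m.group(0)))
    return hits

sample = "Mail bob@test.org, call +1 555 010 2345, card 4111 1111 1111 1111"
for category, value in detect(sample):
    print(category, "->", value)
# EMAIL_ADDRESS -> bob@test.org
# PHONE_NUMBER -> +1 555 010 2345
# BANK_CARD -> 4111 1111 1111 1111
```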
Architecture
Core engine built with Zig + Rust, supporting both native execution and WebAssembly.
Language bindings: JavaScript (libs/aifw-js) and Python (libs/aifw-py).
Demo applications include a web UI, a browser extension, and backend services based on Presidio/LiteLLM.
Quick start guide
Clone the repository:
git clone https://github.com/funstory-ai/aifw.git && cd aifw

Build the core library:
zig build

Install JavaScript dependencies for the demo:
pnpm -w install

Build the JavaScript package:
pnpm -w --filter @oneaifw/aifw-js build

Run the web demonstration (open the printed local URL in a browser):
cd apps/webapp && pnpm dev

Start the backend service or CLI as described in py-origin/README.md:
python -m aifw launch

Repository: https://github.com/funstory-ai/aifw (MIT license)
Old Meng AI Explorer
Tracking global AI developments 24/7, focusing on large model iterations, commercial applications, and tech ethics. We break down hardcore technology into plain language, providing fresh news, in-depth analysis, and practical insights for professionals and enthusiasts.
