Automate Bilingual eBook Translation with Python and DeepL API
This tutorial shows how to extract Kindle eBooks, convert them to HTML, clean the markup with BeautifulSoup, and batch‑translate the content line‑by‑line using DeepL API Pro, producing a bilingual eBook with minimal manual effort.
Introduction
The author, a Python enthusiast, shares a small project found on GitHub that automates the translation of eBook text between Chinese and English using the DeepL API. The workflow covers extracting the eBook, converting formats, cleaning the HTML, and submitting each line for translation.
eBook Extraction and Format Conversion
First, the Kindle eBook is exported and its DRM removed with ePubor Ultimate, producing an .azw file that is then converted to .epub. Calibre then converts the .epub into an .htmlz archive, which is unpacked with the unzip command.
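The conversion and unpacking steps can also be scripted. Below is a minimal sketch, assuming Calibre's ebook-convert command is installed and on the PATH; the helper function names are mine, not from the original project:

```python
import pathlib
import zipfile


def htmlz_convert_command(epub_path):
    """Build the Calibre CLI call that turns an .epub into an .htmlz archive.

    ebook-convert infers the output format from the target file's extension.
    """
    htmlz_path = pathlib.Path(epub_path).with_suffix(".htmlz")
    return ["ebook-convert", str(epub_path), str(htmlz_path)]


def unpack_htmlz(htmlz_path, out_dir):
    """An .htmlz file is an ordinary zip archive (index.html plus CSS/images)."""
    with zipfile.ZipFile(htmlz_path) as z:
        z.extractall(out_dir)


# Example usage (requires Calibre on the PATH):
#   import subprocess
#   subprocess.run(htmlz_convert_command("John Law.epub"), check=True)
#   unpack_htmlz("John Law.htmlz", "John Law/")
```

Unpacking with Python's zipfile module is equivalent to running unzip by hand, just easier to repeat.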
Why Use HTML for Translation
Preserves footnotes, endnotes, and hyperlinks.
DeepL’s tag_handling="xml" parameter correctly processes HTML tags.
CSS can control display styles flexibly.
JavaScript can be used to show language‑specific content.
The cleaned HTML can be converted to any eBook format later.
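To illustrate the CSS and JavaScript points above, here is a minimal sketch. The en and cn class names match the ones the translation script adds; the markup around them is illustrative only:

```html
<style>
  /* Hide one language by toggling a class on <body> */
  .hide-en .en { display: none; }
  .hide-cn .cn { display: none; }
</style>
<button onclick="document.body.classList.toggle('hide-en')">Toggle English</button>
<button onclick="document.body.classList.toggle('hide-cn')">Toggle Chinese</button>
```

With both classes visible the book reads as a bilingual text; hiding one class turns it back into a monolingual edition without touching the HTML.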
Cleaning HTML with BeautifulSoup
BeautifulSoup, a library best known for web scraping, is used here to tidy the HTML. The script removes stray newlines, inserts a line break before headings and before <div> and <p> tags, and writes the cleaned file.
import bs4
import re

path = "John Law/"  # folder name ends with /
source_filename = "index.html"
target_filename = "index2.html"

with open(path + source_filename) as html:
    htmltext = html.read()

# Parse and re-serialize the HTML, then remove all newline characters
htmltext = str(bs4.BeautifulSoup(htmltext, "html.parser")).replace("\n", "")

# Add a line break before headings, <div>, and <p> tags
htmltext = re.sub(r'<h', '\n<h', htmltext)
htmltext = re.sub(r'<div', '\n<div', htmltext)
htmltext = re.sub(r'</div>', '\n</div>', htmltext)
htmltext = re.sub(r'<p', '\n<p', htmltext)

with open(path + target_filename, "w") as fileSave:
    fileSave.write(htmltext)
print(htmltext)
Translating Line-by-Line with DeepL API Pro
The cleaned HTML is read line by line. Each line is sent to DeepL via a GET request with tag_handling="xml". The script retries on connection errors, skips lines that do not need translation, and adds language-specific CSS classes (en and cn) to the original and translated lines.
import re
import requests

auth_key = "<your DeepL API Pro authentication key>"
target_language = "ZH"
path = "John Law/"
source_filename = "index2.html"
target_filename = "index3.html"

def translate(text):
    result = requests.get(
        "https://api.deepl.com/v2/translate",
        params={
            "auth_key": auth_key,
            "target_lang": target_language,
            "text": text,
            "tag_handling": "xml",
        },
    )
    return result.json()["translations"][0]["text"]

def add_language_tag_en(html):
    # Append the "en" class to the line's opening tag
    pttn = re.compile(r'^<(.*?) class="(.*?)">', re.M)
    rpl = r'<\1 class="\2 en">'
    return re.sub(pttn, rpl, html)

def add_language_tag_cn(html):
    # Append the "cn" class to the line's opening tag
    pttn = re.compile(r'^<(.*?) class="(.*?)">', re.M)
    rpl = r'<\1 class="\2 cn">'
    return re.sub(pttn, rpl, html)

with open(path + source_filename, "r") as f:
    lines = f.readlines()

new_lines = []
line_count = 0
startline = 16    # first line of the body text
endline = 4032    # last line of the body text
for line in lines:
    line_count += 1
    # Copy blank lines and lines outside the body range through untranslated
    if line_count < startline or line_count > endline or line.strip() == '':
        new_lines.append(line)
        continue
    succeeded = False
    while not succeeded:
        try:
            line_translated = translate(line).replace("\n", "")
            succeeded = True
        except requests.exceptions.RequestException:
            succeeded = False  # retry on connection errors
    if line.strip() == line_translated.strip():
        # Nothing was translated (e.g. pure markup): keep the original only
        new_lines.append(line)
    else:
        new_lines.append(add_language_tag_en(line))
        new_lines.append(add_language_tag_cn(line_translated))

with open(path + target_filename, 'w') as f:
    f.write("\n".join(new_lines))
Result
After running the scripts, the original HTML file is transformed into a bilingual version where each Chinese line is followed by its English translation, ready to be repackaged into an eBook.
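Repackaging can reuse the same Calibre CLI in the other direction. A sketch, assuming ebook-convert is on the PATH; the helper name is mine:

```python
import pathlib


def epub_repack_command(html_path, epub_path=None):
    """Build the Calibre CLI call that packages the bilingual HTML into an .epub."""
    if epub_path is None:
        epub_path = pathlib.Path(html_path).with_suffix(".epub")
    return ["ebook-convert", str(html_path), str(epub_path)]


# Example usage (requires Calibre on the PATH):
#   import subprocess
#   subprocess.run(epub_repack_command("John Law/index3.html"), check=True)
```

ebook-convert accepts any output format Calibre supports, so the same call can produce .mobi or .azw3 by changing the target extension.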
Another screenshot shows the final translated output.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!