Batch Generating Word Documents from Excel Using Python: Two Solutions with python-docx and zipfile
This article demonstrates how to automate the replacement of placeholders in Word (.doc/.docx) files with data from an Excel sheet using Python, presenting two approaches—one based on python-docx and win32com, and another that manipulates the underlying zip archive with the zipfile module—while addressing common pitfalls and preserving document formatting.
The task is to read rows from an Excel file and replace corresponding placeholders in a Word document, saving each customized document separately. The original Word file is in .doc format, requiring conversion to .docx before manipulation.
Solution 1: Using python-docx and win32com
First, install the required packages:
pip install python-docx pypiwin32 pandasKey steps:
Convert .doc to .docx with win32com.client if needed.
Load the .docx file with Document from python-docx .
Iterate over paragraphs and runs, replacing placeholder strings (e.g., #ColumnName# ) with values from the Excel row.
Save each modified document to a designated folder.
Common issues encountered:
Incorrect package installation (using pip install docx instead of pip install python-docx ).
python-docx cannot handle .doc files directly, requiring conversion.
Text runs may split words, causing replacement failures; using run.text preserves formatting but may lose underline styles.
Sample code (excerpt):
from copy import deepcopy
from pathlib import Path
from win32com import client as wc # pip install pypiwin32
from docx import Document # pip install python-docx
import pandas as pd
def doctransform2docx(doc_path):
docx_path = doc_path + 'x'
suffix = doc_path.split('.')[-1]
assert 'doc' in suffix, '传入的不是word文档,请重新输入!'
if suffix == 'docx':
return Document(doc_path)
word = wc.Dispatch('Word.Application')
doc = word.Documents.Open(doc_path)
doc.SaveAs2(docx_path, 16) # 16 = docx format
doc.Close()
word.Quit()
return Document(docx_path)
def replace_docx(name, values, wordfile, path_name='Company'):
wordfile_copy = deepcopy(wordfile)
for col_name, value in zip(name, values):
if col_name == 'Company':
path_name = str(value)
for paragraph in wordfile_copy.paragraphs:
for run in paragraph.runs:
run.text = run.text.replace(col_name, str(value))
wordfile_copy.save(f'{save_folder}/{path_name}.docx')
if __name__ == '__main__':
doc_path = r"D:\solve_path\单位.doc"
excel_path = r"D:\solve_path\信息.xls"
save_folder = Path('D:/docx_save')
save_folder.mkdir(parents=True, exist_ok=True)
data = pd.read_excel(excel_path)
wordfile = doctransform2docx(doc_path)
data.apply(lambda x: replace_docx(x.index, x.values, wordfile), axis=1)Solution 2: Directly Manipulating the Word Zip Archive with zipfile
Since a .docx file is essentially a zip archive containing XML files, this method extracts word/document.xml , performs plain‑text replacements, and repackages the archive.
No additional third‑party libraries are needed beyond the Python standard library and pandas for Excel handling.
Conversion from .doc to .docx is still performed via win32com .
When recompressing, use zipfile.zlib.DEFLATED to preserve the original folder hierarchy.
Sample code (excerpt):
from shutil import rmtree
import zipfile
from copy import deepcopy
from pathlib import Path
from win32com import client as wc # pip install pypiwin32
import pandas as pd
def docx_unzip(docx_path):
docx_path = Path(docx_path)
unzip_path = docx_path.with_name(docx_path.stem)
with zipfile.ZipFile(docx_path, 'r') as f:
for file in f.namelist():
f.extract(file, path=unzip_path)
xml_path = unzip_path.joinpath('word/document.xml')
with xml_path.open(encoding='utf-8') as f:
xml_file = f.read()
return unzip_path, xml_path, xml_file
def docx_zipped(docx_path, zipped_path):
docx_path = Path(docx_path)
with zipfile.ZipFile(zipped_path, 'w', zipfile.zlib.DEFLATED) as f:
for file in docx_path.glob('**/*.*'):
f.write(file, file.as_posix().replace(docx_path.as_posix() + '/', ''))
def replace_docx(name, values, xml_file, xml_path, unzip_path, path_name='Company'):
xml_file_copy = deepcopy(xml_file)
for col_name, value in zip(name, values):
if col_name == 'Company':
path_name = str(value)
xml_file_copy = xml_file_copy.replace(col_name, str(value))
with xml_path.open('w', encoding='utf-8') as f:
f.write(xml_file_copy)
docx_zipped(unzip_path, f'{save_folder}/{path_name}.docx')
if __name__ == '__main__':
doc_path = r"D:\solve_path\单位.doc"
excel_path = r"D:\solve_path\信息.xls"
save_folder = Path('D:/docx_save')
save_folder.mkdir(parents=True, exist_ok=True)
data = pd.read_excel(excel_path)
docx_path = doctransform2docx(doc_path)
unzip_path, xml_path, xml_file = docx_unzip(docx_path)
data.apply(lambda x: replace_docx(x.index, x.values, xml_file, xml_path, unzip_path), axis=1)
rmtree(unzip_path)Both approaches ultimately achieve the goal of batch‑generating personalized Word documents while preserving most of the original formatting; the zipfile method retains underline and box styles that were lost with the first method.
Conclusion: After several attempts, the author found a workable solution that replaces placeholders without breaking the document layout, though further refinements may still be possible.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.