Backend Development 11 min read

Batch Generating Word Documents from Excel Using Python: Two Solutions with python-docx and zipfile

This article demonstrates how to automate the replacement of placeholders in Word (.doc/.docx) files with data from an Excel sheet using Python, presenting two approaches—one based on python-docx and win32com, and another that manipulates the underlying zip archive with the zipfile module—while addressing common pitfalls and preserving document formatting.

Python Programming Learning Circle

Dec 11, 2021

Batch Generating Word Documents from Excel Using Python: Two Solutions with python-docx and zipfile

The task is to read rows from an Excel file and replace corresponding placeholders in a Word document, saving each customized document separately. The original Word file is in .doc format, requiring conversion to .docx before manipulation.

Solution 1: Using python-docx and win32com

First, install the required packages: pip install python-docx pypiwin32 pandas Key steps:

Convert .doc to .docx with win32com.client if needed.

Load the .docx file with Document from python-docx.

Iterate over paragraphs and runs, replacing placeholder strings (e.g., #ColumnName#) with values from the Excel row.

Save each modified document to a designated folder.

Common issues encountered:

Incorrect package installation (using pip install docx instead of pip install python-docx). python-docx cannot handle .doc files directly, requiring conversion.

Text runs may split words, causing replacement failures; using run.text preserves formatting but may lose underline styles.

Sample code (excerpt):

from copy import deepcopy
from pathlib import Path
from win32com import client as wc  # pip install pypiwin32
from docx import Document  # pip install python-docx
import pandas as pd

def doctransform2docx(doc_path):
    docx_path = doc_path + 'x'
    suffix = doc_path.split('.')[-1]
    assert 'doc' in suffix, '传入的不是word文档，请重新输入！'
    if suffix == 'docx':
        return Document(doc_path)
    word = wc.Dispatch('Word.Application')
    doc = word.Documents.Open(doc_path)
    doc.SaveAs2(docx_path, 16)  # 16 = docx format
    doc.Close()
    word.Quit()
    return Document(docx_path)

def replace_docx(name, values, wordfile, path_name='Company'):
    wordfile_copy = deepcopy(wordfile)
    for col_name, value in zip(name, values):
        if col_name == 'Company':
            path_name = str(value)
        for paragraph in wordfile_copy.paragraphs:
            for run in paragraph.runs:
                run.text = run.text.replace(col_name, str(value))
    wordfile_copy.save(f'{save_folder}/{path_name}.docx')

if __name__ == '__main__':
    doc_path = r"D:\solve_path\单位.doc"
    excel_path = r"D:\solve_path\信息.xls"
    save_folder = Path('D:/docx_save')
    save_folder.mkdir(parents=True, exist_ok=True)
    data = pd.read_excel(excel_path)
    wordfile = doctransform2docx(doc_path)
    data.apply(lambda x: replace_docx(x.index, x.values, wordfile), axis=1)

Solution 2: Directly Manipulating the Word Zip Archive with zipfile

Since a .docx file is essentially a zip archive containing XML files, this method extracts word/document.xml, performs plain‑text replacements, and repackages the archive.

No additional third‑party libraries are needed beyond the Python standard library and pandas for Excel handling.

Conversion from .doc to .docx is still performed via win32com.

When recompressing, use zipfile.zlib.DEFLATED to preserve the original folder hierarchy.

Sample code (excerpt):

from shutil import rmtree
import zipfile
from copy import deepcopy
from pathlib import Path
from win32com import client as wc  # pip install pypiwin32
import pandas as pd

def docx_unzip(docx_path):
    docx_path = Path(docx_path)
    unzip_path = docx_path.with_name(docx_path.stem)
    with zipfile.ZipFile(docx_path, 'r') as f:
        for file in f.namelist():
            f.extract(file, path=unzip_path)
    xml_path = unzip_path.joinpath('word/document.xml')
    with xml_path.open(encoding='utf-8') as f:
        xml_file = f.read()
    return unzip_path, xml_path, xml_file

def docx_zipped(docx_path, zipped_path):
    docx_path = Path(docx_path)
    with zipfile.ZipFile(zipped_path, 'w', zipfile.zlib.DEFLATED) as f:
        for file in docx_path.glob('**/*.*'):
            f.write(file, file.as_posix().replace(docx_path.as_posix() + '/', ''))

def replace_docx(name, values, xml_file, xml_path, unzip_path, path_name='Company'):
    xml_file_copy = deepcopy(xml_file)
    for col_name, value in zip(name, values):
        if col_name == 'Company':
            path_name = str(value)
        xml_file_copy = xml_file_copy.replace(col_name, str(value))
    with xml_path.open('w', encoding='utf-8') as f:
        f.write(xml_file_copy)
    docx_zipped(unzip_path, f'{save_folder}/{path_name}.docx')

if __name__ == '__main__':
    doc_path = r"D:\solve_path\单位.doc"
    excel_path = r"D:\solve_path\信息.xls"
    save_folder = Path('D:/docx_save')
    save_folder.mkdir(parents=True, exist_ok=True)
    data = pd.read_excel(excel_path)
    docx_path = doctransform2docx(doc_path)
    unzip_path, xml_path, xml_file = docx_unzip(docx_path)
    data.apply(lambda x: replace_docx(x.index, x.values, xml_file, xml_path, unzip_path), axis=1)
    rmtree(unzip_path)

Both approaches ultimately achieve the goal of batch‑generating personalized Word documents while preserving most of the original formatting; the zipfile method retains underline and box styles that were lost with the first method.

Conclusion: After several attempts, the author found a workable solution that replaces placeholders without breaking the document layout, though further refinements may still be possible.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Excel docx word processing zipfile

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.