How to Batch‑Generate Word Files from Excel Using Python’s zipfile and python‑docx

This article demonstrates two Python‑based solutions for automatically replacing placeholders in a Word document with data from an Excel sheet—one using python‑docx (with win32com conversion for .doc files) and another leveraging the zipfile module to edit the underlying XML—complete with code, troubleshooting tips, and handling of formatting issues.

MaGe Linux Operations
MaGe Linux Operations
MaGe Linux Operations
How to Batch‑Generate Word Files from Excel Using Python’s zipfile and python‑docx

Today we share two Python approaches to solve the problem of batch‑generating Word documents from Excel data using the zipfile module.

Approach 1: Use python‑docx.Document to read and modify a Word document.

Approach 2: Use zipfile to treat a .docx file as a zip archive and edit document.xml directly.

Platform: Windows 10, Interpreter: Python 3.7

Task Requirement

An Excel file contains target data; each row must replace corresponding placeholders in a specified Word template and be saved individually.

Solution 1 – python‑docx

Because python‑docx cannot handle .doc files, the .doc is first converted to .docx using win32com. Common issues include installing the correct package (pip install python‑docx) and handling text runs to preserve formatting. The code iterates over paragraphs and runs, replacing placeholders like #ColumnName# with Excel values, then saves each document.

from copy import deepcopy
from pathlib import Path
import win32com.client as wc  # pip install pypiwin32
from docx import Document  # pip install python-docx
import pandas as pd

def doctransform2docx(doc_path):
    docx_path = doc_path + 'x'
    suffix = doc_path.split('.')[1]
    assert 'doc' in suffix, 'Not a Word document!'
    if suffix == 'docx':
        return Document(doc_path)
    word = wc.Dispatch('Word.Application')
    doc = word.Documents.Open(doc_path)
    doc.SaveAs2(docx_path, 16)  # 16 = docx
    doc.Close()
    word.Quit()
    return Document(docx_path)

def replace_docx(name, values, wordfile, path_name='Company'):
    wordfile_copy = deepcopy(wordfile)
    for col_name, value in zip(name, values):
        if col_name == 'Company':
            path_name = str(value)
        for paragraph in wordfile_copy.paragraphs:
            for run in paragraph.runs:
                run.text = run.text.replace(col_name, str(value))
    wordfile_copy.save(f'{save_folder}/{path_name}.docx')

Solution 2 – zipfile

A .docx file is essentially a zip archive containing XML files. By extracting word/document.xml, placeholders can be replaced directly in the XML string, then the folder is re‑zipped into a new .docx. Important points: use zipfile.zlib.DEFLATED for correct compression, and preserve relative file names when writing.

import shutil, zipfile
from copy import deepcopy
from pathlib import Path
import win32com.client as wc  # pip install pypiwin32
import pandas as pd

def docx_unzip(docx_path):
    docx_path = Path(docx_path)
    unzip_path = docx_path.with_name(docx_path.stem)
    with zipfile.ZipFile(docx_path, 'r') as f:
        for file in f.namelist():
            f.extract(file, path=unzip_path)
    xml_path = unzip_path.joinpath('word/document.xml')
    with xml_path.open(encoding='utf-8') as f:
        xml_file = f.read()
    return unzip_path, xml_path, xml_file

def docx_zipped(docx_path, zipped_path):
    docx_path = Path(docx_path)
    with zipfile.ZipFile(zipped_path, 'w', zipfile.zlib.DEFLATED) as f:
        for file in docx_path.glob('**/*.*'):
            f.write(file, file.as_posix().replace(docx_path.as_posix() + '/', ''))

def replace_docx(name, values, xml_file, xml_path, unzip_path, path_name='Company'):
    xml_file_copy = deepcopy(xml_file)
    for col_name, value in zip(name, values):
        if col_name == 'Company':
            path_name = str(value)
        xml_file_copy = xml_file_copy.replace(col_name, str(value))
    with xml_path.open('w', encoding='utf-8') as f:
        f.write(xml_file_copy)
    docx_zipped(unzip_path, f'{save_folder}/{path_name}.docx')

Both approaches finally generate Word files with the data filled in, though preserving complex formatting (e.g., checkboxes, underlines) remains challenging.

Conclusion

After several attempts, a workable method was found that replaces strings in a docx without destroying most formatting; however, further improvements are possible, especially for special characters and layout preservation.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

AutomationWindowsExcelWorddocxzipfile
MaGe Linux Operations
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.