Build a Reusable Python Batch Tool for Excel Automation
Learn how to create a modular Python script that batch processes Excel files—merging, cleaning, renaming, and more—using pandas and openpyxl, with a clear project structure and extensible design for real-world office automation.
In response to growing office automation needs, this guide walks you through building a generic Python batch-processing script that can dramatically reduce repetitive work such as merging Excel files, bulk replacements, data splitting, watermarking, and report generation.
Project Goal
The script should:
Read all Excel files (xls/xlsx) in a specified folder.
Apply customizable processing logic to each file.
Write the processed results to a new output folder.
Maintain a clean, extensible, and reusable code structure.
Key Libraries
We rely on pandas for data manipulation and openpyxl for advanced Excel handling.
import os
import pandas as pd
from openpyxl import load_workbookProject Structure
excel_batch_tool/
│── main.py # entry point
│── processors.py # processing functions
│── input/ # source Excel files
│── output/ # processed resultsThis modular layout makes future extensions straightforward.
Core Functionality: Traversing Excel Files
import os
import pandas as pd
from processors import process_excel
INPUT_DIR = "input"
OUTPUT_DIR = "output"
def ensure_output_folder():
if not os.path.exists(OUTPUT_DIR):
os.makedirs(OUTPUT_DIR)
def main():
ensure_output_folder()
for filename in os.listdir(INPUT_DIR):
if filename.endswith((".xlsx", ".xls")):
path = os.path.join(INPUT_DIR, filename)
print(f"Processing: {filename}")
df = pd.read_excel(path)
df = process_excel(df)
output_path = os.path.join(OUTPUT_DIR, filename)
df.to_excel(output_path, index=False)
print(f"Saved to: {output_path}")
if __name__ == "__main__":
main()The template can be reused by simply updating processors.py with new logic.
Custom Processing Logic (processors.py)
def process_excel(df):
# 1. Drop fully empty rows
df = df.dropna(how="all")
# 2. Trim column names
df.columns = [str(c).strip() for c in df.columns]
# 3. Add a timestamp column
import datetime
df["处理时间"] = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
return dfYou can extend this with batch replacements, formula insertion, multi‑sheet merging, conditional exports, statistical analysis, etc.
Example: Batch Replace Values in a Column
def process_excel(df):
if "产品名称" in df.columns:
df["产品名称"] = df["产品名称"].replace("旧名", "新名")
return dfExample: Merge All Excel Files into One
all_data = []
for filename in os.listdir(INPUT_DIR):
if filename.endswith((".xls", ".xlsx")):
df = pd.read_excel(os.path.join(INPUT_DIR, filename))
df["来源文件"] = filename
all_data.append(df)
final_df = pd.concat(all_data, ignore_index=True)
final_df.to_excel("output/合并结果.xlsx", index=False)Example: Split Files by a Category Column
def process_excel(df):
for name, group in df.groupby("分类"):
group.to_excel(f"output/{name}.xlsx", index=False)
return NoneAdjust main.py to write output only when a DataFrame is returned.
Conclusion
By completing this project you will master:
Flexible use of pandas and openpyxl .
Organizing batch‑processing scripts for scalability.
Practical function and module design.
Real‑world office automation implementation.
How to evolve a Python utility into a marketable product.
The same framework can be adapted into various tools such as merge utilities, splitters, replacers, watermark adders, and format converters, all ready for commercial use.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
