
Automate Bulk Excel Column Extraction and Merging with Python & Pandas

Learn how to use Python's pandas library to automatically scan multiple folders, extract specified columns from hundreds of Excel files, and merge them into a single workbook, complete with step-by-step code examples and visual results.

Python Crawling & Data Mining

Introduction

I often need to pull specific columns out of Excel files scattered across many folders and merge them into one new Excel file. Doing this by hand is manageable for a few folders, but it does not scale to hundreds or thousands. This article shows how to automate the task with Python.

Import Libraries

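Only two imports are needed: os from the standard library for file operations, and pandas for data processing. Because the output file is written with engine='openpyxl', the openpyxl package must also be installed.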

import pandas as pd
import os

Write Code

1. Define the root folder, columns, and output path

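# Root folder that contains the subfolders of Excel files
path = "D:/a/"
# Header names of the columns to extract
key = ['A', 'B']
# Entries in the root folder (the subfolders, plus any loose files)
subfolders = os.listdir(path)
# Path of the merged output file
output_file = path + 'result.xlsx'
# Writer for the output workbook (requires the openpyxl package)
writer = pd.ExcelWriter(output_file, engine='openpyxl')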
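
The string concatenation above only works when path ends with a slash. As a minimal sketch, the output path could instead be built with pathlib, which gives the same result either way. The helper name build_output_path is hypothetical and not part of the original script.

```python
from pathlib import Path


def build_output_path(root, file_name="result.xlsx"):
    """Return the output workbook path inside root (hypothetical helper)."""
    # Path joins the parts correctly whether or not root ends with a slash
    return Path(root) / file_name


print(build_output_path("D:/a/"))
print(build_output_path("D:/a"))
```

If a plain string is needed, for example to pass to pd.ExcelWriter, wrap the result in str().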

2. Get a list of all Excel files to process

file_names = []
for sub in subfolders:
    # Skip loose files in the root folder, such as result.xlsx from an earlier run
    if not os.path.isdir(path + sub):
        continue
    sub_path = path + sub + "/"
    # Get all .xlsx files in the subfolder
    xlsx_names = [f for f in os.listdir(sub_path) if f.endswith('.xlsx')]
    for f in xlsx_names:
        file_names.append(sub_path + f)
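
The loop above looks exactly one level below the root folder. If the Excel files may be nested more deeply, a sketch based on os.walk could collect them at any depth. This is an alternative, not the article's method: the function name find_xlsx_files and the skip_names parameter are assumptions made for illustration.

```python
import os


def find_xlsx_files(root, skip_names=("result.xlsx",)):
    """Collect .xlsx paths under root at any depth, in sorted order."""
    found = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            # "~$" marks the temporary lock file Excel creates while a workbook is open;
            # skip_names excludes files with these names wherever they appear
            if name.startswith("~$") or name in skip_names:
                continue
            if name.lower().endswith(".xlsx"):
                found.append(os.path.join(folder, name))
    return sorted(found)
```

Calling find_xlsx_files(path) would then stand in for the file_names list built above.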

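3. Loop through each Excel file, extract the specified columns, and merge

df = None
for num, xlsx_name in enumerate(file_names, start=1):
    # Read the first sheet, using its first row as the column headers
    df1 = pd.read_excel(xlsx_name, sheet_name=0, index_col=None, header=0)
    # Keep only the requested columns
    _df = df1.loc[:, key]
    if df is None:
        df = _df
    else:
        df = pd.concat([df, _df], ignore_index=True)
    print(xlsx_name + "  extracted successfully! Total %d, file %d." % (len(file_names), num))

# Write the merged data to the output workbook and close it
df.to_excel(writer, index=False)
writer.close()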
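
Two details of the loop above are worth a note: .loc[:, key] raises a KeyError if a file lacks one of the requested columns, and calling pd.concat once per file re-copies the growing result each time. The sketch below shows one possible variation that fills missing columns with NaN and concatenates once at the end. The function name merge_columns is hypothetical, and the small in-memory frames stand in for the results of pd.read_excel.

```python
import pandas as pd


def merge_columns(frames, columns):
    """Select `columns` from each frame and stack the results row-wise."""
    parts = []
    for frame in frames:
        # reindex keeps the requested column order and fills a missing column
        # with NaN instead of raising KeyError as .loc[:, columns] would
        parts.append(frame.reindex(columns=columns))
    if not parts:
        return pd.DataFrame(columns=columns)
    # A single concat over the list avoids re-copying the result for every file
    return pd.concat(parts, ignore_index=True)


# Stand-ins for frames read from Excel files
frames = [
    pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]}),
    pd.DataFrame({"A": [7], "C": [8]}),  # this one has no "B" column
]
merged = merge_columns(frames, ["A", "B"])
print(merged)
```

Whether a missing column should be filled or treated as an error depends on the data, so the stricter .loc behavior may still be the better choice when every file is expected to share the same layout.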

Execution Result

All folders to be processed are shown below:

Folder structure

The code runs successfully:

Execution success

The merged result file is saved:

Saved result file

Content of the extracted result file:

Result file content

Conclusion

This article showed how to use pandas to batch-extract columns from Excel files spread across many folders and merge them into a single workbook. Happy coding!
