Fundamentals 8 min read

Automate Excel Data Extraction with Python openpyxl: From Single to 1000 Files

This tutorial shows how to use Python's openpyxl (and optionally glob) to extract rows with purchase counts over 50 from an Excel sheet, create a new workbook, and scale the solution to batch‑process a thousand similar files automatically.

Python Crawling & Data Mining

Mar 4, 2021

Automate Excel Data Extraction with Python openpyxl: From Single to 1000 Files

Introduction

Hello everyone, welcome to the Python office automation series. In this article we demonstrate a simple case: using openpyxl to extract specific data from an Excel file, generate a new file, and then automate the process for multiple files.

Learning Points

Using the openpyxl module

Using the glob module to create batch processes

Data Source

The example uses the e‑commerce baby dataset from Alibaba Cloud Tianchi, which can be downloaded as an Excel file.

Requirement Description

Basic difficulty: From 电商婴儿数据.xlsx extract rows where the purchase count (column F, field buy_mount) exceeds 50, and write those rows into a new Excel workbook.

Advanced difficulty: In a folder containing 1000 Excel files named 电商婴儿数据1.xlsx … 电商婴儿数据1000.xlsx, extract all rows meeting the same condition from each file and aggregate them into a single new workbook.

Python Implementation

First, handle the basic case for a single file:

from openpyxl import load_workbook, Workbook

# Path to the folder containing the data file
path = 'C:/Users/xxxxxx'

# Open the workbook
workbook = load_workbook(path + '/' + '电商婴儿数据.xlsx')
sheet = workbook.active

# Select column F (purchase count)
buy_mount = sheet['F']
row_lst = []
for cell in buy_mount:
    if isinstance(cell.value, int) and cell.value > 50:
        row_lst.append(cell.row)

# Create a new workbook and copy the header
new_workbook = Workbook()
new_sheet = new_workbook.active
header = sheet[1]
header_lst = [cell.value for cell in header]
new_sheet.append(header_lst)

# Copy the qualifying rows
for row in row_lst:
    data_lst = [cell.value for cell in sheet[row]]
    new_sheet.append(data_lst)

# Save the result
new_workbook.save(path + '/' + '符合筛选条件的新表.xlsx')

To scale up to 1000 files, use glob to iterate over all .xlsx files and reuse the above logic, creating the new workbook only once and adding the header a single time:

from openpyxl import load_workbook, Workbook
import glob

path = 'C:/Users/xxxxxx'
new_workbook = Workbook()
new_sheet = new_workbook.active
header_added = False

for file in glob.glob(path + '/*.xlsx'):
    workbook = load_workbook(file)
    sheet = workbook.active
    buy_mount = sheet['F']
    row_lst = []
    for cell in buy_mount:
        if isinstance(cell.value, int) and cell.value > 50:
            row_lst.append(cell.row)
    if not header_added:
        header = sheet[1]
        new_sheet.append([cell.value for cell in header])
        header_added = True
    for row in row_lst:
        new_sheet.append([cell.value for cell in sheet[row]])

new_workbook.save(path + '/' + '符合筛选条件的新表.xlsx')

Conclusion

The complete process shows how to batch‑extract specific data from Excel files using Python. While pandas could achieve the same with fewer lines, openpyxl offers finer control over Excel features and is used throughout this automation series. Mastering such batch operations frees your hands from repetitive manual work.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Python Batch Processing openpyxl Excel Automation

Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.