
How to Batch Extract and Filter Excel Data with Python’s openpyxl and glob

Learn how to use Python's openpyxl and glob modules to extract rows with a purchase quantity over 50 from one Excel file or from a whole folder of them, write the matches to a new workbook, and merge the results. Complete code is given for both the basic and the advanced scenario.

Python Crawling & Data Mining

Sep 5, 2020


Requirement Description

Basic task: extract rows from a single Excel file where the purchase quantity (column F) exceeds 50 and save them to a new workbook.

Python Implementation (Basic)

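from openpyxl import load_workbook, Workbook

# Folder that holds the source workbook (placeholder path)
path = 'C:/Users/xxxxxx'
# File name means "e-commerce baby products data"
workbook = load_workbook(path + '/' + '电商婴儿数据.xlsx')
sheet = workbook.active

# Collect the row numbers whose purchase quantity (column F) exceeds 50.
# The isinstance check skips the header text and empty cells;
# note that it also skips quantities stored as floats.
buy_amount = sheet['F']
row_lst = []
for cell in buy_amount:
    if isinstance(cell.value, int) and cell.value > 50:
        print(cell.row)
        row_lst.append(cell.row)

# Create the output workbook and copy the header row (row 1)
new_workbook = Workbook()
new_sheet = new_workbook.active
header = sheet[1]
header_lst = [cell.value for cell in header]
new_sheet.append(header_lst)

# Copy every qualifying row, values only
for row in row_lst:
    data_lst = [cell.value for cell in sheet[row]]
    new_sheet.append(data_lst)

# File name means "new sheet matching the filter criteria"
new_workbook.save(path + '/' + '符合筛选条件的新表.xlsx')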

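The script works in two passes: it first collects the numbers of the rows whose column F value is an integer greater than 50, then creates a new workbook, copies the header row, and appends each qualifying row.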
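The same filter can also be written as a single pass over the rows. The following is a minimal sketch, not the author's code: the function name `rows_over_threshold`, its parameters, and the demo column names are made up for illustration, and it assumes the first row is the header. Unlike the script above, it also accepts float quantities.

```python
from openpyxl import Workbook


def rows_over_threshold(sheet, col_index=5, threshold=50):
    """Return (header, rows) for rows whose value in col_index exceeds threshold.

    col_index is 0-based, so 5 corresponds to column F.
    """
    rows = sheet.iter_rows(values_only=True)  # yields one tuple of values per row
    header = next(rows, None)                 # assumption: row 1 is the header
    matches = []
    for row in rows:
        value = row[col_index] if len(row) > col_index else None
        # bool is a subclass of int, so exclude it explicitly
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value > threshold:
            matches.append(row)
    return header, matches


# Tiny in-memory demo; the column names are hypothetical
demo = Workbook()
demo_sheet = demo.active
demo_sheet.append(["id", "a", "b", "c", "d", "qty"])
demo_sheet.append([1, "x", "x", "x", "x", 10])
demo_sheet.append([2, "x", "x", "x", "x", 75])
header, matches = rows_over_threshold(demo_sheet)
```

Reading with `values_only=True` avoids building a list of row numbers first and then looking each row up again.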

Advanced Requirement

Process 1000 Excel files in a folder, extract rows where the purchase quantity exceeds 50 from each file, and merge them into a single workbook.

Python Implementation (Advanced)

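from openpyxl import load_workbook, Workbook
import glob
import os

path = 'C:/Users/xxxxxx'
# File name means "new sheet matching the filter criteria"
output_name = '符合筛选条件的新表.xlsx'
new_workbook = Workbook()
new_sheet = new_workbook.active
header_written = False

for file in glob.glob(path + '/*.xlsx'):
    name = os.path.basename(file)
    # Skip a result file left over from an earlier run (it lives in the
    # same folder) and Excel's temporary lock files
    if name == output_name or name.startswith('~$'):
        continue

    workbook = load_workbook(file)
    sheet = workbook.active

    # Row numbers whose purchase quantity (column F) exceeds 50
    buy_amount = sheet['F']
    row_lst = []
    for cell in buy_amount:
        if isinstance(cell.value, int) and cell.value > 50:
            row_lst.append(cell.row)

    # Copy the header once, from the first file processed
    if not header_written:
        header = sheet[1]
        header_lst = [cell.value for cell in header]
        new_sheet.append(header_lst)
        header_written = True

    for row in row_lst:
        data_lst = [cell.value for cell in sheet[row]]
        new_sheet.append(data_lst)

new_workbook.save(path + '/' + output_name)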

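This approach reuses the same new workbook for every file, adds the header only once (taken from the first file processed), and iterates through each file to collect qualifying rows. It assumes that all files share the same column layout.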
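With many files, memory use and file ordering start to matter. Here is one possible sketch of the same merge wrapped in a function, assuming every file has its header in row 1. The function name `merge_filtered`, its parameters, and the output name `filtered.xlsx` are hypothetical. It opens each source in openpyxl's read-only mode, sorts the file list so the output order is repeatable, and skips its own output file and Excel lock files.

```python
import glob
import os
import tempfile

from openpyxl import Workbook, load_workbook


def merge_filtered(folder, out_name="filtered.xlsx", col_index=5, threshold=50):
    """Merge rows whose value in col_index (0-based) exceeds threshold."""
    out_book = Workbook()
    out_sheet = out_book.active
    header_written = False
    for file in sorted(glob.glob(os.path.join(folder, "*.xlsx"))):
        name = os.path.basename(file)
        if name == out_name or name.startswith("~$"):
            continue  # skip an earlier result and Excel lock files
        book = load_workbook(file, read_only=True)  # streams rows lazily
        rows = book.active.iter_rows(values_only=True)
        header = next(rows, None)
        if header is not None and not header_written:
            out_sheet.append(header)
            header_written = True
        for row in rows:
            value = row[col_index] if len(row) > col_index else None
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and value > threshold:
                out_sheet.append(row)
        book.close()  # read-only workbooks keep the file open until closed
    out_path = os.path.join(folder, out_name)
    out_book.save(out_path)
    return out_path


# Demo on two throwaway files in a temporary folder (made-up data)
with tempfile.TemporaryDirectory() as tmp:
    for i, qty in enumerate([20, 80], start=1):
        wb = Workbook()
        wb.active.append(["id", "a", "b", "c", "d", "qty"])
        wb.active.append([i, "x", "x", "x", "x", qty])
        wb.save(os.path.join(tmp, f"part{i}.xlsx"))
    result = load_workbook(merge_filtered(tmp))
    merged_rows = list(result.active.iter_rows(values_only=True))
```

In the demo, only the row from the second file passes the filter, so the merged sheet holds the header plus that one row.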

Using openpyxl provides fine‑grained control over Excel files, though pandas could achieve the same with fewer lines.
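For comparison, a pandas version might look roughly like the sketch below. It is not from the original article: the helper names `filter_frame` and `merge_with_pandas` are hypothetical, pandas needs openpyxl installed to read `.xlsx` files, and the filter addresses column F by position (index 5) rather than by name.

```python
import glob
import os

import pandas as pd


def filter_frame(df, col_index=5, threshold=50):
    """Keep rows whose value in the col_index-th column exceeds threshold."""
    # Non-numeric cells become NaN, and NaN > threshold is False
    qty = pd.to_numeric(df.iloc[:, col_index], errors="coerce")
    return df[qty > threshold]


def merge_with_pandas(folder, out_path):
    """Filter every .xlsx file in folder and write the merged rows to out_path."""
    files = sorted(glob.glob(os.path.join(folder, "*.xlsx")))
    frames = [filter_frame(pd.read_excel(f)) for f in files]
    if frames:  # pd.concat raises on an empty list
        pd.concat(frames, ignore_index=True).to_excel(out_path, index=False)


# In-memory demo with made-up column names
demo = pd.DataFrame({
    "id": [1, 2, 3],
    "a": ["x"] * 3, "b": ["x"] * 3, "c": ["x"] * 3, "d": ["x"] * 3,
    "qty": [10, 75, "n/a"],
})
kept = filter_frame(demo)
```

Writing the result outside the source folder (or excluding it from the glob) avoids reading it back in on the next run.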
