How to Batch Extract and Filter Excel Data with Python’s openpyxl and glob
Learn how to use Python's openpyxl and glob modules to automatically extract rows with a purchase quantity over 50, whether from a single Excel file or from thousands of them, write the matches to a new workbook, and merge the results. Complete code examples cover both the basic and the advanced scenario.
Requirement Description
Basic task: extract rows from a single Excel file where the purchase quantity (column F) exceeds 50 and save them to a new workbook.
Python Implementation (Basic)
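from openpyxl import load_workbook, Workbook

path = 'C:/Users/xxxxxx'
# The source file name means "e-commerce baby product data"
workbook = load_workbook(path + '/' + '电商婴儿数据.xlsx')
sheet = workbook.active

# Collect the numbers of the rows whose purchase quantity (column F) exceeds 50
buy_mount = sheet['F']
row_lst = []
for cell in buy_mount[1:]:  # skip the header cell in row 1
    # Accept floats as well as ints; empty and text cells are ignored
    if isinstance(cell.value, (int, float)) and cell.value > 50:
        print(cell.row)
        row_lst.append(cell.row)

# Create the output workbook and copy the header row first
new_workbook = Workbook()
new_sheet = new_workbook.active
header = sheet[1]
header_lst = [cell.value for cell in header]
new_sheet.append(header_lst)

# Copy every qualifying row
for row in row_lst:
    data_lst = [cell.value for cell in sheet[row]]
    new_sheet.append(data_lst)

# The output file name means "new sheet matching the filter"
new_workbook.save(path + '/' + '符合筛选条件的新表.xlsx')

The script creates a new workbook, copies the header, and writes all qualifying rows.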
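The same filter can also be written as a single pass over the sheet. The sketch below is an alternative, not the method used above: it streams rows with `iter_rows(values_only=True)` instead of collecting row numbers first. The helper names `is_over_threshold`, `filter_rows`, and `filter_workbook` are hypothetical, and it assumes the first row is the header and that column F sits at index 5 of each row tuple.

```python
from numbers import Real


def is_over_threshold(value, threshold=50):
    """Return True only for real numbers (int or float, not bool) above threshold."""
    if isinstance(value, bool) or not isinstance(value, Real):
        return False
    return value > threshold


def filter_rows(rows, qty_index=5, threshold=50):
    """Yield the header row, then every data row whose quantity qualifies."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:  # empty sheet: nothing to yield
        return
    yield header
    for row in rows:
        if len(row) > qty_index and is_over_threshold(row[qty_index], threshold):
            yield row


def filter_workbook(src_path, dst_path, qty_index=5, threshold=50):
    """Copy the header and qualifying rows of src_path's active sheet to dst_path."""
    # Imported here so the two helpers above can be used without openpyxl
    from openpyxl import load_workbook, Workbook

    source = load_workbook(src_path, read_only=True)
    try:
        out = Workbook()
        out_sheet = out.active
        rows = source.active.iter_rows(values_only=True)
        for row in filter_rows(rows, qty_index, threshold):
            out_sheet.append(row)
        out.save(dst_path)
    finally:
        source.close()  # read-only workbooks keep the file open until closed
```

Keeping the numeric test in its own function makes the rule easy to change, for example to treat numeric strings as numbers.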
Advanced Requirement
Process 1000 Excel files in a folder, extract rows where the purchase quantity exceeds 50 from each file, and merge them into a single workbook.
Python Implementation (Advanced)
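from openpyxl import load_workbook, Workbook
import glob

path = 'C:/Users/xxxxxx'
new_workbook = Workbook()
new_sheet = new_workbook.active
header_written = False  # the header is copied from the first file only

# Note: the result is saved into the folder being scanned,
# so delete or move it before running the script again
for file in sorted(glob.glob(path + '/*.xlsx')):
    workbook = load_workbook(file)
    sheet = workbook.active

    # Collect the numbers of the rows whose purchase quantity (column F) exceeds 50
    buy_mount = sheet['F']
    row_lst = []
    for cell in buy_mount[1:]:  # skip the header cell in row 1
        # Accept floats as well as ints; empty and text cells are ignored
        if isinstance(cell.value, (int, float)) and cell.value > 50:
            row_lst.append(cell.row)

    if not header_written:
        header = sheet[1]
        header_lst = [cell.value for cell in header]
        new_sheet.append(header_lst)
        header_written = True

    # Append this file's qualifying rows to the merged sheet
    for row in row_lst:
        data_lst = [cell.value for cell in sheet[row]]
        new_sheet.append(data_lst)

# The output file name means "new sheet matching the filter"
new_workbook.save(path + '/' + '符合筛选条件的新表.xlsx')

This approach reuses the same new workbook, adds the header only once, and iterates through each file to collect qualifying rows.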
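One thing to watch: the script saves its result into the folder it scans, so a second run would match the output file with `*.xlsx` and merge it back in. Excel also leaves lock files prefixed with `~$` next to any workbook that is currently open, and those cannot be loaded. A minimal sketch of a file-listing helper that skips both follows; the name `list_input_files` is hypothetical.

```python
import glob
import os


def list_input_files(folder, output_name):
    """Return sorted .xlsx paths in folder, minus the output file and lock files."""
    paths = []
    for file in sorted(glob.glob(os.path.join(folder, '*.xlsx'))):
        name = os.path.basename(file)
        # Skip the merged result from an earlier run and Excel's "~$" lock files
        if name == output_name or name.startswith('~$'):
            continue
        paths.append(file)
    return paths
```

In the loop above, `glob.glob(path + '/*.xlsx')` could then be replaced with `list_input_files(path, '符合筛选条件的新表.xlsx')`.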
Using openpyxl provides fine‑grained control over Excel files, though pandas could achieve the same with fewer lines.
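For comparison, here is a rough pandas sketch of the same merge. It assumes pandas is installed alongside openpyxl (which pandas uses to read .xlsx files), that every file has its header in the first row, and that the purchase quantity is the sixth column (position 5). The function names `filter_frame` and `merge_filtered` are hypothetical. Unlike the openpyxl version, `pd.to_numeric` also converts numeric strings such as "70", so such rows would be kept.

```python
import glob

import pandas as pd


def filter_frame(df, qty_pos=5, threshold=50):
    """Keep rows whose value in column position qty_pos is numeric and above threshold."""
    # Non-numeric cells become NaN, and NaN > threshold is False
    qty = pd.to_numeric(df.iloc[:, qty_pos], errors='coerce')
    return df[qty > threshold]


def merge_filtered(folder, qty_pos=5, threshold=50):
    """Filter every .xlsx file in folder and stack the results into one DataFrame."""
    frames = [
        filter_frame(pd.read_excel(file), qty_pos, threshold)
        for file in sorted(glob.glob(folder + '/*.xlsx'))
    ]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

The merged result could then be written with `merge_filtered(path).to_excel(output_file, index=False)`, where `output_file` should point outside the scanned folder.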
Python Crawling & Data Mining