Fundamentals 6 min read

Automate Bulk Excel Filtering with Python: Step-by-Step Code Guide

This tutorial shows how to use Python and pandas to automatically scan thousands of Excel files, filter rows by a specific ID, and consolidate the results into a single workbook, providing complete code examples and explanations to replace manual copy‑paste operations.

Python Crawling & Data Mining

Jan 2, 2023

Automate Bulk Excel Filtering with Python: Step-by-Step Code Guide

Introduction

The author, a Python enthusiast, shares a practical case: batch‑filtering rows in thousands of Excel files and saving the results as new files, eliminating tedious manual work.

Requirement Clarification

A fan needs to locate a specific row in many Excel files without opening each file and using CTRL+F. Manually processing hundreds or thousands of files would take excessive time.

Python can automate this task in under a minute.

Initial Code

import pandas as pd
import os

path = "./新建文件夹/"
# Get all file names in the folder
name_list = os.listdir(path)
name_list = pd.DataFrame(name_list)

# Loop through files
for i in range(len(name_list)):
    df = pd.read_excel(path + name_list[0][i])
    print('{}读取完成!'.format(i))
    hai = df[df['id'] == '58666']
    hai.to_excel('./res/' + name_list[0][i])

Solution 1

Traverse the folder, filter rows, and concatenate results using concat:

import pandas as pd
import os

path = "./新建文件夹/"
name_list = os.listdir(path)
name_list = pd.DataFrame(name_list)

res = []
for i in range(len(name_list)):
    df = pd.read_excel(path + name_list[0][i])
    print('文件{}读取完成!'.format(i))
    target_data = df[df['id'] == '58666']
    res.append(target_data)

final_df = pd.concat(res)
final_df.to_excel("target.xlsx")

Solution 2

Another variant reads the first file, then iterates over the remaining files, concatenating with ignore_index=True:

import pandas as pd
import os

path = "./新建文件夹/"
name_list = os.listdir(path)

res = pd.read_excel(path+name_list[0])
res = res[res['id'] == '58666']

for file in name_list[1:]:
    temp = pd.read_excel(path+file)
    temp = temp[temp['id'] == '58666']
    res = pd.concat([res, temp], ignore_index=True)

res.to_excel('res.xlsx')

Conclusion

The article demonstrates a practical Python automation case for Excel file processing, allowing readers to adapt the code to their own workflows and achieve significant time savings.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

pandas

Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.