How to Automate Excel Data Processing with Pandas: A Step-by-Step Guide
This article shows how to use Python's pandas library to automate Excel data processing. The code groups, filters, merges, and formats vulnerability data, then writes the result to an Excel file.
Introduction
Hello, I am PiPi. Recently a member asked about automating office tasks with Python. Below is the original data and the desired target data.
The goal is to operate on the two highlighted rows.
Implementation Process
An earlier solution using openpyxl worked for small batches, but it becomes cumbersome once there are dozens of vulnerabilities. This article presents a more efficient pandas implementation.
Here is the provided code:
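import pandas as pd

# df is assumed to already hold the source sheet, e.g. loaded with pd.read_excel
# (the input file name is not given in the article).

# "OR" branch: unique IPs per system, vulnerability, and proof flag
dfc1 = df.groupby(['系统名称', '漏洞名称', '是否提供误报证明']).agg({'ip': 'unique'}).rename(columns={'ip': '已提供误报证明ip'}).reset_index()
dfc2 = df.groupby(['系统名称', '漏洞名称', '是否提供无法整改证明']).agg({'ip': 'unique'}).rename(columns={'ip': '已提供无法整改证明ip'}).reset_index()
res = dfc1.merge(dfc2, how='outer', left_on=['系统名称', '漏洞名称', '是否提供误报证明'], right_on=['系统名称', '漏洞名称', '是否提供无法整改证明'])
# Keep the rows where both proof flags are '是' ("yes")
res1 = res.loc[res['是否提供误报证明'].eq('是') & res['是否提供无法整改证明'].eq('是')].copy()
res1.set_index(['系统名称', '漏洞名称'], inplace=True)
# "AND" branch: IPs with neither a false-positive proof nor a cannot-remediate proof
res2 = df[df['是否提供误报证明'].eq('否') & df['是否提供无法整改证明'].eq('否')].groupby(['系统名称', '漏洞名称']).agg({'ip': 'unique'}).rename(columns={'ip': '没有误报和无法整改证明的ip'})
# Combine the results
res = res1.join(res2, how='outer').fillna('')
# Turn each array of IPs into a comma-separated string
# (applymap is deprecated since pandas 2.1; DataFrame.map replaces it there)
ip_cols = res.columns[res.columns.str.contains('ip')]
res[ip_cols] = res[ip_cols].applymap(', '.join)
# Fill cells that have no IP with '无' ("none")
res[ip_cols] = res[ip_cols].where(res[ip_cols].ne(''), '无')
# Save the result
# Note: index=False leaves out the index (系统名称, 漏洞名称);
# call res.reset_index() first if those columns should appear in the file.
res.to_excel('result.xlsx', index=False)
The script performs three filtering steps, merges the results, and outputs an Excel file with the expected outcome.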
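The group-then-join pattern used in the script can be tried on a small in-memory table. The following is a minimal sketch, not the article's full solution: it handles a single proof column only. The sample rows and the output column name 未提供误报证明ip are made up for illustration; the other column names are taken from the code above.

```python
import pandas as pd

# Hypothetical sample data: column names follow the article's code,
# the rows themselves are invented for this illustration.
df = pd.DataFrame({
    '系统名称': ['A', 'A', 'A', 'B'],
    '漏洞名称': ['v1', 'v1', 'v1', 'v2'],
    'ip': ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.3'],
    '是否提供误报证明': ['是', '否', '是', '否'],
})

keys = ['系统名称', '漏洞名称']


def unique_ips(frame, name):
    """Join the unique IPs of each (system, vulnerability) group into one string."""
    return (
        frame.groupby(keys)['ip']
        .agg(lambda s: ', '.join(s.unique()))
        .rename(name)
    )


# IPs that have a false-positive proof, and IPs that do not.
proved = unique_ips(df[df['是否提供误报证明'].eq('是')], '已提供误报证明ip')
unproved = unique_ips(df[df['是否提供误报证明'].eq('否')], '未提供误报证明ip')

# Outer join keeps groups that appear on only one side; gaps become '无'.
summary = proved.to_frame().join(unproved, how='outer').fillna('无').reset_index()
print(summary)
```

Joining the strings inside `agg` avoids the separate `applymap` pass, and `reset_index()` turns the grouping keys back into ordinary columns so that they are kept when the table is written with `index=False`.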
This resolves the user's issue for the two displayed vulnerabilities. For handling multiple vulnerabilities, a follow‑up article will be provided.
Conclusion
The article outlines a practical Python automation problem, presents a pandas‑based solution with full code, and demonstrates how to process and export the data efficiently.
Python Crawling & Data Mining
