How to Automate Complex Data Classification in Python with Pandas
This article walks through a real‑world Python automation scenario, showing how to use pandas to filter, group, and merge multi‑row data for classification, complete with step‑by‑step code, sample screenshots, and practical tips for handling large Excel datasets.
1. Introduction
Hello, I am PiPi. Recently a member of the Python community asked about automating office tasks with Python. Below is the original data they provided and the target result they want to achieve.
The two highlighted rows need to be processed.
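To make the shape of the input concrete, here is a minimal sketch of a table with the columns the solution relies on. Only the column names come from the solution code; the system names, vulnerability names, and IP addresses are invented for illustration and are not the community member's data.

```python
import pandas as pd

# Hypothetical sample rows; only the column names are taken from the solution.
# 系统名称 = system name, 漏洞名称 = vulnerability name,
# 是否提供误报证明 = false-positive proof provided? (是 = yes, 否 = no)
# 是否提供无法整改证明 = cannot-remediate proof provided? (是 = yes, 否 = no)
df = pd.DataFrame({
    '系统名称': ['SysA', 'SysA', 'SysA', 'SysB'],
    '漏洞名称': ['VulnX', 'VulnX', 'VulnX', 'VulnY'],
    'ip': ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4'],
    '是否提供误报证明': ['是', '否', '否', '否'],
    '是否提供无法整改证明': ['否', '是', '否', '否'],
})

# The input has one row per (system, vulnerability, ip). The goal is one row
# per (system, vulnerability), with the IPs sorted into three buckets.
rows_per_group = df.groupby(['系统名称', '漏洞名称']).size()
print(rows_per_group)
```

In this toy table, SysA/VulnX spans three rows that must collapse into one, which is the multi-row case the article is about.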
2. Implementation Process
In a previous article we handled a small dataset; this time we tackle multi‑row classification as demonstrated by the expert "隔壁😼山楂". The problem involves multiple systems and dozens of vulnerability details, making it considerably more complex.
Here is the new code provided:
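import pandas as pd

# df is assumed to already hold the source sheet (for example, loaded with pd.read_excel).
# Column names: 系统名称 = system name, 漏洞名称 = vulnerability name,
# 是否提供误报证明 = false-positive proof provided, 是否提供无法整改证明 = cannot-remediate proof provided (是 = yes, 否 = no).
# Filter: OR conditions (either kind of proof was provided)
dfc1 = df[df['是否提供误报证明'].eq('是')].groupby(['系统名称', '漏洞名称', '是否提供误报证明']).agg({'ip': 'unique'}).rename(columns={'ip': '已提供误报证明ip'}).reset_index()
dfc2 = df[df['是否提供无法整改证明'].eq('是')].groupby(['系统名称', '漏洞名称', '是否提供无法整改证明']).agg({'ip': 'unique'}).rename(columns={'ip': '已提供无法整改证明ip'}).reset_index()
res = dfc1.merge(dfc2, how='outer', on=['系统名称', '漏洞名称'])
res1 = res.set_index(['系统名称', '漏洞名称'])
# Filter: AND condition (neither kind of proof was provided)
res2 = df[df['是否提供误报证明'].eq('否') & df['是否提供无法整改证明'].eq('否')].groupby(['系统名称', '漏洞名称']).agg({'ip': 'unique'}).rename(columns={'ip': '没有误报和无法整改证明ip'})
# Merge the results; empty cells become '' so that ', '.join leaves them empty
res = res1.join(res2, how='outer').fillna('')
# Convert each array of IPs to a comma-separated string
# (applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
ip_cols = res.columns[res.columns.str.contains('ip')]
res[ip_cols] = res[ip_cols].applymap(', '.join)
# Fill cells that have no IP with '无' (none)
res[ip_cols] = res[ip_cols].where(res[ip_cols].ne(''), '无')
# Save the result
res.to_excel('result.xlsx')
This code performs three separate filters (false-positive proof provided, cannot-remediate proof provided, and neither), collects the unique IPs for each system and vulnerability, merges the three results, and writes the expected classification table to result.xlsx.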
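The three-filter idea can be tried end to end on a toy table. The sketch below is a variant, not the code contributed above: it joins the IPs into a string inside the aggregation and combines the three buckets with pd.concat instead of merge and join. The helper name bucket and all sample rows are hypothetical.

```python
import pandas as pd

KEYS = ['系统名称', '漏洞名称']  # system name, vulnerability name


def bucket(frame, mask, out_col):
    """Join the unique IPs of the rows selected by mask, per (system, vulnerability)."""
    return (
        frame[mask]
        .groupby(KEYS)['ip']
        .agg(lambda s: ', '.join(s.unique()))
        .rename(out_col)
    )


# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({
    '系统名称': ['SysA', 'SysA', 'SysA', 'SysA', 'SysB'],
    '漏洞名称': ['VulnX', 'VulnX', 'VulnX', 'VulnX', 'VulnY'],
    'ip': ['10.0.0.1', '10.0.0.5', '10.0.0.2', '10.0.0.3', '10.0.0.4'],
    '是否提供误报证明': ['是', '是', '否', '否', '否'],
    '是否提供无法整改证明': ['否', '否', '是', '否', '否'],
})

has_fp = df['是否提供误报证明'].eq('是')        # false-positive proof provided
has_nofix = df['是否提供无法整改证明'].eq('是')  # cannot-remediate proof provided

res = pd.concat(
    [
        bucket(df, has_fp, '已提供误报证明ip'),
        bucket(df, has_nofix, '已提供无法整改证明ip'),
        bucket(df, ~has_fp & ~has_nofix, '没有误报和无法整改证明ip'),
    ],
    axis=1,          # align the three Series on the (system, vulnerability) index
).fillna('无')       # groups missing from a bucket get '无' (none)

print(res)
```

Because every bucket shares the same two-level index, the alignment in pd.concat plays the role of the outer merge and join. Note that ~has_fp & ~has_nofix matches the original 否/否 filter only if both flag columns contain nothing but 是 and 否.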
In summary, the article presents a practical Python automation problem, provides a detailed pandas solution, and shares the complete code to help readers replicate the result.
Python Crawling & Data Mining
