
How to Automate Complex Data Classification in Python with Pandas

This article walks through a real‑world Python automation scenario, showing how to use pandas to filter, group, and merge multi‑row data for classification, complete with step‑by‑step code, sample screenshots, and practical tips for handling large Excel datasets.

Python Crawling & Data Mining

1. Introduction

Hello, I am PiPi. Recently a member of the Python community asked about automating office tasks with Python. Below is the original data they provided and the target result they want to achieve.

The two highlighted rows are the ones that need to be processed.

2. Implementation Process

In a previous article we handled a small dataset; this time we tackle multi‑row classification as demonstrated by the expert "隔壁😼山楂". The problem involves multiple systems and dozens of vulnerability details, making it considerably more complex.
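To make the input shape concrete: the code in this article expects one row per affected IP, with columns for the system name, the vulnerability name, and two yes/no flags. The sketch below builds a frame of that shape. Only the column names come from the article's code; every value is invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows; only the column names are taken from the article's code.
df = pd.DataFrame({
    '系统名称': ['SysA', 'SysA', 'SysA', 'SysB'],        # system name
    '漏洞名称': ['VulnX', 'VulnX', 'VulnX', 'VulnY'],    # vulnerability name
    'ip': ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4'],
    '是否提供误报证明': ['是', '否', '否', '否'],          # false-positive proof provided? 是 = yes, 否 = no
    '是否提供无法整改证明': ['否', '是', '否', '否'],      # cannot-remediate proof provided?
})

# Number of distinct affected IPs per (system, vulnerability) pair
counts = df.groupby(['系统名称', '漏洞名称'])['ip'].nunique()
print(counts)
```

Each (system, vulnerability) pair can span several rows, which is why the solution has to group rows before it can classify the IPs.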

Here is the new code provided:

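import pandas as pd

# df is assumed to be already loaded from the source workbook, e.g. with pd.read_excel
# Column names: 系统名称 = system name, 漏洞名称 = vulnerability name,
# 是否提供误报证明 = false-positive proof provided, 是否提供无法整改证明 = cannot-remediate proof provided (是 = yes, 否 = no)

# "OR" filters: IPs with a false-positive proof, and IPs with a cannot-remediate proof
dfc1 = df[df['是否提供误报证明'].eq('是')].groupby(['系统名称', '漏洞名称', '是否提供误报证明']).agg({'ip': 'unique'}).rename(columns={'ip': '已提供误报证明ip'}).reset_index()
dfc2 = df[df['是否提供无法整改证明'].eq('是')].groupby(['系统名称', '漏洞名称', '是否提供无法整改证明']).agg({'ip': 'unique'}).rename(columns={'ip': '已提供无法整改证明ip'}).reset_index()
res = dfc1.merge(dfc2, how='outer', on=['系统名称', '漏洞名称'])
res1 = res.set_index(['系统名称', '漏洞名称'])
# "AND" filter: IPs with neither proof
res2 = df[df['是否提供误报证明'].eq('否') & df['是否提供无法整改证明'].eq('否')].groupby(['系统名称', '漏洞名称']).agg({'ip': 'unique'}).rename(columns={'ip': '没有误报和无法整改证明ip'})
# Combine the results
res = res1.join(res2, how='outer').fillna('')
# Convert each array of IPs into a comma-separated string
ip_cols = res.columns[res.columns.str.contains('ip')]
res[ip_cols] = res[ip_cols].apply(lambda col: col.map(', '.join))  # applymap is deprecated since pandas 2.1
# Fill cells that have no IP with '无' ("none")
res[ip_cols] = res[ip_cols].where(res[ip_cols].ne(''), '无')
# Save the result
res.to_excel('result.xlsx')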

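This code runs three filters: rows with a false-positive proof (是否提供误报证明 is 是), rows with a proof that the issue cannot be remediated (是否提供无法整改证明 is 是), and rows with neither. Each filter collects the unique IPs per system and vulnerability. The first two results are combined with an outer merge, the third is outer-joined on the same keys, and the IP arrays are joined into comma-separated strings before the table is written to result.xlsx.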
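The three-filter pattern can be tried on a small frame. The following is a minimal sketch, not the author's exact code: it wraps the steps in a hypothetical helper, `classify_ips`, drops the yes/no flag columns from the output, and joins all three results on the index. The sample rows are invented, and the Excel export is left out so the example runs without a file.

```python
import pandas as pd

def classify_ips(df):
    """Collect unique IPs per (system, vulnerability), split by which proof was provided."""
    keys = ['系统名称', '漏洞名称']
    fp, nofix = '是否提供误报证明', '是否提供无法整改证明'

    def unique_ips(mask, name):
        # Unique IPs per key pair among the rows selected by the boolean mask
        return df[mask].groupby(keys).agg({'ip': 'unique'}).rename(columns={'ip': name})

    has_fp = unique_ips(df[fp].eq('是'), '已提供误报证明ip')
    has_nofix = unique_ips(df[nofix].eq('是'), '已提供无法整改证明ip')
    neither = unique_ips(df[fp].eq('否') & df[nofix].eq('否'), '没有误报和无法整改证明ip')

    # Outer joins keep every key pair that appears in at least one filter
    res = has_fp.join(has_nofix, how='outer').join(neither, how='outer').fillna('')
    # Arrays of IPs become comma-separated strings; '' stays ''
    res = res.apply(lambda col: col.map(', '.join))
    # Empty cells are filled with '无' ("none"), as in the article's code
    return res.where(res.ne(''), '无')

# Hypothetical sample data, invented for illustration
df = pd.DataFrame({
    '系统名称': ['SysA', 'SysA', 'SysA', 'SysB', 'SysB'],
    '漏洞名称': ['VulnX', 'VulnX', 'VulnX', 'VulnY', 'VulnY'],
    'ip': ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4', '10.0.0.5'],
    '是否提供误报证明': ['是', '否', '否', '否', '否'],
    '是否提供无法整改证明': ['否', '是', '否', '否', '否'],
})

out = classify_ips(df)
print(out)
```

In this sample, SysA/VulnX gets one IP in each of the three columns, while SysB/VulnY has '无' in both proof columns and its two IPs listed under 没有误报和无法整改证明ip.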

In summary, the article presents a practical Python automation problem, provides a detailed pandas solution, and shares the complete code to help readers replicate the result.

