Extract Highest-Priority Labels per User Using pandas GroupBy & explode
This tutorial demonstrates two pandas approaches to retrieve each user's highest‑priority label based on ordered tags, covering data preparation, grouping, custom sorting with key functions, deduplication, and the explode method, complete with code snippets and performance considerations.
Pandas Rescue Plan (4) – Conditional value lookup within DataFrame groups
Hi, I'm the author. Many people shy away from pandas; this series aims to help them fall in love with it.
Title format: series name (installment number) – the specific requirement solved in this article
Platform:
Windows 10
Python 3.8
pandas >=1.2.4
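The sort step in both methods below passes key= to sort_values, and pandas only accepts that argument from version 1.1.0 onward. On an older release, one workaround is to sort on a temporary rank column. This is my own assumption, not the author's code. Below is a minimal sketch; the helper name sort_by_priority and the toy rows are hypothetical.

```python
import pandas as pd

# Assumed priority ranks for the judgment tags (甲 is the highest priority)
PRIORITY = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}


def sort_by_priority(df, col='判断标签'):
    """Hypothetical helper: sort rows by tag priority without sort_values(key=...)."""
    return (df.assign(_rank=df[col].map(PRIORITY))    # temporary numeric rank column
              .sort_values('_rank', kind='mergesort')  # stable: ties keep their input order
              .drop(columns='_rank'))                  # remove the helper column again


# Hypothetical toy rows
df = pd.DataFrame({'用户': ['u1', 'u1', 'u2'],
                   '判断标签': ['丙', '甲', '乙'],
                   '其他标签': ['x', 'y', 'z']})
print(sort_by_priority(df))
```

This version works even where sort_values has no key parameter. Choosing mergesort is optional, but it makes the order of equal-priority rows predictable.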
/ Data requirement
Each user has one or more judgment tags (判断标签), ranked by priority as 甲 > 乙 > 丙 > 丁 (equivalent to A, B, C, D…). For each user, keep only the rows that carry that user's highest-priority tag, together with the other tag column (其他标签), as shown in the figure.
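To make the requirement concrete, here is a minimal sketch on a small hypothetical DataFrame with the same three columns (用户 = user, 判断标签 = judgment tag, 其他标签 = other tag). The rows are invented and are not taken from the figure. The loop simply spells out the intended result in plain Python: find each user's best tag, then keep every row that carries it.

```python
import pandas as pd

# Hypothetical toy data (invented for illustration)
df = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],
    '判断标签': ['乙', '甲', '甲', '丁', '丙', '乙'],
    '其他标签': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# Assumed priority order: 甲 > 乙 > 丙 > 丁 (like A > B > C > D)
PRIORITY = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}

# Step 1: find each user's best (lowest-rank) tag
best = {}
for user, tag in zip(df['用户'], df['判断标签']):
    if user not in best or PRIORITY[tag] < PRIORITY[best[user]]:
        best[user] = tag

# Step 2: keep every row whose tag equals that user's best tag
expected = df[[best[u] == t for u, t in zip(df['用户'], df['判断标签'])]]
print(expected)
```

Note that user u1 keeps two rows, because both carry the top tag 甲.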
/ Requirement breakdown
Since we need the highest-priority tag for each user, we can group by user and look up the top tag within each group. Two implementations follow.
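The idea of looking up the top tag inside each user group can also be expressed without sorting. You rank the tags and then compare each row against its group's minimum rank. This groupby().transform('min') variant is an alternative I am swapping in for illustration; it is not one of the article's two methods. The helper name top_priority_rows and the sample rows are hypothetical.

```python
import pandas as pd

PRIORITY = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}  # assumed tag order


def top_priority_rows(df):
    """Keep each user's rows that carry the user's highest-priority tag."""
    rank = df['判断标签'].map(PRIORITY)                 # tag -> numeric rank (unknown tags -> NaN)
    best = rank.groupby(df['用户']).transform('min')   # each user's best rank, aligned to the rows
    return df[rank == best]                            # NaN ranks never compare equal, so they drop out


# Hypothetical sample rows
df = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],
    '判断标签': ['乙', '甲', '甲', '丁', '丙', '乙'],
    '其他标签': ['a', 'b', 'c', 'd', 'e', 'f'],
})
print(top_priority_rows(df))
```

Because transform returns a result aligned to the original rows, the final filter is a single boolean mask, with no per-group Python call.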
/ Requirement handling
Method 1:
The example has only one extra tag column (其他标签), so we first wrap each of its values in a single-element list. Summing lists within a group then concatenates them.
df['其他标签'] = df['其他标签'].map(lambda x: [x])
Then group by user and tag, aggregating the list column with sum.
df = df.groupby(['用户','判断标签'], as_index=False)['其他标签'].sum()
Next, sort by the tag column, passing a key function that maps each tag to its priority rank. The key argument of sort_values was added in pandas 1.1.0, so this step does not run on older versions.
df.sort_values('判断标签', key=lambda x: x.map({'甲':1,'乙':2,'丙':3,'丁':4}), inplace=True)
Drop duplicates on the user column, keeping the first (highest-priority) row for each user.
df.drop_duplicates('用户', inplace=True)
Finally, explode the list column to obtain the target format.
df.explode('其他标签')
Method 2:
Sort the rows by tag priority first. Then group by user and apply a function that keeps the rows whose tag equals the group's first (that is, highest-priority) tag.
def get_first_label(data):
    """Return the rows that carry the group's highest-priority tag."""
    # The data is already sorted, so the group's first row holds its top tag
    return data[data['判断标签'] == data['判断标签'].iloc[0]]

# Sort by tag priority, then group by user and filter each group
df.sort_values('判断标签', key=lambda x: x.map({'甲':1,'乙':2,'丙':3,'丁':4}), inplace=True)
df.groupby(['用户']).apply(get_first_label).reset_index(drop=True)
Result:
/ Summary
Conditional lookup within groups is a common data-processing need. Method 1 is generally faster because it relies on built-in pandas operations. Method 2 is shorter, but groupby().apply calls a Python function once per user, so it may be slower on large datasets. Choose the approach based on data size and personal preference.
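To check the performance claim on your own data, here is a sketch that wraps both methods as functions. It verifies that they return the same rows and times them with timeit on hypothetical synthetic data. The sizes and the helpers make_data and as_rows are assumptions for illustration. No timings are claimed here, since results depend on the data and the pandas version. There is one deviation from the article's code: Method 2 groups by the underlying array rather than the column name, so 用户 stays inside each group.

```python
import random
import timeit

import pandas as pd

# Assumed tag order, as in the article: 甲 > 乙 > 丙 > 丁
PRIORITY = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}


def priority_key(s):
    """Key for sort_values: map each tag to its numeric rank (needs pandas >= 1.1)."""
    return s.map(PRIORITY)


def method1(df):
    """Method 1: wrap in lists, group + sum, sort, drop duplicates, explode."""
    out = df.copy()  # leave the caller's frame untouched
    out['其他标签'] = out['其他标签'].map(lambda x: [x])
    out = out.groupby(['用户', '判断标签'], as_index=False)['其他标签'].sum()
    out = out.sort_values('判断标签', key=priority_key)
    out = out.drop_duplicates('用户')
    return out.explode('其他标签').reset_index(drop=True)


def get_first_label(data):
    """Return the rows that carry the group's first (highest-priority) tag."""
    return data[data['判断标签'] == data['判断标签'].iloc[0]]


def method2(df):
    """Method 2: sort by priority, then groupby().apply(get_first_label)."""
    out = df.sort_values('判断标签', key=priority_key)
    # Grouping by the raw array instead of the column name keeps 用户 inside
    # each group; recent pandas deprecates passing grouping columns to apply.
    return (out.groupby(out['用户'].to_numpy(), group_keys=False)
               .apply(get_first_label)
               .reset_index(drop=True))


def make_data(n_rows, n_users, seed=0):
    """Hypothetical synthetic data: random users and tags, unique other tags."""
    rng = random.Random(seed)
    tags = list(PRIORITY)
    return pd.DataFrame({
        '用户': [f'u{rng.randrange(n_users)}' for _ in range(n_rows)],
        '判断标签': [rng.choice(tags) for _ in range(n_rows)],
        '其他标签': [f't{i}' for i in range(n_rows)],
    })


def as_rows(df):
    """Order-insensitive view of a result, for comparing the two methods."""
    return sorted(df[['用户', '判断标签', '其他标签']].itertuples(index=False, name=None))


big = make_data(n_rows=5_000, n_users=500)
print('same rows:', as_rows(method1(big)) == as_rows(method2(big)))
for fn in (method1, method2):
    secs = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f'{fn.__name__}: {secs:.4f} s per run')
```

Raise n_rows and n_users toward your real data size before drawing conclusions. The per-group apply cost in Method 2 grows with the number of users.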
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Python Crawling & Data Mining
