
Extract Highest-Priority Labels per User Using pandas GroupBy & explode

This tutorial demonstrates two pandas approaches to retrieve each user's highest‑priority label based on ordered tags, covering data preparation, grouping, custom sorting with key functions, deduplication, and the explode method, complete with code snippets and performance considerations.

Python Crawling & Data Mining

Feb 8, 2022


Rescue pandas plan (4) – DataFrame groupby conditional value lookup

/ Data requirement

/ Requirement breakdown

/ Requirement handling

/ Summary

Hi, I’m the author. Many people avoid pandas, so this series aims to help them fall in love with pandas.

Titles in this series follow the pattern: series name (series number) – the specific requirement solved in the article.

Platform:

Windows 10

Python 3.8

pandas >=1.2.4

/ Data requirement

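Each user (用户) has rows with a judgment tag (判断标签) ranked in the order 甲, 乙, 丙, 丁 (corresponding to A, B, C, D…). For each user, keep only the rows carrying that user's highest‑priority judgment tag, together with the values in the other tag column (其他标签), as shown in the figure.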
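To follow along, a small DataFrame with the same three columns can be built as below. The rows are hypothetical sample data invented for illustration, not the data from the original figure.

```python
import pandas as pd

# Hypothetical sample data: the values are invented for illustration only.
df = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],      # user
    '判断标签': ['乙', '甲', '甲', '丙', '乙', '丁'],    # judgment tag, priority 甲 > 乙 > 丙 > 丁
    '其他标签': ['x1', 'x2', 'x3', 'x4', 'x5', 'x6'],   # other tag
})

# What the requirement asks for on this sample:
# u1 keeps both 甲 rows (x2, x3), u2 keeps the 乙 row (x5), u3 keeps the 丁 row (x6).
print(df)
```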

/ Requirement breakdown

Since we need to extract the highest‑priority tag per user, we can group by user and search within each group; two implementation methods are provided.

/ Requirement handling

Method 1:

Because the example has only one additional tag column (其他标签, "other tags"), we wrap each of its values in a one‑element list so that the values can be concatenated later.

df['其他标签'] = df['其他标签'].map(lambda x: [x])

Then group by user and tag, aggregating the list column with sum.

df = df.groupby(['用户','判断标签'], as_index=False)['其他标签'].sum()
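The sum works here because adding Python lists concatenates them, so every (user, tag) pair ends up with one combined list. A minimal sketch of these two steps on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2'],
    '判断标签': ['甲', '甲', '乙', '丙'],
    '其他标签': ['x1', 'x2', 'x3', 'x4'],
})

# Wrap every value in a one-element list so that values can be concatenated.
df['其他标签'] = df['其他标签'].map(lambda x: [x])

# Summing lists concatenates them: ['x1'] + ['x2'] gives ['x1', 'x2'].
grouped = df.groupby(['用户', '判断标签'], as_index=False)['其他标签'].sum()
print(grouped)
```

If only the per‑group lists are needed, calling agg(list) on the unwrapped column should give the same lists in one step; that is an alternative to the approach above, not the method used in this article.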

Sort by the tag column using a custom key mapping, since the tags do not sort into priority order on their own. The key argument of sort_values was added in pandas 1.1.0, so older versions cannot run this line.

df.sort_values('判断标签', key=lambda x: x.map({'甲':1,'乙':2,'丙':3,'丁':4}), inplace=True)
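The key function receives the whole column as a Series and returns a Series of the same length to sort by. A minimal sketch on a hypothetical Series, using the same priority mapping:

```python
import pandas as pd

# Priority mapping from the article: a smaller number means a higher priority.
priority = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}

# Hypothetical tags in arbitrary order.
tags = pd.Series(['丙', '甲', '丁', '乙'])

# A plain sort compares the strings themselves, which does not follow the priority order.
plain = tags.sort_values().tolist()

# With key, pandas sorts by the mapped numbers instead of the strings.
by_priority = tags.sort_values(key=lambda s: s.map(priority)).tolist()

print(plain)
print(by_priority)
```

A tag that is missing from the mapping becomes NaN and, with the default na_position, is sorted to the end.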

Drop duplicate users. Because the rows are now in priority order, the first row kept for each user is the one with that user's highest‑priority tag.

df.drop_duplicates('用户', inplace=True)

Finally, explode the list column to obtain the target format.

df.explode('其他标签')
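Put together, the steps of Method 1 can be sketched as one function. The function name and the sample rows are hypothetical, and the sketch returns a new DataFrame instead of modifying df in place.

```python
import pandas as pd

PRIORITY = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}  # mapping from the article

def top_priority_rows(df):
    """Method 1 in one function (the function name is hypothetical)."""
    out = df.copy()
    # 1. Wrap the other-tag values in lists.
    out['其他标签'] = out['其他标签'].map(lambda x: [x])
    # 2. One row per (user, tag) pair, with the lists concatenated.
    out = out.groupby(['用户', '判断标签'], as_index=False)['其他标签'].sum()
    # 3. Order rows by tag priority.
    out = out.sort_values('判断标签', key=lambda s: s.map(PRIORITY))
    # 4. Keep the first (highest-priority) row per user.
    out = out.drop_duplicates('用户')
    # 5. Turn each list back into one row per value.
    return out.explode('其他标签').reset_index(drop=True)

# Hypothetical sample data, invented for illustration.
sample = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],
    '判断标签': ['乙', '甲', '甲', '丙', '乙', '丁'],
    '其他标签': ['x1', 'x2', 'x3', 'x4', 'x5', 'x6'],
})

result = top_priority_rows(sample)
print(result)
```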

Method 2:

Sort the rows by tag priority, then group by user and apply a function that returns every row whose tag equals the tag of the group's first row.

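def get_first_label(data):
    """Return the rows whose tag equals the group's first (highest‑priority) tag."""
    top_tag = data['判断标签'].iloc[0]  # rows are pre-sorted, so the first row holds the top tag
    return data[data['判断标签'] == top_tag]

# Sort by tag priority first so that each group's first row carries its top tag, then group by user
df.sort_values('判断标签', key=lambda x: x.map({'甲':1,'乙':2,'丙':3,'丁':4}), inplace=True)
df.groupby(['用户']).apply(get_first_label).reset_index(drop=True)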

Result:
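To make the outcome concrete, here is a sketch of Method 2 on hypothetical sample data. It differs from the call above in two assumed details, group_keys=False and an explicit column selection after groupby. Both are there because pandas 2.2 and later warn when apply operates on the grouping columns, and selecting the columns explicitly keeps 用户 in the result.

```python
import pandas as pd

PRIORITY = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}  # mapping from the article

def get_first_label(data):
    """Return the rows whose tag equals the tag of the group's first row."""
    top_tag = data['判断标签'].iloc[0]
    return data[data['判断标签'] == top_tag]

# Hypothetical sample data, invented for illustration.
sample = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],
    '判断标签': ['乙', '甲', '甲', '丙', '乙', '丁'],
    '其他标签': ['x1', 'x2', 'x3', 'x4', 'x5', 'x6'],
})

cols = ['用户', '判断标签', '其他标签']

# Sort by priority so that the first row of every group holds its top tag.
ordered = sample.sort_values('判断标签', key=lambda s: s.map(PRIORITY))

# Assumption: explicit column selection and group_keys=False (see the note above).
result = (
    ordered.groupby('用户', group_keys=False)[cols]
    .apply(get_first_label)
    .reset_index(drop=True)
)
print(result)
```

On this sample the result holds four rows: both 甲 rows of u1, the 乙 row of u2, and the 丁 row of u3.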

/ Summary

Group‑based lookup is a common data‑processing need. Method 1 relies on built‑in pandas operations and is generally faster, while Method 2 is shorter but calls a Python function once per group, so it may be slower on large datasets. Choose the approach based on data size and personal preference.
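For comparison, the same rows can also be selected without sorting or apply, by comparing each row's priority with the best priority in its group. This variant is not from the original article; it is a sketch that assumes every tag appears in the priority mapping, and the sample rows are hypothetical.

```python
import pandas as pd

PRIORITY = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}  # mapping from the article

# Hypothetical sample data, invented for illustration.
sample = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],
    '判断标签': ['乙', '甲', '甲', '丙', '乙', '丁'],
    '其他标签': ['x1', 'x2', 'x3', 'x4', 'x5', 'x6'],
})

# Numeric priority of every row (smaller is better).
rank = sample['判断标签'].map(PRIORITY)

# Best (smallest) priority within each user's group, aligned to the original rows.
best = rank.groupby(sample['用户']).transform('min')

# Keep the rows whose priority equals the best one of their user.
result = sample[rank == best].reset_index(drop=True)
print(result)
```

Because the mask keeps the original row order, no list wrapping or explode step is needed in this variant.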
