
Extract Highest‑Priority Tags per User with Pandas: Two Easy GroupBy Techniques

This article demonstrates how to use pandas groupby, list aggregation, sorting, deduplication, and explode (or a custom apply function) to retrieve each user's top‑priority tag while preserving other tag columns, comparing two practical implementations and their performance trade‑offs.

Python Crawling & Data Mining

Apr 13, 2022

Pandas Rescue Plan (4): Conditional Lookup Within DataFrame Groups

Data Requirement

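Each user has one or more judgment tags (column 判断标签) with a fixed priority order 甲, 乙, 丙, 丁 (that is, A, B, C, D, ...), where 甲 ranks highest. For each user we need to keep only the data for the highest-priority judgment tag that user has, together with the matching values in the other tags column (其他标签).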
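To make the requirement concrete, here is a small input table to work with. The user IDs and tag values are hypothetical and were made up for illustration; only the column names (用户 for user, 判断标签 for judgment tag, 其他标签 for other tags) come from the article's code.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],
    '判断标签': ['乙', '甲', '甲', '丙', '乙', '丁'],
    '其他标签': ['x1', 'x2', 'x3', 'y1', 'y2', 'z1'],
})

# Priority order used throughout: 甲 (highest), 乙, 丙, 丁 (lowest).
priority = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}

# Desired outcome, worked out by hand:
#   u1 keeps 甲 with x2 and x3, u2 keeps 乙 with y2, u3 keeps 丁 with z1.
print(df)
```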

Requirement Decomposition

Since we need to extract the highest‑priority tag per user, we can group by user and perform the lookup inside each group. Two implementation methods are provided.

Requirement Processing

Method 1

Wrap each value of the 其他标签 (other tags) column in a single-element list, so that summing within each user and judgment-tag group concatenates the lists.

df['其他标签'] = df['其他标签'].map(lambda x: [x])

Group and sum the lists:

df = df.groupby(['用户', '判断标签'], as_index=False)['其他标签'].sum()
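The two steps above can be tried on a small table. This is a minimal sketch on hypothetical sample data (the user IDs and tag values are invented); it shows that adding lists concatenates them, so the grouped sum collects all other tags of a user and judgment-tag pair into one list.

```python
import pandas as pd

# Hypothetical sample data, using the article's column names.
df = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],
    '判断标签': ['乙', '甲', '甲', '丙', '乙', '丁'],
    '其他标签': ['x1', 'x2', 'x3', 'y1', 'y2', 'z1'],
})

# Wrap each value in a one-element list so that "+" means concatenation.
df['其他标签'] = df['其他标签'].map(lambda x: [x])

# Summing lists within each (user, judgment tag) group joins them together.
grouped = df.groupby(['用户', '判断标签'], as_index=False)['其他标签'].sum()

# One row per (user, judgment tag) pair; the u1/甲 row holds both of its tags.
u1_top = grouped[(grouped['用户'] == 'u1') & (grouped['判断标签'] == '甲')]
print(u1_top['其他标签'].iloc[0])
```

Calling `.agg(list)` on the unwrapped column should produce the same lists without the wrapping step. That is an alternative to the article's approach, not the method it describes.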

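Sort by judgment tag in priority order. The key parameter of sort_values (added in pandas 1.1.0) receives the column as a Series and returns the values to sort by, so a dictionary that maps each tag to its rank produces the custom order.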

df.sort_values('判断标签', key=lambda x: x.map({'甲':1, '乙':2, '丙':3, '丁':4}), inplace=True)
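The effect of the key function is easiest to see in isolation. The sketch below uses hypothetical rows and a `priority` dictionary with the same mapping as the article's code; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the article.
df = pd.DataFrame({
    '用户': ['u1', 'u1', 'u2', 'u2', 'u3'],
    '判断标签': ['乙', '甲', '丙', '乙', '丁'],
})

priority = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}

# key receives the whole column as a Series and must return a Series of
# the same length; rows are then ordered by the returned values.
ordered = df.sort_values('判断标签', key=lambda col: col.map(priority))

print(ordered['判断标签'].tolist())
```

A tag that is missing from the dictionary maps to NaN and, with the default na_position='last', sorts after all known tags, so it is worth checking that the mapping covers every tag in the data.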

Remove duplicate rows, keeping the first occurrence (the highest‑priority tag).

df.drop_duplicates('用户', inplace=True)

Finally, explode the list column to obtain one row per tag.

# explode returns a new DataFrame, so assign the result
result = df.explode('其他标签')
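Put together, the Method 1 steps might look like the sketch below. The function name `top_tag_rows`, the `PRIORITY` constant, and the sample data are assumptions made for illustration. Unlike the step-by-step snippets, this version works on a copy and avoids inplace=True, so the input frame is left untouched.

```python
import pandas as pd

PRIORITY = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}


def top_tag_rows(df):
    """Method 1 as one function (hypothetical helper); does not modify df."""
    out = df.copy()
    # Wrap values in lists, then concatenate them per (user, judgment tag).
    out['其他标签'] = out['其他标签'].map(lambda x: [x])
    out = out.groupby(['用户', '判断标签'], as_index=False)['其他标签'].sum()
    # Order by priority and keep the first (best) row for each user.
    out = out.sort_values('判断标签', key=lambda col: col.map(PRIORITY))
    out = out.drop_duplicates('用户')
    # Expand the lists back to one row per other tag.
    out = out.explode('其他标签')
    return out.sort_values(['用户', '其他标签']).reset_index(drop=True)


# Hypothetical sample data.
sample = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],
    '判断标签': ['乙', '甲', '甲', '丙', '乙', '丁'],
    '其他标签': ['x1', 'x2', 'x3', 'y1', 'y2', 'z1'],
})

result = top_tag_rows(sample)
print(result)
```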

Method 2

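Starting from the original DataFrame (before the Method 1 steps), define a helper function that returns the rows whose judgment tag equals the tag of the group's first row. After sorting by priority, that first row holds the highest-priority tag, so applying the helper per user keeps exactly the top-priority rows.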

def get_first_label(data):
    """Return the rows whose judgment tag equals that of the group's first row."""
    # After sorting by priority, the first row carries the top-priority tag.
    top_tag = data['判断标签'].iloc[0]
    return data[data['判断标签'] == top_tag]

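# Sort by tag priority first (甲 highest, 丁 lowest)
df.sort_values('判断标签', key=lambda x: x.map({'甲':1, '乙':2, '丙':3, '丁':4}), inplace=True)
# Apply the helper once per user; groupby keeps each group's rows in sorted order.
# Note: pandas 2.2+ warns that grouping columns will be excluded from apply in a future version.
result = df.groupby(['用户']).apply(get_first_label).reset_index(drop=True)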
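The helper's logic can be checked on a small table. In this sketch the sample data is hypothetical, and the per-user step iterates over the groups and concatenates the pieces instead of calling groupby.apply. That swap is an assumption made here so the example does not depend on how a given pandas version treats grouping columns inside apply; the selection logic is the same.

```python
import pandas as pd

# Hypothetical sample data, using the article's column names.
df = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],
    '判断标签': ['乙', '甲', '甲', '丙', '乙', '丁'],
    '其他标签': ['x1', 'x2', 'x3', 'y1', 'y2', 'z1'],
})
priority = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}


def get_first_label(data):
    """Return the rows whose judgment tag equals that of the first row."""
    top_tag = data['判断标签'].iloc[0]
    return data[data['判断标签'] == top_tag]


# Sort by priority so each user's first row carries the best tag.
ordered = df.sort_values('判断标签', key=lambda col: col.map(priority))

# Run the helper once per user and stitch the pieces back together.
pieces = [get_first_label(group) for _, group in ordered.groupby('用户')]
result = pd.concat(pieces)
result = result.sort_values(['用户', '其他标签']).reset_index(drop=True)
print(result)
```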

Summary

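Grouped lookup is a common data-processing need. Method 1 stays within pandas' built-in groupby, sort, and drop_duplicates operations, but summing Python lists works on object-dtype data and is not vectorized. Method 2 is more concise, but groupby.apply invokes a Python function once per group, which can be slow when there are many users. Neither is guaranteed to be faster, so time both on your own data if performance matters. drop_duplicates, which keeps the first occurrence by default, is a handy way to pick the top row after sorting.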
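If the per-group Python calls become a bottleneck, one possible alternative, which is not part of the original article, is to map each tag to its rank, broadcast every user's best rank back to the rows with transform('min'), and filter with a boolean mask. The sample data and variable names below are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, using the article's column names.
df = pd.DataFrame({
    '用户': ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],
    '判断标签': ['乙', '甲', '甲', '丙', '乙', '丁'],
    '其他标签': ['x1', 'x2', 'x3', 'y1', 'y2', 'z1'],
})
priority = {'甲': 1, '乙': 2, '丙': 3, '丁': 4}

# Rank of each row's judgment tag (lower is better).
rank = df['判断标签'].map(priority)

# Best (minimum) rank per user, repeated on every row of that user.
best = rank.groupby(df['用户']).transform('min')

# Keep the rows whose tag is that user's best tag; row order is preserved.
result = df[rank == best].reset_index(drop=True)
print(result)
```

This version needs no sorting, list wrapping, or explode. Whether it is actually faster on a given dataset is something to measure, for example with the standard library's timeit module.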

The sun will rise tomorrow, and we will continue to shine.

Written on 2022‑01‑14
