How to Group and Aggregate Data in Pandas: 5 Practical Methods
This article walks through a Python data‑processing problem and presents five solutions, two based on pandas and three in plain Python, each with a complete code snippet, to help readers group and aggregate tabular data efficiently.
1. Introduction
A user in a Python community asked how to process a dataset consisting of a list of numeric categories and a list of corresponding IDs. The goal is to collect the IDs of each category into a list, producing a dictionary keyed by category.
2. Implementation
Method 1
Using pandas to read an Excel file, group by the numeric column and convert the groups to a dictionary.
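The file 1.xlsx is not reproduced here, so the following minimal sketch assumes a three-row in-memory DataFrame (a shortened sample of the IDs, with the same column names) in place of the Excel read. It is meant only to show why indexing the result of to_dict() by column name gives the final mapping: DataFrame.to_dict() returns a nested dictionary keyed first by column, then by index label.

```python
import pandas as pd

# Assumption: a small in-memory stand-in for pd.read_excel('1.xlsx', ...)
df = pd.DataFrame({
    "num": [1.0, 1.0, 2.0],
    "date": ["201825301001", "201825301002", "201825301008"],
})

# One row per distinct num; the "date" cell holds the list of grouped values
grouped = df.groupby("num").agg(list)

# Nested dict: {column name: {index label: cell value}}
nested = grouped.to_dict()

# Picking the column leaves the {num: [ids]} mapping
res = nested["date"]
print(res)
```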
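import pandas as pd
# read the sheet and label its two columns
df = pd.read_excel('1.xlsx', names=['num', 'date'])
# collect the values of each group into a list
df = df.groupby("num").agg(list)
# to_dict() is keyed by column name first, so pick the 'date' column
res = df.to_dict()["date"]
print(res)
Method 2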
Iterating over the two lists and building a dictionary manually.
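The same manual idea can be expressed with collections.defaultdict, which removes the explicit membership check. This is a minimal sketch of an alternative, not the original author's code, and it uses a shortened three-item sample of the lists (an assumption made for brevity).

```python
from collections import defaultdict

# Assumption: shortened sample of the article's num and data lists
num = [1.0, 1.0, 2.0]
data = ["201825301001", "201825301002", "201825301008"]

# Missing keys start out as an empty list automatically
grouped = defaultdict(list)
for k, v in zip(num, data):
    grouped[k].append(v)

# Convert back to a plain dict for printing and comparison
result = dict(grouped)
print(result)
```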
num=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
data=['201825301001', '201825301002', '201825301004', '201825301005', '201825301006', '201825301007', '201825301008', '201825301009', '201825301010', '201825301011', '201825301012', '201825301013', '201825301014', '201825301015', '201825301016', '201825301017', '201825301018', '201825301019', '201825305001', '201825305002']
result = {}
for k, v in zip(num, data):
    if k in result:
        # key already seen: append to its list
        result[k].append(v)
    else:
        # first occurrence: start a new list
        result[k] = [v]
print(result)
Method 3
Using itertools.groupby to group the pairs.
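One caveat is worth illustrating: itertools.groupby only merges consecutive items with the same key, so it relies on the input already being sorted by that key, as the num list in this article is. The sketch below uses a deliberately reshuffled three-item sample (an assumption, not the original data order) to show what happens without sorting and how sorting first restores the expected result.

```python
from itertools import groupby

# Assumption: a deliberately unsorted sample built from IDs in the article
num = [1.0, 2.0, 1.0]
data = ["201825301001", "201825301008", "201825301002"]
pairs = list(zip(num, data))

def group_pairs(items):
    # Same comprehension shape as the article's Method 3
    return {k: [d for _, d in g] for k, g in groupby(items, key=lambda p: int(p[0]))}

# Unsorted input: key 1 appears in two separate runs, and the later run
# overwrites the earlier one in the dict
unsorted_result = group_pairs(pairs)

# Sorting by key first puts equal keys next to each other
sorted_result = group_pairs(sorted(pairs, key=lambda p: p[0]))

print(unsorted_result)
print(sorted_result)
```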
from itertools import groupby
num=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
data=['201825301001', '201825301002', '201825301004', '201825301005', '201825301006', '201825301007', '201825301008', '201825301009', '201825301010', '201825301011', '201825301012', '201825301013', '201825301014', '201825301015', '201825301016', '201825301017', '201825301018', '201825301019', '201825305001', '201825305002']
result = {k: [i[1] for i in v] for k, v in groupby(zip(num, data), key=lambda x: int(x[0]))}
print(result)
Method 4
Building the dictionary with dict.get and a default empty list, then converting the keys to integers with a comprehension.
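If a single expression is the goal, a nested comprehension can do the whole grouping in one step, at the cost of rescanning both lists once per distinct key. This is a minimal sketch of that alternative, not the original author's code, on a shortened three-item sample (an assumption made for brevity).

```python
# Assumption: shortened sample of the article's num and data lists
num = [1.0, 1.0, 2.0]
data = ["201825301001", "201825301002", "201825301008"]

# dict.fromkeys(num) yields each distinct key once, in first-seen order;
# the inner list collects every value paired with that key
result = {
    int(k): [v for kk, v in zip(num, data) if kk == k]
    for k in dict.fromkeys(num)
}
print(result)
```

For large inputs the loop-based versions are likely the better choice, since they pass over the data only once.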
# same num and data as before
result = {}
for k, v in zip(num, data):
    # dict.get supplies an empty list the first time a key is seen
    result[k] = result.get(k, []) + [v]
# convert the float keys (1.0, 2.0, ...) to integers
result = {int(k): v for k, v in result.items()}
print(result)
Method 5
Creating a pandas DataFrame and using groupby with list aggregation.
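A possible variant, offered as a sketch rather than the original author's code: selecting the data column before aggregating yields a Series, whose to_dict() returns the mapping directly, without the nested lookup by column name. The shortened num and data lists are an assumption made to keep the example small.

```python
import pandas as pd

# Assumption: shortened sample of the article's num and data lists
num = [1.0, 1.0, 2.0]
data = ["201825301001", "201825301002", "201825301008"]

df = pd.DataFrame({"num": num, "data": data})

# Select the column first, then aggregate each group into a list
series = df.groupby("num")["data"].agg(list)

# A Series converts straight to {index label: value}
res = series.to_dict()
print(res)
```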
import pandas as pd
# same num and data as before
df = pd.DataFrame({'num': num, 'data': data})
df = df.groupby("num").agg(list)
res = df.to_dict()["data"]
print(res)
3. Conclusion
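The article demonstrates five ways to group values by key into a dictionary: two with pandas (reading from Excel, or building a DataFrame from lists) and three in plain Python (a manual loop, itertools.groupby, and dict.get with a default). Readers can adapt whichever approach fits a similar data‑processing problem.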
Python Crawling & Data Mining
