
Using Pandas groupby for Data Aggregation and Grouping

This tutorial shows how to use the Pandas groupby method to group data by one or more columns and apply aggregation functions such as sum, mean, max, and min, with code examples and a practical exercise.

Test Development Learning Exchange

Objective: Learn to use Pandas for data aggregation and grouping.

Learning content: the groupby method and aggregation functions such as sum, mean, max, and min.
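
As a quick orientation before the full examples, groupby can be read as three steps: split the rows by a key, apply a function to each group, and combine the results. The sketch below uses a hypothetical toy DataFrame (the column names team and score are assumptions for illustration, not part of the tutorial's dataset) and checks one group's total by hand.

```python
import pandas as pd

# Hypothetical toy data, not the tutorial's dataset
df = pd.DataFrame({
    "team": ["a", "b", "a", "b"],
    "score": [1, 2, 3, 4],
})

# Split rows by 'team', apply sum to 'score', combine into one Series
totals = df.groupby("team")["score"].sum()

# The same total for team 'a', computed by filtering manually
manual_a = df.loc[df["team"] == "a", "score"].sum()

print(totals)
print(manual_a)
```

The grouped result is a Series indexed by the group key, so totals["a"] and the manual filter should agree.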

Code examples:

import pandas as pd
# Create an example dataset
data = {
    'name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhang San', 'Zhao Liu', 'Li Si'],
    'department': ['Sales', 'Marketing', 'Engineering', 'Sales', 'Finance', 'Marketing'],
    'sales': [120, 150, 130, 160, 140, 170],
    'cost': [80, 90, 100, 110, 120, 130]
}
df = pd.DataFrame(data)
print(f"Example dataset:\n{df}")
# Group by 'department'. groupby returns a lazy DataFrameGroupBy object,
# so printing it shows only the object's repr, not the grouped rows.
grouped_by_department = df.groupby('department')
print(f"Object returned by grouping on 'department':\n{grouped_by_department}")
# Iterate over the groups to see the rows in each one
for department, group in grouped_by_department:
    print(f"Department: {department}")
    print(f"Rows:\n{group}\n")
# Aggregate a single column per group
mean_sales_by_department = grouped_by_department['sales'].mean()
print(f"Mean sales per department:\n{mean_sales_by_department}")
sum_sales_by_department = grouped_by_department['sales'].sum()
print(f"Total sales per department:\n{sum_sales_by_department}")
max_sales_by_department = grouped_by_department['sales'].max()
print(f"Maximum sales per department:\n{max_sales_by_department}")
min_sales_by_department = grouped_by_department['sales'].min()
print(f"Minimum sales per department:\n{min_sales_by_department}")
# Aggregate several columns at once (note the double brackets)
mean_by_department = grouped_by_department[['sales', 'cost']].mean()
print(f"Mean sales and cost per department:\n{mean_by_department}")
sum_by_department = grouped_by_department[['sales', 'cost']].sum()
print(f"Total sales and cost per department:\n{sum_by_department}")
# Group by two columns; the result is indexed by (department, name)
grouped_by_department_name = df.groupby(['department', 'name'])
mean_sales_by_department_name = grouped_by_department_name['sales'].mean()
print(f"Mean sales per department and name:\n{mean_sales_by_department_name}")
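
The examples above call one aggregation function at a time. Pandas also provides an agg method on grouped objects that can compute several statistics in one call. The sketch below is a minimal illustration on a small hypothetical DataFrame (its English column names and values are assumptions chosen for this example); the output column names total_sales and mean_cost are likewise made up here.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "department": ["Sales", "Marketing", "Sales", "Marketing"],
    "sales": [120, 150, 160, 170],
    "cost": [80, 90, 110, 130],
})

# Several statistics for one column in a single call
sales_stats = df.groupby("department")["sales"].agg(["sum", "mean", "max", "min"])

# Named aggregation: output_column=(input_column, function)
summary = df.groupby("department").agg(
    total_sales=("sales", "sum"),
    mean_cost=("cost", "mean"),
)

print(sales_stats)
print(summary)
```

Named aggregation may be the more readable choice when different columns need different functions, since each output column gets an explicit name.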

Practice: Group a dataset by a single column and compute the mean of each group.
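
One possible way to approach the exercise is sketched below. The orders DataFrame and its city and amount columns are hypothetical placeholders; any dataset with one categorical column and one numeric column would work the same way.

```python
import pandas as pd

# Hypothetical practice data; substitute your own dataset
orders = pd.DataFrame({
    "city": ["Beijing", "Shanghai", "Beijing", "Shanghai", "Shenzhen"],
    "amount": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Select the numeric column explicitly before taking the mean
mean_amount_by_city = orders.groupby("city")["amount"].mean()

print(mean_amount_by_city)
```

Selecting the numeric column before calling mean keeps the result focused and avoids the errors that recent Pandas versions can raise when a mean is requested over string columns.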

Summary: After this exercise you should be able to group data with the Pandas groupby method and apply aggregation functions such as sum, mean, max, and min, which prepares you for further Python data processing topics.
