Using Pandas groupby for Data Aggregation and Grouping
This tutorial teaches how to use the Pandas library's groupby method to group data by one or multiple columns and apply aggregation functions such as sum, mean, max, and min, with clear code examples and a practical exercise.
Objective: Learn to use Pandas for data aggregation and grouping.
Learning content: the groupby method and aggregation functions such as sum, mean, max, and min.
Code examples:
import pandas as pd
# create example dataset
# columns: '姓名' = name, '部门' = department, '销售额' = sales, '成本' = cost
data = {
    '姓名': ['张三', '李四', '王五', '张三', '赵六', '李四'],
    '部门': ['销售部', '市场部', '技术部', '销售部', '财务部', '市场部'],
    '销售额': [120, 150, 130, 160, 140, 170],
    '成本': [80, 90, 100, 110, 120, 130]
}
df = pd.DataFrame(data)
print(f"Example dataset:\n{df}")
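# group by '部门' (department); groupby is lazy and returns a DataFrameGroupBy object
grouped_by_department = df.groupby('部门')
# printing the object shows only its type and memory address, not the grouped rows
print(f"GroupBy object for '部门': {grouped_by_department}")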
# iterate over the groups: each item is a (group key, sub-DataFrame) pair
for department, group in grouped_by_department:
    print(f"Department: {department}")
    print(f"Data:\n{group}\n")
# aggregation examples on the '销售额' (sales) column
mean_sales_by_department = grouped_by_department['销售额'].mean()
print(f"Mean of '销售额' per '部门' group:\n{mean_sales_by_department}")
sum_sales_by_department = grouped_by_department['销售额'].sum()
print(f"Sum of '销售额' per '部门' group:\n{sum_sales_by_department}")
max_sales_by_department = grouped_by_department['销售额'].max()
print(f"Maximum of '销售额' per '部门' group:\n{max_sales_by_department}")
min_sales_by_department = grouped_by_department['销售额'].min()
print(f"Minimum of '销售额' per '部门' group:\n{min_sales_by_department}")
# multi-column aggregation: select a list of columns to get a DataFrame back
mean_by_department = grouped_by_department[['销售额','成本']].mean()
print(f"Mean of '销售额' and '成本' per '部门' group:\n{mean_by_department}")
sum_by_department = grouped_by_department[['销售额','成本']].sum()
print(f"Sum of '销售额' and '成本' per '部门' group:\n{sum_by_department}")
# multi-level grouping: pass a list of columns; the result is indexed by ('部门', '姓名')
grouped_by_department_name = df.groupby(['部门','姓名'])
mean_sales_by_department_name = grouped_by_department_name['销售额'].mean()
print(f"Mean of '销售额' per ('部门', '姓名') group:\n{mean_sales_by_department_name}")
Practice: Group a dataset by a single column and compute the mean of each group.
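One possible way to approach the practice task is sketched below. The dataset, the column names class_name and score, and the variable names are hypothetical and chosen only for illustration; they do not come from the tutorial's data.

```python
import pandas as pd

# Hypothetical practice data (not from the tutorial): exam scores per class.
scores = pd.DataFrame({
    "class_name": ["A", "B", "A", "C", "B", "A"],
    "score": [80, 90, 70, 85, 100, 90],
})

# Group by a single column, select the numeric column, then take the mean.
mean_score_by_class = scores.groupby("class_name")["score"].mean()
print(mean_score_by_class)
```

Selecting the column before calling mean keeps the result a Series indexed by the group key, which is usually the easiest shape to read or plot.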
Summary: After this exercise you should be able to perform data aggregation and grouping with Pandas using groupby and various aggregation functions, preparing you for further Python data processing topics.
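As a compact recap, the four aggregations covered here can also be computed in one call with the agg method, which the tutorial itself does not use. The sketch below rebuilds only the '部门' and '销售额' columns of the example dataset; the variable name sales_summary is an assumption made for illustration.

```python
import pandas as pd

# Two columns from the tutorial's example dataset: department and sales.
df = pd.DataFrame({
    '部门': ['销售部', '市场部', '技术部', '销售部', '财务部', '市场部'],
    '销售额': [120, 150, 130, 160, 140, 170],
})

# agg accepts a list of function names and returns one column per function.
sales_summary = df.groupby('部门')['销售额'].agg(['sum', 'mean', 'max', 'min'])
print(sales_summary)
```

This gives one row per department and one column per aggregation, so the separate sum, mean, max, and min results can be compared side by side.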
Test Development Learning Exchange