Using Pandas groupby for Data Aggregation and Grouping

This tutorial teaches how to use the Pandas library's groupby method to group data by one or multiple columns and apply aggregation functions such as sum, mean, max, and min, with clear code examples and a practical exercise.

Test Development Learning Exchange

Nov 18, 2024

Objective: Learn to use Pandas for data aggregation and grouping.

Learning content: the groupby method and aggregation functions such as sum, mean, max, and min.
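
As a minimal sketch of what groupby does, the snippet below computes per-group means twice: once by filtering rows by hand, and once with groupby. The dataset and the column names `key` and `value` are made up for illustration and do not come from the article.

```python
import pandas as pd

# Tiny made-up dataset with two groups, 'x' and 'y' (illustrative only).
df = pd.DataFrame({'key': ['x', 'y', 'x', 'y'], 'value': [1, 2, 3, 6]})

# Manual version: select the rows of each group, then aggregate them.
manual_means = {
    k: df.loc[df['key'] == k, 'value'].mean()
    for k in df['key'].unique()
}

# groupby version: split by 'key', apply mean, combine into one Series.
grouped_means = df.groupby('key')['value'].mean()

print(manual_means)
print(grouped_means)
```

Both approaches give the same numbers; groupby simply does the split, the aggregation, and the recombination in one call.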

Code examples:

import pandas as pd
# Create an example dataset.
# Columns: 姓名 = name, 部门 = department, 销售额 = sales, 成本 = cost
data = {
    '姓名': ['张三', '李四', '王五', '张三', '赵六', '李四'],
    '部门': ['销售部', '市场部', '技术部', '销售部', '财务部', '市场部'],
    '销售额': [120, 150, 130, 160, 140, 170],
    '成本': [80, 90, 100, 110, 120, 130]
}
df = pd.DataFrame(data)
print(f"Example dataset:\n{df}")
# Group by the '部门' (department) column.
grouped_by_department = df.groupby('部门')
# Printing a GroupBy object shows only its repr, not the grouped rows.
print(f"Result of grouping by '部门':\n{grouped_by_department}")
# Iterate over the groups; each item is a (key, sub-DataFrame) pair.
for department, group in grouped_by_department:
    print(f"Department: {department}")
    print(f"Data:\n{group}\n")
# Aggregate a single column ('销售额', sales) within each group.
mean_sales_by_department = grouped_by_department['销售额'].mean()
print(f"Mean sales per department:\n{mean_sales_by_department}")
sum_sales_by_department = grouped_by_department['销售额'].sum()
print(f"Total sales per department:\n{sum_sales_by_department}")
max_sales_by_department = grouped_by_department['销售额'].max()
print(f"Maximum sales per department:\n{max_sales_by_department}")
min_sales_by_department = grouped_by_department['销售额'].min()
print(f"Minimum sales per department:\n{min_sales_by_department}")
# Aggregate several columns at once ('销售额' and '成本', sales and cost).
mean_by_department = grouped_by_department[['销售额', '成本']].mean()
print(f"Mean sales and cost per department:\n{mean_by_department}")
sum_by_department = grouped_by_department[['销售额', '成本']].sum()
print(f"Total sales and cost per department:\n{sum_by_department}")
# Group by multiple columns: '部门' (department) and '姓名' (name).
grouped_by_department_name = df.groupby(['部门', '姓名'])
mean_sales_by_department_name = grouped_by_department_name['销售额'].mean()
print(f"Mean sales per department and name:\n{mean_sales_by_department_name}")
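
The four aggregations above are computed one call at a time. As a hedged alternative, the sketch below uses the GroupBy `agg` method, which the article itself does not cover, to compute all four in a single call. It reuses the department and sales columns from the example data.

```python
import pandas as pd

# Same department/sales values as the example data (部门 = department, 销售额 = sales).
df = pd.DataFrame({
    '部门': ['销售部', '市场部', '技术部', '销售部', '财务部', '市场部'],
    '销售额': [120, 150, 130, 160, 140, 170],
})

# Compute sum, mean, max, and min of sales for each department in one call.
# The result is a DataFrame with one column per aggregation function.
sales_summary = df.groupby('部门')['销售额'].agg(['sum', 'mean', 'max', 'min'])
print(sales_summary)
```

This is convenient when several statistics are needed side by side; calling the individual methods, as the article does, remains equally valid.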

Practice: Group a dataset by a single column and compute the mean of each group.
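
One possible way to approach the exercise is sketched below. The dataset, the column names `team` and `score`, and the variable names are hypothetical and chosen only for illustration; any dataset with one categorical column and one numeric column would work the same way.

```python
import pandas as pd

# Hypothetical practice data; column names and values are made up.
scores = pd.DataFrame({
    'team': ['A', 'B', 'A', 'C', 'B', 'A'],
    'score': [80, 90, 70, 60, 100, 90],
})

# Group by the single 'team' column and compute the mean score of each group.
mean_score_by_team = scores.groupby('team')['score'].mean()
print(mean_score_by_team)
```

The result is a Series indexed by the group key. If a regular DataFrame is preferred, `mean_score_by_team.reset_index()` turns the key back into a column.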

Summary: After this exercise you should be able to perform data aggregation and grouping with Pandas using groupby and various aggregation functions, preparing you for further Python data processing topics.
