10 Practical Ways to Iterate and Transform a Pandas DataFrame in Python

This article demonstrates ten practical techniques for iterating over the rows, columns, and values of a pandas DataFrame and for transforming its data with apply(), vectorized operations, map(), mask(), groupby(), cumulative sums, and rolling calculations. Each technique is illustrated with a concise Python example.

Test Development Learning Exchange

Aug 30, 2024

1. Iterate over DataFrame rows

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Iterate over each row; iterrows() yields (index label, row as a Series) pairs
for index, row in df.iterrows():
    print(f"Index: {index}, Row: {row}")
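iterrows() builds a new Series for every row, which is comparatively slow and does not preserve dtypes across a row (an int column can come back as float when another column is float). When a Python-level loop is really needed, DataFrame.itertuples() is generally the faster option. A minimal sketch on the same sample data:

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# itertuples() yields one namedtuple per row; with index=True the
# row label is available as the 'Index' field
rows = []
for row in df.itertuples(index=True):
    # Columns are accessed as attributes named after the column labels
    rows.append((row.Index, row.A, row.B))
    print(f"Index: {row.Index}, A: {row.A}, B: {row.B}")
```

Attribute access only works for column labels that are valid Python identifiers; other labels are renamed to positional names.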

2. Iterate over DataFrame columns

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Iterate over the column labels
for col_name in df.columns:
    print(f"Column Name: {col_name}")
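Looping over df.columns only gives the labels. If the column contents are needed as well, DataFrame.items() yields each label together with its column as a Series. The per-column sum below is only an illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# items() yields (column label, column Series) pairs, so the data is
# available inside the loop, not just the name
column_sums = {}
for col_name, column in df.items():
    column_sums[col_name] = int(column.sum())
    print(f"Column: {col_name}, values: {column.tolist()}")
```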

3. Iterate over all values in a DataFrame

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Flatten the underlying NumPy array and iterate over every value, row by row
for value in df.to_numpy().flatten():
    print(f"Value: {value}")
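Flattening walks the values in row-major order (1, 4, 2, 5, 3, 6 for the sample data), which can surprise readers who expect column-by-column output. DataFrame.to_numpy() is the accessor the pandas documentation recommends over .values, and NumPy's ravel() accepts an order argument to choose the traversal. A small sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

arr = df.to_numpy()

# Default (C) order flattens row by row
row_major = arr.ravel().tolist()
# order='F' flattens column by column
column_major = arr.ravel(order='F').tolist()

print(row_major)
print(column_major)
```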

4. Use the apply() function

import pandas as pd
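# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Define a function that multiplies its input by 2.
# With the default axis=0, apply() passes each column (a Series) to it.
def multiply_by_two(column):
    return column * 2
# Apply the function to every column of the DataFrame
result = df.apply(multiply_by_two)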
print(result)
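Because apply() works column by column by default, it needs axis=1 to call a function once per row. A sketch in which the weighted_total helper and the weight of 10 are made up purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def weighted_total(row):
    # Hypothetical helper: with axis=1, 'row' is a Series indexed by column name
    return row['A'] + 10 * row['B']

# axis=1 calls the function once per row and returns one value per row
df['Total'] = df.apply(weighted_total, axis=1)
print(df)
```

For simple arithmetic like this, the vectorized form df['A'] + 10 * df['B'] gives the same result without a per-row Python call.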

5. Perform vectorized operations

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Multiply every element of the DataFrame by 2 in one vectorized operation
result = df * 2
print(result)
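Vectorized operations also cover column-to-column arithmetic and simple if/else logic. The sketch below adds two illustrative columns; the names Sum and Size and the threshold of 4 are arbitrary choices, and np.where comes from NumPy rather than pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Column arithmetic is applied element by element without a Python loop
df['Sum'] = df['A'] + df['B']

# np.where is a vectorized if/else: 'big' where B > 4, otherwise 'small'
df['Size'] = np.where(df['B'] > 4, 'big', 'small')

print(df)
```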

6. Update values with map()

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Add 1 to every value in column A and write the result back
df['A'] = df['A'].map(lambda x: x + 1)
print(df)
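Series.map() also accepts a dictionary as a lookup table. Values without a matching key become NaN, so it is worth checking for missing results after the mapping. The labels dictionary below is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Hypothetical lookup table; 3 has no entry, so it maps to NaN
labels = {1: 'one', 2: 'two'}
df['A_label'] = df['A'].map(labels)

print(df)
```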

7. Conditionally update values with mask()

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Replace values in column A that are greater than 2 with 0; other values are kept
df['A'] = df['A'].mask(df['A'] > 2, 0)
print(df)
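mask() is the mirror image of Series.where(): mask replaces values where the condition is True, while where keeps them and replaces the rest. The same update can also be written as a boolean .loc assignment. A sketch showing that the three forms agree on the sample data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# mask(): replace where the condition is True
masked = df['A'].mask(df['A'] > 2, 0)

# where(): keep where the condition is True, replace the rest
kept = df['A'].where(df['A'] <= 2, 0)

# .loc with a boolean row selector, assigning in place on a copy
df_loc = df.copy()
df_loc.loc[df_loc['A'] > 2, 'A'] = 0

print(masked.tolist(), kept.tolist(), df_loc['A'].tolist())
```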

8. Group data using groupby()

import pandas as pd
# Create a sample DataFrame
data = {'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Group rows by Key and sum the Value column within each group
grouped = df.groupby('Key')['Value'].sum()
print(grouped)
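Beyond a single sum(), a grouped column can be summarized with several functions in one call via agg(), and transform() broadcasts a per-group result back onto the original rows. A sketch on the same data; the GroupSum column name is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]})

# agg() returns one row per group and one column per statistic
summary = df.groupby('Key')['Value'].agg(['sum', 'mean', 'count'])

# transform() returns a Series aligned with the original rows,
# so each row receives the total of its own group
df['GroupSum'] = df.groupby('Key')['Value'].transform('sum')

print(summary)
print(df)
```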

9. Compute cumulative sum with cumsum()

import pandas as pd
# Create a sample DataFrame
data = {'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Compute the running (cumulative) sum of the Value column
df['CumulativeSum'] = df['Value'].cumsum()
print(df)
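Since the sample data already has a Key column, a common variant is a running total that restarts within each group, which groupby(...).cumsum() provides. CumSumByKey is an illustrative column name:

```python
import pandas as pd

df = pd.DataFrame({'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]})

# Running total over the whole column
df['CumulativeSum'] = df['Value'].cumsum()

# Running total that restarts for each Key
df['CumSumByKey'] = df.groupby('Key')['Value'].cumsum()

print(df)
```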

10. Calculate rolling statistics with rolling()

import pandas as pd
# Create a sample DataFrame
data = {'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Compute the rolling mean of the Value column over a window of 2 rows
df['RollingMean'] = df['Value'].rolling(window=2).mean()
print(df)
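With window=2 the first row has only one observation, so its rolling mean is NaN. The min_periods argument lowers the number of observations required to produce a result. A sketch comparing both settings; RollingMeanMin1 is an illustrative column name:

```python
import pandas as pd

df = pd.DataFrame({'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]})

# Default: a full window of 2 values is required, so row 0 is NaN
df['RollingMean'] = df['Value'].rolling(window=2).mean()

# min_periods=1: row 0 is averaged over the single value available
df['RollingMeanMin1'] = df['Value'].rolling(window=2, min_periods=1).mean()

print(df)
```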

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.