
10 Practical Ways to Iterate and Transform a Pandas DataFrame in Python

This article demonstrates ten practical techniques for working with a pandas DataFrame: iterating over its rows, columns, and values, and transforming data with apply(), vectorized operations, map(), mask(), groupby(), cumsum(), and rolling(). Each technique is illustrated with a short Python example.

Test Development Learning Exchange

1. Iterate over DataFrame rows

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Iterate over the rows; iterrows() yields (index, row) pairs, where each row is a Series
for index, row in df.iterrows():
    print(f"Index: {index}, Row: {row.to_dict()}")
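Because iterrows() builds a new Series for every row, it is comparatively slow and does not preserve dtypes across a row (an int column next to a float column is upcast to float). The pandas documentation suggests itertuples() when dtypes matter. Here is a minimal sketch on the same sample data, assuming the column names are valid Python identifiers so they can be read as attributes:

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# itertuples() yields one namedtuple per row; the index is stored in the field "Index"
row_sums = []
for row in df.itertuples():
    print(f"Index: {row.Index}, A: {row.A}, B: {row.B}")
    # Convert to a plain int so the list prints cleanly
    row_sums.append(int(row.A + row.B))

print(row_sums)  # [5, 7, 9]
```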

2. Iterate over DataFrame columns

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
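# Iterate over the column labels; df[col_name] retrieves the column itself as a Series
for col_name in df.columns:
    print(f"Column Name: {col_name}, Values: {df[col_name].tolist()}")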
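When you need both the label and the data, DataFrame.items() yields (column name, Series) pairs directly, so there is no separate lookup. A minimal sketch that collects a per-column total (the `col_totals` dict is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# items() yields (column name, column Series) pairs, from left to right
col_totals = {}
for col_name, col in df.items():
    col_totals[col_name] = int(col.sum())
    print(f"Column: {col_name}, Values: {col.tolist()}")

print(col_totals)  # {'A': 6, 'B': 15}
```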

3. Iterate over all values in a DataFrame

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
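# Iterate over every value; to_numpy() returns the underlying 2-D array
# and flatten() yields its values row by row
for value in df.to_numpy().flatten():
    print(f"Value: {value}")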
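The flattening order matters: NumPy flattens in row-major (C) order by default, so values come out row by row. A short sketch comparing that with column-major order (`order='F'`) on the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# ravel() with the default order='C' walks the array row by row
row_major = df.to_numpy().ravel().tolist()
# order='F' walks the array column by column instead
col_major = df.to_numpy().ravel(order='F').tolist()

print(row_major)  # [1, 4, 2, 5, 3, 6]
print(col_major)  # [1, 2, 3, 4, 5, 6]
```

Note that with mixed column dtypes, to_numpy() returns a single common dtype (often `object`), so per-value types may differ from the original columns.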

4. Use the apply() function

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
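# Define a function that doubles its input; with the default axis=0,
# apply() passes each column (a Series) to it, not each row
def multiply_by_two(col):
    return col * 2
# Apply the function to every column of the DataFrame
result = df.apply(multiply_by_two)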
print(result)
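The `axis` argument controls what apply() hands to your function: `axis=0` (the default) passes one column at a time, while `axis=1` passes one row at a time. A small sketch of both on the sample data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# axis=0 (default): the function receives each column Series
col_ranges = df.apply(lambda col: col.max() - col.min())
# axis=1: the function receives each row Series, indexed by column name
row_sums = df.apply(lambda row: row['A'] + row['B'], axis=1)

print(col_ranges.tolist())  # [2, 2]
print(row_sums.tolist())    # [5, 7, 9]
```

Row-wise apply() is still a Python-level loop, so for simple arithmetic like this, a vectorized expression such as `df['A'] + df['B']` is usually faster.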

5. Perform vectorized operations

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Multiply every element by 2 in one vectorized operation, with no explicit loop
result = df * 2
print(result)
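Vectorized operations also work between columns: arithmetic is applied element-wise after aligning on the index, and comparisons return boolean Series. A minimal sketch that derives new columns this way (the new column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Column arithmetic is element-wise and aligned on the index
df['Sum'] = df['A'] + df['B']
df['Ratio'] = df['B'] / df['A']
# Comparisons are vectorized too and produce a boolean column
df['B_gt_4'] = df['B'] > 4

print(df)
```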

6. Update values with map()

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Add 1 to every value in column A; map() applies the function element-wise
df['A'] = df['A'].map(lambda x: x + 1)
print(df)
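Series.map() also accepts a dict (or a Series) as a lookup table instead of a function. Values without a matching key become NaN, which is easy to miss. A sketch with a hypothetical `labels` mapping:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Hypothetical lookup table; note that 3 has no entry
labels = {1: 'low', 2: 'medium'}
# Each value of A is looked up as a key; unmatched values become NaN
df['A_label'] = df['A'].map(labels)

print(df)
```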

7. Conditionally update values with mask()

import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Replace values in column A that are greater than 2 with 0
df['A'] = df['A'].mask(df['A'] > 2, 0)
print(df)
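mask() has a mirror image, where(), which keeps values where the condition is True and replaces the rest. The replacement can also be another Series, aligned on the index. A short comparison on the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# mask(): replace values where the condition is True
masked = df['A'].mask(df['A'] > 2, 0)
# where(): keep values where the condition is True, replace the others
kept = df['A'].where(df['A'] > 2, 0)
# The replacement can be another Series; here A is replaced by B where A > 1
from_b = df['A'].mask(df['A'] > 1, df['B'])

print(masked.tolist())  # [1, 2, 0]
print(kept.tolist())    # [0, 0, 3]
print(from_b.tolist())  # [1, 5, 6]
```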

8. Group data using groupby()

import pandas as pd
# Create a sample DataFrame
data = {'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Group the rows by Key and sum Value within each group
grouped = df.groupby('Key')['Value'].sum()
print(grouped)
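Beyond a single sum, agg() can compute several statistics per group at once, and transform() returns a result aligned to the original rows, which is convenient for adding a per-group value back as a column. A sketch on the same grouping data (`GroupSum` is an illustrative column name):

```python
import pandas as pd

df = pd.DataFrame({'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]})

# agg(): one row per group, one column per statistic
summary = df.groupby('Key')['Value'].agg(['sum', 'mean', 'count'])
# transform(): same length as df, each row gets its group's total
df['GroupSum'] = df.groupby('Key')['Value'].transform('sum')

print(summary)
print(df)
```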

9. Compute cumulative sum with cumsum()

import pandas as pd
# Create a sample DataFrame
data = {'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Compute the running (cumulative) total of the Value column
df['CumulativeSum'] = df['Value'].cumsum()
print(df)
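cumsum() combines naturally with groupby() when the running total should restart for each group. A minimal sketch contrasting a whole-column total with a per-Key total (`GroupCumSum` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]})

# Running total over the entire column
df['CumulativeSum'] = df['Value'].cumsum()
# Running total that restarts at the first row of each Key
df['GroupCumSum'] = df.groupby('Key')['Value'].cumsum()

print(df)
```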

10. Calculate rolling statistics with rolling()

import pandas as pd
# Create a sample DataFrame
data = {'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
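# Compute the rolling mean of Value over a window of 2 rows
# (the first row is NaN because a full window is not yet available)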
df['RollingMean'] = df['Value'].rolling(window=2).mean()
print(df)
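By default a rolling window yields NaN until it holds `window` values. Passing `min_periods` relaxes that, and the same Rolling object supports other aggregations such as sum(). A sketch on the same data (the extra column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Key': ['A', 'A', 'B', 'B', 'C'], 'Value': [1, 2, 3, 4, 5]})

# Default: NaN until the window of 2 is full
df['RollingMean'] = df['Value'].rolling(window=2).mean()
# min_periods=1: compute the mean from however many values are available
df['RollingMeanMin1'] = df['Value'].rolling(window=2, min_periods=1).mean()
# A 3-row rolling sum; the first two rows are NaN
df['RollingSum3'] = df['Value'].rolling(window=3).sum()

print(df)
```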