
Using pandas iterrows() and groupby() for DataFrame Row Iteration and Aggregation

This article explains how to iterate over DataFrame rows with pandas iterrows(), demonstrates grouping data using pandas groupby() with examples of splitting, aggregation, transformation, and filtration, and provides complete code snippets for each operation.

Python Programming Learning Circle

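The iterrows() method returns an iterator over the rows of a DataFrame. Each iteration yields a pair: the row's index label and a Series holding that row's data. This makes it convenient for row‑wise processing, although it is slow on large DataFrames, and because every row is returned as a single Series, column dtypes are not preserved across the row.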

<code>import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
df</code>
<code># Iterate rows
for index, row in df.iterrows():
    # index is the row label, row is a Series
    print(index)
    print(row['A'])   # first column value
    print(row.iloc[-1])  # last column value (by position)
    print(row.iloc[1])   # second column value (by position)
</code>
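
One detail that is easier to see in practice: because each row comes back as a single Series, values from columns of different types are converted to a common dtype. The following is a minimal sketch on a small hypothetical frame (the column names `count` and `price` are illustrative and do not come from the article):

```python
import pandas as pd

# Hypothetical two-column frame: one integer column, one float column
small = pd.DataFrame({"count": [1, 2], "price": [1.5, 2.5]})

row_dtypes = []
for index, row in small.iterrows():
    # Each row is one Series, so all of its values share a single dtype
    row_dtypes.append(str(row.dtype))
    print(index, row["count"], row.iloc[-1])

print(small.dtypes["count"])  # dtype of the column itself
print(row_dtypes)             # dtype of each row Series
```

The integer column keeps its integer dtype in the DataFrame, while the per-row Series is upcast so it can hold both values.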

The groupby() method groups rows much like SQL's GROUP BY and follows a split‑apply‑combine pattern: split the data into groups, apply a function to each group, and combine the results. Common operations on the groups include aggregation, transformation, and filtration.

<code>ipl_data = {
    'Team': ['Riders','Riders','Devils','Devils','Kings','kings','Kings','Kings','Riders','Royals','Royals','Riders'],
    'Rank': [1,2,2,3,3,4,1,1,2,4,1,2],
    'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
    'Points': [876,789,863,673,741,812,756,788,694,701,804,690]
}
df = pd.DataFrame(ipl_data)
</code>

Grouping by a single column:

<code># Group by 'Team'
grouped = df.groupby('Team')
print(grouped.groups)
</code>
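
To make the structure of `groups` concrete, here is a small sketch on a hypothetical four-row frame (the names `teams` and `grouped_teams` are illustrative). It assumes only that `groups` behaves like a mapping from group key to row labels:

```python
import pandas as pd

# Hypothetical data, smaller than the article's ipl_data
teams = pd.DataFrame({
    "Team": ["Riders", "Devils", "Riders", "Kings"],
    "Points": [876, 863, 789, 741],
})
grouped_teams = teams.groupby("Team")

# .groups maps each group key to the row labels that belong to that group
for key, labels in grouped_teams.groups.items():
    print(key, list(labels))

# .size() reports the number of rows in each group
sizes = grouped_teams.size()
print(sizes)
```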

Iterating over groups:

<code># Group by 'Rank'; as_index=False keeps 'Rank' as a regular column in aggregated results
grouped = df.groupby('Rank', as_index=False)
for name, group in grouped:
    # name is the group key, group is the sub-DataFrame for that key
    print(name)
    print(group)
</code>
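
Grouping by more than one column works the same way when a list of column names is passed. The sketch below uses a hypothetical frame named `results`; with several keys, each group name is a tuple holding one value per key:

```python
import pandas as pd

# Hypothetical data with a repeated (Team, Year) combination
results = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Devils"],
    "Year": [2014, 2014, 2014, 2015],
    "Points": [876, 789, 863, 673],
})

# A list of columns groups by each distinct combination of their values
by_team_year = results.groupby(["Team", "Year"])

keys = []
for key, group in by_team_year:
    # key is a tuple such as ('Riders', 2014); group is a sub-DataFrame
    keys.append(key)
    print(key, len(group))
```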

Retrieving a specific group with get_group():

<code>grouped = df.groupby('Year')
print(grouped.get_group(2014))
</code>
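
As a rough cross-check, get_group() selects the same rows as a boolean mask on the grouping column. A minimal sketch with a hypothetical frame named `seasons`:

```python
import pandas as pd

# Hypothetical data: three rows from 2014, one from 2015
seasons = pd.DataFrame({
    "Team": ["Riders", "Devils", "Kings", "Royals"],
    "Year": [2014, 2014, 2015, 2014],
    "Points": [876, 863, 812, 701],
})

# Rows of the 2014 group; the original index labels are preserved
from_group = seasons.groupby("Year").get_group(2014)

# The same rows selected with a boolean mask
from_mask = seasons[seasons["Year"] == 2014]

print(from_group)
print(list(from_group.index) == list(from_mask.index))
```

The mask is simpler for a one-off lookup; get_group() is handy when the groupby object already exists and several groups need to be inspected.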

Aggregations can be performed with agg() to compute multiple statistics on one or more columns:

<code># Multiple aggregations on 'Points'
print(grouped['Points'].agg({'mean': np.mean, 'std': np.std, 'max': np.max}))
# Different aggregations for different columns
print(grouped.agg({'Points': [np.mean, 'sum'], 'Rank': [np.max]}))
</code>
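
When flat, explicitly named output columns are preferred, named aggregation is one option: each keyword becomes an output column, and its value is an (input column, function) pair. The sketch below uses a hypothetical frame `scores`, and the output names are made up for illustration:

```python
import pandas as pd

# Hypothetical data: two rows per year
scores = pd.DataFrame({
    "Year": [2014, 2014, 2015, 2015],
    "Points": [800, 900, 700, 750],
})

# Named aggregation: keyword = output column, value = (input column, function)
summary = scores.groupby("Year").agg(
    mean_points=("Points", "mean"),
    max_points=("Points", "max"),
    n_rows=("Points", "count"),
)
print(summary)
```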

Transformations apply a function to each group and broadcast the result back to the original index:

<code>grouped = df.groupby('Team')
score = lambda x: (x - x.mean()) / x.std() * 10
print(grouped.transform(score))
</code>
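
The broadcasting behavior is easiest to see with a simpler function. In this sketch (hypothetical frame `points`, illustrative column `Diff`), each team's mean is repeated on every row of that team, so the result lines up with the original rows and can be used directly in arithmetic:

```python
import pandas as pd

# Hypothetical data: two teams, two rows each
points = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Devils"],
    "Points": [800, 900, 600, 700],
})

# Broadcast each team's mean back to every row of that team
team_mean = points.groupby("Team")["Points"].transform("mean")

# Deviation of each row from its own team's mean
points["Diff"] = points["Points"] - team_mean
print(points)
```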

Filtration removes groups that do not meet a condition, such as keeping only groups with at least three rows:

<code>print(df.groupby('Team').filter(lambda x: len(x) >= 3))
</code>
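
A small sketch of the same idea on a hypothetical frame `matches`, where only one team has three or more rows. It also shows a boolean-mask form built with transform(), which should select the same rows under the assumption that the condition depends only on the group size:

```python
import pandas as pd

# Hypothetical data: Riders has 3 rows, Devils 1, Kings 2
matches = pd.DataFrame({
    "Team": ["Riders", "Riders", "Riders", "Devils", "Kings", "Kings"],
    "Points": [876, 789, 694, 863, 741, 756],
})

# Keep only rows whose team appears at least three times
kept = matches.groupby("Team").filter(lambda g: len(g) >= 3)
print(kept)

# Alternative: per-row group size via transform, then a boolean mask
counts = matches.groupby("Team")["Points"].transform("size")
kept_mask = matches[counts >= 3]
print(kept_mask)
```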

Overall, the article provides a step‑by‑step guide to using pandas for row iteration, grouping, aggregation, transformation, and filtration, complete with runnable code examples.

Tags: data analysis, dataframe, pandas, groupby, Aggregation, iterrows