Skip to main content

Pandas GroupBy: Split-Apply-Combine for Powerful Aggregations

Advanced30 min7 exercises130 XP
0/7 exercises

Imagine you have a giant pile of receipts from a store. You want to know how much each department earned last month. What do you do? You sort the receipts into piles by department, add up each pile, and write down the totals.

That is exactly what groupby() does in Pandas. It splits your data into groups based on some column, applies a calculation to each group, and combines the results back together. This three-step pattern is called split-apply-combine, and it is one of the most powerful ideas in data analysis.

In this tutorial, you'll learn how to group data, aggregate with multiple functions, use named aggregation, transform values within groups, filter entire groups, and work with multi-level grouping.

How Does GroupBy Work?

When you call df.groupby('column'), Pandas does not immediately compute anything. Instead, it creates a GroupBy object -- a lazy container that knows which rows belong to which group. The real work happens when you call an aggregation method like .sum(), .mean(), or .count().

Basic groupby and aggregation
Loading editor...

The result is a Series with the group labels as the index. East had 200 + 300 + 180 = 680 in revenue, and West had 150 + 250 + 320 = 720.

How Do You Aggregate with Multiple Functions?

Often you need more than just a sum. You might want the mean, count, and max all at once. The .agg() method lets you pass a list of function names or even a dictionary that maps columns to different functions.

Using agg() with multiple functions
Loading editor...
Named aggregation syntax
Loading editor...

What Is GroupBy Transform?

Sometimes you don't want a summary table. You want to add a column to the original DataFrame that contains the group-level result for every row. For example, you want each row to show the average revenue of its region. That is what .transform() does -- it returns a Series with the same length as the original data.

transform() broadcasts group results back
Loading editor...

How Do You Filter Entire Groups?

The .filter() method lets you keep or drop entire groups based on a condition. For example, you might want only regions that had more than two sales, or departments where the average score was above 80.

Filtering groups with a condition
Loading editor...

Class B had an average grade of 57.5, which is below 75, so every row from class B was removed. Classes A and C passed the filter and kept all their rows.

Grouping by Multiple Columns

You can pass a list of columns to groupby() to create subgroups. This is like sorting receipts first by department, then by payment method within each department.

Multi-column groupby
Loading editor...

Custom Group Logic with apply()

When built-in aggregations aren't enough, .apply() lets you run any function on each group. The function receives a DataFrame (the group) and can return a scalar, a Series, or a DataFrame.

Using apply() for custom group operations
Loading editor...

Practice Exercises

Basic GroupBy Sum
Write Code

Given a DataFrame df with columns department and salary, group by department and print the total salary for each department. The DataFrame is already created for you.

Print the result of the groupby sum (it should be a Series).

Loading editor...
Named Aggregation
Write Code

Use named aggregation to group df by region and compute:

  • total_revenue: sum of revenue
  • avg_units: mean of units
  • Print the resulting DataFrame.

    Loading editor...
    Predict the GroupBy Output
    Predict Output

    What will this code print? Think about how many rows each group has.

    import pandas as pd
    df = pd.DataFrame({'team': ['A', 'B', 'A', 'B', 'A'], 'score': [10, 20, 30, 40, 50]})
    print(df.groupby('team')['score'].count().tolist())
    Loading editor...
    Normalize with Transform
    Write Code

    Add a new column called pct_of_dept to df that shows each employee's salary as a percentage of their department's total salary. Round to 1 decimal place.

    Print df[['name', 'department', 'pct_of_dept']].to_string(index=False).

    Loading editor...
    Filter Groups by Size
    Write Code

    Filter df to keep only teams that have more than 2 members. Print the name column of the filtered DataFrame as a list.

    Use groupby().filter() with a lambda.

    Loading editor...
    Multi-Column GroupBy
    Write Code

    Group df by both region and product, then compute the sum of revenue. Reset the index and print the resulting DataFrame using .to_string(index=False).

    Loading editor...
    Fix the GroupBy Bug
    Fix the Bug

    This code tries to add a column showing each row's deviation from its group mean, but it has a bug. Fix it so it works correctly.

    The output should print the deviation column as a list, rounded to 1 decimal place each.

    Loading editor...