Pandas Pivot Tables and Cross-Tabulation for Business Analysis

Advanced25 min5 exercises95 XP

0/5 exercises

If you have ever used a pivot table in Excel or Google Sheets, you already understand the core idea. You take a flat table of raw data and reshape it into a summary that shows one variable along the rows, another along the columns, and computed values (sums, averages, counts) in the cells.

Pandas brings this same power to Python with pd.pivot_table(). But unlike Excel, you can script it, automate it, and handle millions of rows without your spreadsheet freezing. In this tutorial, you'll learn how to create pivot tables, customize aggregation functions, add margins (totals), use cross-tabulation, and even reverse-pivot data with melt().

How Do You Create a Pivot Table?

The pd.pivot_table() function takes your DataFrame and reshapes it. You specify which column becomes the index (rows), which becomes the columns, what values to summarize, and which aggregation function to use.

A basic pivot table: revenue by region and product

Loading editor...

The result is a DataFrame where each row is a region, each column is a product, and the cells contain the sum of revenue for that combination. East sold 380 worth of Widgets (200 + 180) and 300 of Gadgets. Clean and scannable.

What Aggregation Functions Can You Use?

The aggfunc parameter controls how values are summarized. It defaults to 'mean', but you can use 'sum', 'count', 'min', 'max', 'std', or even a list of functions. You can also pass margins=True to add row and column totals.

Multiple aggfuncs with margin totals

Loading editor...

What Is the Difference Between pivot() and pivot_table()?

Pandas has two reshaping methods with similar names. pivot() is simpler -- it just rearranges your data without any aggregation. It only works when each (index, column) pair maps to exactly one value. If there are duplicates, it raises an error.

pivot_table() handles duplicates by aggregating them (default: mean). Use pivot() for pure reshaping when you know values are unique. Use pivot_table() when you need summarization.

pivot() -- no aggregation, unique pairs only

df.pivot(
    index='date',
    columns='city',
    values='temp'
)
# Error if duplicate (date, city) pairs!

pivot_table() -- handles duplicates

pd.pivot_table(
    df,
    index='date',
    columns='city',
    values='temp',
    aggfunc='mean'
)
# Averages duplicate entries

What Is Cross-Tabulation?

pd.crosstab() is a shortcut for creating a frequency table -- it counts how many times each combination of two categorical variables appears. Think of it as pivot_table() with aggfunc='count' as the default, but with a simpler syntax.

Cross-tabulation for frequency counts

Loading editor...

You can also normalize the crosstab to show percentages instead of raw counts. Pass normalize=True for overall percentages, normalize='index' for row percentages, or normalize='columns' for column percentages.

Normalized cross-tabulation

Loading editor...

How Do You Reverse a Pivot with melt()?

Sometimes your data arrives in "wide" format -- one column per measurement -- and you need it in "long" format for analysis or plotting. pd.melt() is the opposite of pivoting. It takes columns and unpivots them into rows.

Melting wide data to long format

Loading editor...

Practice Exercises

Create a Sales Pivot Table

Write Code

Create a pivot table from df with region as the index, product as the columns, and revenue as the values. Use sum as the aggregation function and fill_value=0.

Print the resulting pivot table.

Loading editor...

Add Totals with Margins

Write Code

Create the same pivot table as before, but add margins=True to include row and column totals. The margin label should be the default "All".

Print the pivot table. Make sure to use fill_value=0.

Loading editor...

Create a Cross-Tabulation

Write Code

Use pd.crosstab() to create a frequency table showing how many students from each grade_level chose each elective. Print the result.

Loading editor...

Unpivot with melt()

Write Code

Use pd.melt() to convert the wide DataFrame into long format. Keep name as the identifier column. The melted variable column should be called subject and the value column should be called score.

Print the result using .to_string(index=False).

Loading editor...

Predict the Pivot Table Shape

Predict Output

What will this code print? Think about how many unique values exist in the index and columns.

import pandas as pd
df = pd.DataFrame({
    'color': ['red', 'blue', 'red', 'blue', 'green'],
    'size': ['S', 'S', 'M', 'M', 'S'],
    'count': [1, 2, 3, 4, 5]
})
pivot = pd.pivot_table(df, index='color', columns='size', values='count', aggfunc='sum', fill_value=0)
print(pivot.shape)

Loading editor...