Pandas apply(), map(), transform(): Custom Data Transformations

Advanced | 25 min | 6 exercises | 110 XP

Pandas gives you dozens of built-in methods like .sum(), .mean(), and .str.upper(). But what happens when you need to run your own custom logic on every row, column, or cell? That is where apply(), map(), and transform() come in.

Think of these methods as assembly lines. You feed in your data, each item passes through your custom function, and transformed results come out the other end. The trick is knowing which assembly line to use for each job.

In this tutorial, you'll learn the differences between apply(), map(), and transform(), when to use each one, and why vectorized operations should be your first choice whenever possible.

What Does map() Do on a Series?

Series.map() applies a function (or a dictionary) to every single value in a Series. It is the simplest transformation tool -- one value goes in, one value comes out.

Series.map() with functions and dictionaries
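
As a minimal sketch of both forms, the snippet below maps a function and then a dictionary over the same Series. The size codes and the lookup table are made up for illustration; note that a value with no dictionary entry comes back as NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical data: clothing size codes.
sizes = pd.Series(["S", "M", "L", "XL"])

# Function form: the function is called once per value.
lowered = sizes.map(str.lower)

# Dictionary form: each value is looked up as a key.
# "XL" has no entry, so its result is NaN (no error is raised).
full_names = sizes.map({"S": "small", "M": "medium", "L": "large"})

print(lowered.tolist())     # ['s', 'm', 'l', 'xl']
print(full_names.tolist())  # ['small', 'medium', 'large', nan]
```

If missing keys should be an error in your data, check for NaN in the result before using it.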

How Does apply() Work on a Series?

Series.apply() also calls a function on each value, so when you pass a plain function, map() and apply() give the same result. They differ at the edges: apply() forwards extra positional and keyword arguments to your function (through args=(...) and keyword arguments), while map() also accepts a dictionary or another Series as a lookup table.

Series.apply() with lambdas and extra args
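
The sketch below shows a lambda and a named function that takes extra arguments. The add_tax helper, its parameters, and the prices are hypothetical names chosen for this example.

```python
import pandas as pd

# Hypothetical helper: rate and shipping are the "extra" arguments.
def add_tax(price, rate, shipping=0.0):
    return price * (1 + rate) + shipping

prices = pd.Series([10.0, 20.0, 40.0])

# A lambda for simple per-value logic.
labels = prices.apply(lambda p: "high" if p > 15 else "low")

# Extra positional arguments go in args=(...).
with_tax = prices.apply(add_tax, args=(0.25,))

# Extra keyword arguments are forwarded to the function as well.
with_tax_shipped = prices.apply(add_tax, args=(0.25,), shipping=5.0)

print(labels.tolist())            # ['low', 'high', 'high']
print(with_tax.tolist())          # [12.5, 25.0, 50.0]
print(with_tax_shipped.tolist())  # [17.5, 30.0, 55.0]
```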

How Does apply() Work on a DataFrame?

When you call apply() on a DataFrame, the function receives an entire column (by default, axis=0) or an entire row (axis=1), in both cases as a Series. This is different from Series.apply(), which passes your function one scalar value at a time.

DataFrame.apply() with axis=0 vs axis=1
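
A small sketch of the two directions, using made-up scores. With axis=0 the result has one entry per column; with axis=1 it has one entry per row.

```python
import pandas as pd

# Hypothetical scores, indexed by student name.
scores = pd.DataFrame(
    {"math": [80, 90, 70], "science": [85, 75, 90]},
    index=["Ana", "Ben", "Cara"],
)

# axis=0 (default): the function receives each column as a Series.
col_range = scores.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series.
best_subject = scores.apply(lambda row: row.idxmax(), axis=1)

print(col_range.to_dict())    # {'math': 20, 'science': 15}
print(best_subject.tolist())  # ['science', 'math', 'science']
```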

A useful pattern with axis=1 is creating new columns that depend on multiple existing columns. For example, you can compute a weighted score from several subject grades.

Creating columns from row-level logic
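
One way the weighted-score idea might look is sketched here. The WEIGHTS table, the weighted_score helper, and the grades are assumptions made for this example.

```python
import pandas as pd

# Hypothetical grades and weights (the weights sum to 1).
grades = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "math": [80, 90],
    "science": [70, 60],
    "english": [90, 100],
})
WEIGHTS = {"math": 0.5, "science": 0.25, "english": 0.25}

def weighted_score(row):
    # row is one DataFrame row as a Series; look up each subject by label.
    return sum(row[subject] * weight for subject, weight in WEIGHTS.items())

grades["weighted"] = grades.apply(weighted_score, axis=1)

print(grades["weighted"].tolist())  # [80.0, 85.0]
```

For arithmetic this simple, plain column math gives the same numbers without a per-row function call; the row-wise form earns its keep when the logic branches on several columns.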

What About DataFrame.map()?

In older versions of Pandas, applymap() applied a function to every single cell in a DataFrame. Since Pandas 2.1, applymap() is deprecated and the method is called DataFrame.map(). It works element-wise -- your function receives one scalar value at a time, not a row or column.

DataFrame.map() for element-wise operations
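
The sketch below cleans every cell of a small text table. The data is made up, and the hasattr fallback to applymap() is an assumption added so the snippet also runs on Pandas versions older than 2.1.

```python
import pandas as pd

# Hypothetical messy text data.
people = pd.DataFrame({
    "first": [" ada ", "alan"],
    "last": ["LOVELACE", " turing"],
})

# DataFrame.map exists from Pandas 2.1; older versions only have applymap.
elementwise = people.map if hasattr(people, "map") else people.applymap

# The function receives one cell value at a time.
cleaned = elementwise(lambda s: s.strip().title())

print(cleaned.values.tolist())  # [['Ada', 'Lovelace'], ['Alan', 'Turing']]
```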

When Should You Use transform()?

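transform() is a stricter version of apply(): the result must have the same length as the input and line up with the same index. On a Series or DataFrame, a function that returns fewer rows (an aggregation such as sum, for instance) makes Pandas raise an error. On a groupby, a function that returns a single value per group is broadcast back to every row of that group. This makes transform() a good fit for normalizing data or adding group-level statistics back to each row.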

transform() for within-group normalization
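
As a sketch with made-up sales data, the snippet below adds a group mean to each row and then centers each value on its own group. The column names are assumptions for this example.

```python
import pandas as pd

# Hypothetical sales data with two teams.
sales = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "revenue": [100.0, 300.0, 50.0, 150.0],
})
by_team = sales.groupby("team")["revenue"]

# Aggregation returns one row per group...
print(len(by_team.mean()))  # 2

# ...while transform broadcasts the group result back to every row.
sales["team_mean"] = by_team.transform("mean")

# Within-group normalization: subtract each row's own group mean.
sales["centered"] = by_team.transform(lambda g: g - g.mean())

print(sales["team_mean"].tolist())  # [200.0, 200.0, 100.0, 100.0]
print(sales["centered"].tolist())   # [-100.0, 100.0, -50.0, 50.0]
```

Because the result keeps the original index, it can be assigned straight back as a new column.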

Why Should You Avoid apply() When Possible?

apply() calls your Python function once per element (or once per row with axis=1), so the loop runs in the Python interpreter. Vectorized Pandas and NumPy operations run that loop in optimized compiled code instead and are usually far faster, often by orders of magnitude on large data. For simple math, string operations, or conditional logic, there is almost always a vectorized alternative.

Slow: apply()

# Calls the lambda once per value in a Python-level loop
df['discounted'] = df['price'].apply(
    lambda p: p * 0.9
)

Fast: vectorized

# Runs in C under the hood
df['discounted'] = df['price'] * 0.9
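
The same idea extends to the conditional logic and string operations mentioned above. The sketch below uses made-up product data and shows one possible vectorized replacement for each: np.where for an if/else and the .str accessor for text.

```python
import numpy as np
import pandas as pd

# Hypothetical product data.
df = pd.DataFrame({"product": ["apple pie", "tea"], "price": [12.0, 4.0]})

# Conditional logic with apply(): one Python call per value.
tier_apply = df["price"].apply(lambda p: "premium" if p >= 10 else "basic")

# Vectorized equivalent: np.where evaluates the whole column at once.
df["tier"] = np.where(df["price"] >= 10, "premium", "basic")

# String operation with apply() versus the vectorized .str accessor.
upper_apply = df["product"].apply(lambda s: s.upper())
df["product_upper"] = df["product"].str.upper()

print(df["tier"].tolist())           # ['premium', 'basic']
print(df["product_upper"].tolist())  # ['APPLE PIE', 'TEA']
```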

Practice Exercises

Map Values with a Dictionary

Write Code

Use Series.map() with a dictionary to convert letter grades to GPA points. Map: A=4.0, B=3.0, C=2.0, D=1.0, F=0.0.

Print the resulting GPA Series as a list.

Row-Level Apply

Write Code

Use apply() with axis=1 to create a new column named label in the DataFrame. Each label should be the name followed by the age in parentheses, like "Alice (30)".

Print the label column as a list.

Predict the map() Output

Predict Output

What will this code print?

import pandas as pd
s = pd.Series([1, 2, 3, 4])
print(s.map(lambda x: x ** 2 if x % 2 == 0 else x).tolist())

Element-wise DataFrame Formatting

Write Code

Use DataFrame.map() to format every numeric cell in df as a string with a dollar sign and 2 decimal places (e.g., "$10.50").

Print the first row as a list: print(df.iloc[0].tolist()).

Refactor apply() to Vectorized

Refactor

This code uses apply() to compute a 10% discount. Refactor it to use vectorized operations instead (no apply, no map, no loops). The output should be identical.

Print the discounted column as a list.

Fix the apply() Bug

Fix the Bug

This code tries to create a total column that sums math and science scores for each student using apply(). But it gives wrong results because of a missing argument. Fix the bug.

Print the total column as a list.
