Pandas apply(), map(), transform(): Custom Data Transformations
Pandas gives you dozens of built-in methods like .sum(), .mean(), and .str.upper(). But what happens when you need to run your own custom logic on every row, column, or cell? That is where apply(), map(), and transform() come in.
Think of these methods as assembly lines. You feed in your data, each item passes through your custom function, and transformed results come out the other end. The trick is knowing which assembly line to use for each job.
In this tutorial, you'll learn the differences between apply(), map(), and transform(), when to use each one, and why vectorized operations should be your first choice whenever possible.
What Does map() Do on a Series?
Series.map() applies a function (or a dictionary) to every single value in a Series. It is the simplest transformation tool -- one value goes in, one value comes out.
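As a minimal sketch of both forms, the example below maps a Series through a dictionary and then through a function. The data and the names sizes and size_to_cm are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, not from the tutorial
sizes = pd.Series(["S", "M", "L", "XL"])

# Dictionary form: each value is looked up as a key.
# Values with no matching key ("XL" here) become NaN.
size_to_cm = {"S": 90, "M": 100, "L": 110}
chest_cm = sizes.map(size_to_cm)

# Function form: the function is called once per value.
lowered = sizes.map(str.lower)

print(chest_cm.tolist())
print(lowered.tolist())
```

Note that the dictionary form silently produces NaN for unmatched values, so it is worth checking for missing values after a lookup like this.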
How Does apply() Work on a Series?
Series.apply() is very similar to map(): it also calls your function once per value. Its main extra is that it forwards additional positional and keyword arguments to your function, which helps when the logic needs parameters. For a simple one-argument function, map() and apply() on a Series are interchangeable.
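The sketch below shows one way to pass an extra argument through Series.apply(). The function add_tax and the 20% rate are hypothetical and chosen only for illustration.

```python
import pandas as pd

prices = pd.Series([10.0, 25.0, 40.0])

def add_tax(price, rate):
    """Return the price including tax, rounded to 2 decimals."""
    return round(price * (1 + rate), 2)

# Extra positional arguments go in the args tuple...
with_tax = prices.apply(add_tax, args=(0.2,))

# ...or can be passed by keyword, which apply() forwards to the function.
with_tax_kw = prices.apply(add_tax, rate=0.2)

print(with_tax.tolist())
```

Both calls produce the same Series. The keyword form tends to read more clearly when the function takes several parameters.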
How Does apply() Work on a DataFrame?
When you call apply() on a DataFrame, the function receives an entire column (by default, axis=0) or an entire row (axis=1). This is different from Series apply, which receives one scalar value at a time.
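To make the axis difference concrete, here is a small sketch on a made-up two-column DataFrame. With axis=0 the lambda sees each column as a Series, and with axis=1 it sees each row.

```python
import pandas as pd

# Hypothetical scores for three students
df = pd.DataFrame({"math": [80, 90, 70], "science": [60, 100, 80]})

# axis=0 (default): the function is called once per column.
# The result has one entry per column.
col_range = df.apply(lambda col: col.max() - col.min())

# axis=1: the function is called once per row.
# Each row arrives as a Series indexed by column name.
row_max = df.apply(lambda row: row.max(), axis=1)

print(col_range.to_dict())
print(row_max.tolist())
```

Forgetting axis=1 is a common source of bugs, because the same function then runs over columns instead of rows without raising an error.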
A useful pattern with axis=1 is creating new columns that depend on multiple existing columns. For example, computing a weighted score from several subject grades.
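The weighted-score pattern might look like the sketch below. The grades, the weights dictionary, and the weighted_score helper are all assumptions made up for this example.

```python
import pandas as pd

# Hypothetical grades; the weights are assumed for illustration
grades = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "math": [90, 70],
    "science": [80, 60],
    "english": [70, 100],
})
weights = {"math": 0.5, "science": 0.3, "english": 0.2}

def weighted_score(row):
    """Combine several columns of one row into a single number."""
    total = sum(row[subject] * w for subject, w in weights.items())
    return round(total, 2)

# axis=1 passes each row to the function, so it can read multiple columns.
grades["score"] = grades.apply(weighted_score, axis=1)

print(grades[["name", "score"]])
```

Because the function only does arithmetic on columns, the same result could also be computed with plain column math. The row-wise form is shown here because it generalizes to logic that has no simple column expression.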
What About DataFrame.map()?
In older versions of Pandas, applymap() applied a function to every single cell in a DataFrame. Since Pandas 2.1, this has been renamed to DataFrame.map(). It works element-wise -- your function receives one scalar value at a time, not a row or column.
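A minimal element-wise sketch follows. The data and the "high"/"low" threshold are made up, and the hasattr check is only an assumption about how you might support Pandas versions older than 2.1.

```python
import pandas as pd

# Hypothetical quarterly figures
df = pd.DataFrame({"q1": [3, 8], "q2": [6, 2]})

# DataFrame.map exists from Pandas 2.1; older versions only have applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# The lambda receives one scalar cell at a time, never a row or column.
labels = elementwise(lambda v: "high" if v > 5 else "low")

print(labels)
```

The result has the same shape, index, and column names as the input, with every cell replaced by the function's return value.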
When Should You Use transform()?
transform() is a stricter version of apply(). It guarantees that the output has the same length and index as the input. If your function returns an aggregated result with fewer rows, Pandas raises an error. On a groupby, a single value computed per group is broadcast back to every row of that group. This makes it well suited to normalizing data or adding group-level statistics back to each row.
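Here is a short sketch of both uses, with made-up team scores: a groupby transform that adds each team's mean to every row, and a column-wise transform that centers a column.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [10, 20, 30, 50],
})

# Group statistic broadcast back to each row: the result has 4 rows, not 2.
df["team_mean"] = df.groupby("team")["score"].transform("mean")
df["diff_from_team"] = df["score"] - df["team_mean"]

# Column-wise transform: the output keeps the input's shape and index.
centered = df[["score"]].transform(lambda col: col - col.mean())

print(df)
print(centered["score"].tolist())
```

Because the output lines up with the original index, the transformed values can be assigned straight back as a new column without a merge.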
Why Should You Avoid apply() When Possible?
apply() calls your Python function once per value, row, or column, in a Python-level loop. For element-wise and row-wise work, that is often orders of magnitude slower than vectorized Pandas or NumPy operations, which run in optimized C code. For simple math, string operations, or conditional logic, there is almost always a vectorized alternative.
# Runs Python loop per row
df['discounted'] = df['price'].apply(
    lambda p: p * 0.9
)

# Runs in C under the hood
df['discounted'] = df['price'] * 0.9

Practice Exercises
Use Series.map() with a dictionary to convert letter grades to GPA points. Map: A=4.0, B=3.0, C=2.0, D=1.0, F=0.0.
Print the resulting GPA Series as a list.
Use apply() with axis=1 to create a new column named label in the DataFrame. The label should be the name followed by the age in parentheses, like "Alice (30)".
Print the label column as a list.
What will this code print?
import pandas as pd
s = pd.Series([1, 2, 3, 4])
print(s.map(lambda x: x ** 2 if x % 2 == 0 else x).tolist())
Use DataFrame.map() to format every numeric cell in df as a string with a dollar sign and 2 decimal places (e.g., "$10.50").
Print the first row as a list: print(df.iloc[0].tolist()).
This code uses apply() to compute a 10% discount. Refactor it to use vectorized operations instead (no apply, no map, no loops). The output should be identical.
Print the discounted column as a list.
This code tries to create a total column that sums math and science scores for each student using apply(). But it gives wrong results because of a missing argument. Fix the bug.
Print the total column as a list.