Pandas merge(), join(), concat(): Combine DataFrames Like SQL

Intermediate25 min6 exercises90 XP

0/6 exercises

You have a table of customers and a separate table of orders. You need to match each order to the customer who placed it. If you've used SQL, you know this as a JOIN. In pandas, the same idea exists — it's called merge().

Combining data from multiple sources is one of the most important skills in data analysis. In this tutorial, you'll learn three ways to combine DataFrames: merge() for SQL-style joins, concat() for stacking DataFrames, and join() for index-based combinations.

How Does pd.merge() Work?

pd.merge() combines two DataFrames based on a shared column — just like a SQL JOIN. The default is an inner join, which keeps only the rows that have a match in both DataFrames.

Inner merge on a shared column

Loading editor...

Notice that Alice appears twice because she has two orders, and Charlie and Diana disappear because they have no orders. That's the inner join — only matches survive.

What Are the Different Join Types?

The how parameter controls which rows to keep. There are four options: inner (default), left, right, and outer.

Left join vs. outer join

Loading editor...

What If the Column Names Don't Match?

Often the join key has a different name in each table — one calls it 'customer_id', the other calls it 'cust_id'. Use left_on and right_on instead of on.

Merging with different key names

Loading editor...

How Do You Stack DataFrames with concat()?

While merge() combines DataFrames side by side (adding columns), concat() stacks them (adding rows). It's like appending one spreadsheet below another.

Stacking DataFrames with concat

Loading editor...

You can also concatenate horizontally (side by side) by setting axis=1. This is useful when you have separate DataFrames for different columns of the same observations.

Horizontal concatenation with axis=1

Loading editor...

How Do You Debug a Merge?

When a merge gives unexpected results, the indicator=True parameter adds a _merge column that tells you where each row came from: both, left_only, or right_only.

Merge indicator for debugging

Loading editor...

What Happens with Duplicate Column Names?

If both DataFrames have a column with the same name (that isn't the join key), pandas adds suffixes _x and _y to tell them apart. You can customize these with the suffixes parameter.

Custom suffixes for duplicate columns

Loading editor...

What Are One-to-Many and Many-to-Many Joins?

A one-to-many join is the most common: one customer has many orders. The customer row gets duplicated for each matching order. A many-to-many join happens when both sides have duplicate keys — and it creates a cartesian product of the matches.

One-to-many join

Loading editor...

Quick Recap: Choosing the Right Combine Method

Pandas gives you three ways to combine DataFrames. Here's when to use each:

`pd.merge()`: Combine on a shared column (like SQL JOIN). Use for matching records between tables.

`pd.concat()`: Stack DataFrames vertically (add rows) or horizontally (add columns). Use when you have the same structure split across files.

`df.join()`: Shorthand for merging on the index. Use when your DataFrames share a meaningful index.

Practice Exercises

Inner Merge Two Tables

Write Code

Merge the students and grades DataFrames on 'student_id' using an inner join. Print the resulting names and grades as a list of lists using .values.tolist().

Loading editor...

Left Join to Keep All Customers

Write Code

Perform a left join of customers and orders on 'customer_id'. Print the number of rows in the result using len().

Loading editor...

Merge with Different Key Names

Write Code

Merge products and inventory where the key is called 'prod_id' in products and 'product_id' in inventory. Print the product names and their stock levels as a list of lists.

Loading editor...

Stack DataFrames with concat

Write Code

Concatenate q1 and q2 DataFrames vertically (stacking rows) with ignore_index=True. Print the shape of the result.

Loading editor...

Predict the Output: Outer Merge

Predict Output

What will this code print?

import pandas as pd
left = pd.DataFrame({'id': [1, 2], 'a': ['x', 'y']})
right = pd.DataFrame({'id': [2, 3], 'b': ['p', 'q']})
result = pd.merge(left, right, on='id', how='outer')
print(len(result))

Loading editor...

Fix the Bug: Duplicate Column Suffixes

Fix the Bug

After merging, this code tries to access a column called 'value_2023' but it doesn't exist because pandas used default suffixes. Fix the merge to use custom suffixes ('_2022', '_2023') and print the 'value_2023' column as a list.

Loading editor...