Pandas: Create, Load, and Explore DataFrames
You have a spreadsheet with 10,000 rows of sales data. You need to find every order over $500, group them by region, and calculate the average. In Excel, that takes a pivot table, a filter, and a lot of clicking. In pandas, it takes three lines of code.
Pandas is one of the most widely used data analysis libraries in Python. It gives you a DataFrame, which you can think of as a programmable spreadsheet. Every column has a name, every row has an index label, and you can slice, filter, and transform the whole thing with a single expression.
In this tutorial, you'll learn how to create DataFrames from scratch, inspect their shape and contents, and perform your first data operations. By the end, you'll be comfortable building and exploring tabular data entirely in Python.
What Is a Pandas Series?
Before you meet the DataFrame, you need to understand its building block: the Series. A Series is a single column of data with an index. Think of it like a labeled list.
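Here is a minimal sketch of a Series. The variable name `prices` and the values are made up for illustration; the printed output shows the index in the left-hand column and the data on the right.

```python
import pandas as pd

# Illustrative values: a Series is one column of data plus an index
prices = pd.Series([4.50, 3.25, 5.00], name="price")

# Prints the index labels (0, 1, 2) next to each value
print(prices)
```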
Notice the numbers on the left? That's the index — pandas automatically assigns 0, 1, 2, ... but you can set custom labels too. The index is what makes pandas powerful. It lets you align data by label instead of by position.
How Do You Create a DataFrame?
A DataFrame is a table with rows and columns — like a spreadsheet, a SQL table, or a CSV file loaded into memory. The most common way to create one is from a dictionary where each key becomes a column name and each value is a list of data.
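A minimal sketch of the dictionary approach, using made-up employee data. Every list must have the same length, because each one becomes a full column.

```python
import pandas as pd

# Each key becomes a column name; each list becomes that column's values
data = {
    "name": ["Alice", "Bob", "Charlie"],
    "department": ["Engineering", "Sales", "Engineering"],
    "salary": [95000, 62000, 88000],
}
df = pd.DataFrame(data)
print(df)
```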
You can also create a DataFrame from a list of dictionaries. This is handy when each row comes from a separate record — like API responses or database rows.
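The row-oriented version might look like the sketch below. The `records` list stands in for whatever an API or database would return; the field names are hypothetical. Note that a key missing from one record shows up as NaN in that row.

```python
import pandas as pd

# One dictionary per row; the keys become column names
records = [
    {"order_id": 1, "region": "West", "amount": 250.0},
    {"order_id": 2, "region": "East", "amount": 720.0},
    {"order_id": 3, "region": "West"},  # no "amount" key in this record
]
orders = pd.DataFrame(records)
print(orders)

# The missing value is filled with NaN rather than raising an error
print(orders["amount"].isna().tolist())
```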
How Do You Inspect a DataFrame?
When you get a new dataset, the first thing you do is explore it. Pandas gives you several tools to peek at the data without printing all 10,000 rows.
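The sketch below runs the usual first-look tools on a generated 100-row DataFrame. The data is synthetic and only there so the calls have something to show.

```python
import pandas as pd

# Synthetic data: 100 orders with increasing amounts
df = pd.DataFrame({
    "order_id": range(1, 101),
    "amount": [i * 10.0 for i in range(1, 101)],
})

print(df.head())    # first 5 rows by default
print(df.tail(3))   # last 3 rows
print(df.shape)     # (number of rows, number of columns)
print(df.dtypes)    # data type of each column
```

`head()` and `tail()` accept a row count, while `shape` and `dtypes` are attributes, so they take no parentheses.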
What Does describe() Tell You?
The describe() method gives you summary statistics for every numeric column: count, mean, standard deviation, min, max, and the 25th/50th/75th percentiles. It's the fastest way to spot outliers and get a feel for the data.
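As a rough illustration, the sketch below calls describe() on a small column with one deliberately large value. The numbers are invented; the point is that the outlier pulls the mean well above the median.

```python
import pandas as pd

# Four ordinary amounts and one outlier
df = pd.DataFrame({"amount": [100, 200, 300, 400, 5000]})

stats = df.describe()
print(stats)

# The result is itself a DataFrame, so single statistics can be read by label
print(stats.loc["mean", "amount"])  # average, inflated by the outlier
print(stats.loc["50%", "amount"])   # median, unaffected by the outlier
```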
How Do You Select Columns?
To grab a single column, use square brackets with the column name. This returns a Series. To grab multiple columns, pass a list of names inside the brackets. This returns a DataFrame.
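A short sketch of both forms, using made-up employee data. The only difference in syntax is the extra pair of brackets, which is the list of column names.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "department": ["Engineering", "Sales", "Engineering"],
    "salary": [95000, 62000, 88000],
})

salaries = df["salary"]           # one name in brackets: a Series
subset = df[["name", "salary"]]   # a list of names in brackets: a DataFrame

print(type(salaries).__name__)
print(type(subset).__name__)
```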
What Basic Operations Can You Do?
Pandas lets you do math on entire columns at once. You can add, subtract, multiply, or apply functions, and the operation runs on every row automatically. These are called vectorized operations, and they are typically much faster than writing an explicit Python loop.
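Here is a small sketch of column-wise math. The column names `salary`, `bonus`, `total_pay`, and `salary_k` are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "salary": [50000, 60000, 70000],
    "bonus": [5000, 3000, 7000],
})

# Column + column: added row by row, with no loop
df["total_pay"] = df["salary"] + df["bonus"]

# Column / scalar: the scalar is applied to every row
df["salary_k"] = df["salary"] / 1000

# Aggregations collapse a column to a single number
print(df["total_pay"].sum())
print(df["total_pay"].mean())
```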
How Do You Sort and Count Values?
Two methods you'll reach for constantly: sort_values() to order rows, and value_counts() to tally how often each value appears in a column.
value_counts() is one of the most useful one-liners in pandas. It instantly tells you the distribution of a categorical column — how many engineers, how many salespeople, and so on.
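The sketch below shows both methods on a hypothetical four-person team. `sort_values()` returns a new DataFrame and leaves the original untouched, so the result is assigned to a new variable.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dana"],
    "department": ["Engineering", "Sales", "Engineering", "Engineering"],
    "salary": [95000, 62000, 88000, 91000],
})

# Highest salary first; the default order is ascending
by_salary = df.sort_values("salary", ascending=False)
print(by_salary)

# How many rows fall into each department, most common first
counts = df["department"].value_counts()
print(counts)
```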
Quick Recap: Your Pandas Starter Toolkit
Here's a cheat sheet of everything you learned:
pd.DataFrame(dict) or pd.DataFrame(list_of_dicts)
.head(), .tail(), .shape, .dtypes, .describe()
df['col'] for one column, df[['a', 'b']] for multiple
df['new'] = df['a'] * df['b'], .sum(), .mean()
.sort_values('col'), .value_counts()
Practice Exercises
Create a pandas Series called temps containing the values [72, 68, 75, 80, 77] with the name 'daily_temp'. Print the Series name using temps.name.
Create a DataFrame called df with three columns:
'fruit': ['Apple', 'Banana', 'Cherry']
'price': [1.2, 0.5, 2.0]
'stock': [100, 200, 50]
Print the column names as a list using df.columns.tolist().
Create a DataFrame called df from a list of dictionaries representing three students:
{'name': 'Alice', 'grade': 90}
{'name': 'Bob', 'grade': 85}
{'name': 'Charlie', 'grade': 92}
Print the shape of the DataFrame using print(df.shape).
What will this code print?
import pandas as pd
df = pd.DataFrame({'x': [10, 20, 30, 40, 50]})
print(df.head(2).values.tolist())
Given a DataFrame with 'price' and 'quantity' columns, add a new column called 'total' that equals price * quantity. Then print the total column as a list using print(df['total'].tolist()).
Use the provided starter data.
This code tries to select two columns from a DataFrame but throws an error. Fix the bug so it prints the name and city columns as a DataFrame. Print the column names as a list using print(result.columns.tolist()).
Given a DataFrame with a 'score' column containing [80, 90, 70, 100, 85], use describe() to get the summary statistics for that column. Print the mean value using print(df['score'].describe()['mean']).