
Pandas: Create, Load, and Explore DataFrames

Intermediate | 30 min | 7 exercises | 105 XP

You have a spreadsheet with 10,000 rows of sales data. You need to find every order over $500, group them by region, and calculate the average. In Excel, that takes a pivot table, a filter, and a lot of clicking. In pandas, it takes three lines of code.

Pandas is one of the most widely used data analysis libraries in Python. Its core object is the DataFrame, which you can think of as a programmable spreadsheet: every column has a name, every row has an index label, and you can slice, filter, and transform the whole table with a single expression.

In this tutorial, you'll learn how to create DataFrames from scratch, inspect their shape and contents, and perform your first data operations. By the end, you'll be comfortable building and exploring tabular data entirely in Python.

What Is a Pandas Series?

Before you meet the DataFrame, you need to understand its building block: the Series. A Series is a single column of data with an index. Think of it like a labeled list.

Creating a Series
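A minimal sketch of building a Series from a plain Python list. The variable name sales and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: five daily sales totals
sales = pd.Series([250, 410, 180, 520, 330])

# Printing shows two columns: the index on the left, the values on the right
print(sales)

# With no index given, pandas assigns the positions 0 through 4
print(sales.index.tolist())  # [0, 1, 2, 3, 4]
```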

Notice the numbers on the left? That's the index — pandas automatically assigns 0, 1, 2, ... but you can set custom labels too. The index is what makes pandas powerful. It lets you align data by label instead of by position.

Custom index labels
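One way to set custom labels is the index argument of pd.Series. The sketch below uses invented weekday data (sales and returns are hypothetical names) and also shows what aligning by label means in practice.

```python
import pandas as pd

# Hypothetical data: sales totals labeled by weekday instead of 0, 1, 2
sales = pd.Series([250, 410, 180], index=["Mon", "Tue", "Wed"])
print(sales["Tue"])  # look up a value by its label: 410

# The second Series lists the same days in a different order.
# Subtraction pairs values by label, not by position.
returns = pd.Series([10, 20, 30], index=["Wed", "Mon", "Tue"])
net = sales - returns
print(net["Mon"])  # 250 - 20 = 230
```

If the two Series were matched by position instead, Monday's sales would be paired with Wednesday's returns.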

How Do You Create a DataFrame?

A DataFrame is a table with rows and columns — like a spreadsheet, a SQL table, or a CSV file loaded into memory. The most common way to create one is from a dictionary where each key becomes a column name and each value is a list of data.

DataFrame from a dictionary
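A short sketch of the dictionary approach, using invented employee data. The column names and values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical employee data: each key becomes a column name,
# each list holds that column's values (all lists have the same length)
data = {
    "name": ["Alice", "Bob", "Charlie", "Dana"],
    "department": ["Engineering", "Sales", "Engineering", "Marketing"],
    "salary": [95000, 62000, 88000, 70000],
}
df = pd.DataFrame(data)

print(df)
print(df.columns.tolist())  # ['name', 'department', 'salary']
```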

You can also create a DataFrame from a list of dictionaries. This is handy when each row comes from a separate record — like API responses or database rows.

DataFrame from a list of dictionaries
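A sketch of the record-by-record approach with made-up data. The third record deliberately omits one key to show how pandas fills the gap.

```python
import pandas as pd

# Hypothetical records, e.g. as they might arrive from an API
records = [
    {"name": "Alice", "department": "Engineering", "salary": 95000},
    {"name": "Bob", "department": "Sales", "salary": 62000},
    {"name": "Charlie", "department": "Engineering"},  # no 'salary' key
]
df = pd.DataFrame(records)

# Each dictionary becomes one row; the missing salary shows up as NaN
print(df)
print(df["salary"].isna().tolist())  # [False, False, True]
```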

How Do You Inspect a DataFrame?

When you get a new dataset, the first thing you do is explore it. Pandas gives you several tools to peek at the data without printing all 10,000 rows.

head() and tail()
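A minimal sketch on an invented 100-row table of orders. The column names order_id and amount are assumptions.

```python
import pandas as pd

# Hypothetical data: 100 orders numbered 1 to 100
df = pd.DataFrame({
    "order_id": list(range(1, 101)),
    "amount": [i * 5 for i in range(1, 101)],
})

print(df.head())   # first 5 rows by default
print(df.tail(3))  # last 3 rows; pass a number to change the count
```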
shape, columns, and dtypes
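A sketch of the three attributes on a small invented table. They are attributes rather than methods, so they take no parentheses.

```python
import pandas as pd

# Hypothetical data with one text, one integer, and one float column
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "salary": [95000, 62000, 88000],
    "rating": [4.5, 3.8, 4.1],
})

print(df.shape)             # (3, 3): number of rows, number of columns
print(df.columns.tolist())  # ['name', 'salary', 'rating']
print(df.dtypes)            # one dtype per column; the text column's dtype
                            # name can differ between pandas versions
```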

What Does describe() Tell You?

The describe() method gives you summary statistics for every numeric column: count, mean, standard deviation, min, max, and the 25th/50th/75th percentiles. It's the fastest way to spot outliers and get a feel for the data.

describe() in action
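A sketch with invented salary figures. By default, describe() on a table that mixes text and numeric columns summarizes only the numeric ones, so the name column is left out of the result.

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dana"],
    "salary": [60000, 70000, 80000, 90000],
})

stats = df.describe()
print(stats)

# The result is itself a DataFrame, so individual statistics can be looked up
print(stats.loc["mean", "salary"])  # 75000.0
print(stats.loc["max", "salary"])   # 90000.0
```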

How Do You Select Columns?

To grab a single column, use square brackets with the column name. This returns a Series. To grab multiple columns, pass a list of names inside the brackets. This returns a DataFrame.

Selecting one vs. multiple columns
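A sketch contrasting the two forms on made-up employee data. Note the double brackets in the second form: the outer pair does the selecting and the inner pair is an ordinary Python list.

```python
import pandas as pd

# Hypothetical employee data
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "department": ["Engineering", "Sales", "Engineering"],
    "salary": [95000, 62000, 88000],
})

salaries = df["salary"]          # single name: returns a Series
subset = df[["name", "salary"]]  # list of names: returns a DataFrame

print(type(salaries).__name__)  # Series
print(type(subset).__name__)    # DataFrame
```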

What Basic Operations Can You Do?

Pandas lets you do math on entire columns at once. You can add, subtract, multiply, or apply functions, and the operation runs on every row automatically. These are called vectorized operations, and they are typically much faster than writing a Python loop.

Column math and aggregation
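A sketch of column arithmetic followed by two aggregations. The product data is invented for illustration.

```python
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({
    "product": ["Widget", "Gadget", "Gizmo"],
    "price": [10.0, 25.0, 4.0],
    "quantity": [3, 2, 10],
})

# Multiply two columns row by row, with no explicit loop
df["total"] = df["price"] * df["quantity"]
print(df["total"].tolist())  # [30.0, 50.0, 40.0]

# Aggregations collapse a column to a single number
print(df["total"].sum())   # 120.0
print(df["price"].mean())  # 13.0
```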

How Do You Sort and Count Values?

Two methods you'll reach for constantly: sort_values() to order rows, and value_counts() to tally how often each value appears in a column.

sort_values() and value_counts()
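A sketch of both methods on invented employee data. sort_values() returns a new, sorted DataFrame and leaves the original order untouched.

```python
import pandas as pd

# Hypothetical employee data
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dana", "Eve"],
    "department": ["Engineering", "Sales", "Engineering", "Sales", "Engineering"],
    "salary": [95000, 62000, 88000, 70000, 91000],
})

# Order rows from highest to lowest salary
by_salary = df.sort_values("salary", ascending=False)
print(by_salary["name"].tolist())  # ['Alice', 'Eve', 'Charlie', 'Dana', 'Bob']

# Tally how many rows fall in each department
counts = df["department"].value_counts()
print(counts["Engineering"])  # 3
print(counts["Sales"])        # 2
```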

value_counts() is one of the most useful one-liners in pandas. It instantly tells you the distribution of a categorical column — how many engineers, how many salespeople, and so on.

Quick Recap: Your Pandas Starter Toolkit

Here's a cheat sheet of everything you learned:

  • Series: a single column with an index
  • DataFrame: a table (multiple Series sharing an index)
  • Creating: pd.DataFrame(dict) or pd.DataFrame(list_of_dicts)
  • Inspecting: .head(), .tail(), .shape, .columns, .dtypes, .describe()
  • Selecting: df['col'] for one column, df[['a', 'b']] for multiple
  • Computing: df['new'] = df['a'] * df['b'], .sum(), .mean()
  • Sorting and counting: .sort_values('col'), .value_counts()

Practice Exercises

    Create a Named Series
    Write Code

    Create a pandas Series called temps containing the values [72, 68, 75, 80, 77] with the name 'daily_temp'. Print the Series name using temps.name.

    Build a DataFrame from a Dictionary
    Write Code

    Create a DataFrame called df with three columns:

  • 'fruit': ['Apple', 'Banana', 'Cherry']
  • 'price': [1.2, 0.5, 2.0]
  • 'stock': [100, 200, 50]

    Print the column names as a list using df.columns.tolist().

    DataFrame from Records
    Write Code

    Create a DataFrame called df from a list of dictionaries representing three students:

  • {'name': 'Alice', 'grade': 90}
  • {'name': 'Bob', 'grade': 85}
  • {'name': 'Charlie', 'grade': 92}

    Print the shape of the DataFrame using print(df.shape).

    Predict the Output: head()
    Predict Output

    What will this code print?

    import pandas as pd
    df = pd.DataFrame({'x': [10, 20, 30, 40, 50]})
    print(df.head(2).values.tolist())
    Add a Computed Column
    Write Code

    Given a DataFrame with 'price' and 'quantity' columns, add a new column called 'total' that equals price * quantity. Then print the total column as a list using print(df['total'].tolist()).

    Use the provided starter data.

    Fix the Bug: Column Selection
    Fix the Bug

    This code tries to select two columns from a DataFrame but throws an error. Fix the bug so it prints the name and city columns as a DataFrame. Print the column names as a list using print(result.columns.tolist()).

    Extract Stats from describe()
    Write Code

    Given a DataFrame with a 'score' column containing [80, 90, 70, 100, 85], use describe() to get the summary statistics for that column. Print the mean value using print(df['score'].describe()['mean']).
