Skip to main content

Pandas String and DateTime Operations for Real-World Data

Intermediate25 min6 exercises85 XP
0/6 exercises

Real-world data is messy. Names arrive in ALL CAPS, dates come as random text strings like "Jan 5, 2024" or "2024/01/05", and phone numbers have dashes in some rows but not others. Before you can analyze anything, you need to clean this stuff up.

Pandas has two powerful accessories for this job: the `.str` accessor for string operations and the `.dt` accessor for datetime operations. Think of them as specialized toolboxes that attach to any Series of strings or dates.

In this tutorial, you'll learn how to clean text data, extract patterns from strings, convert messy dates into proper datetime objects, and pull out components like year, month, and day of the week.

How Do You Clean Text with the .str Accessor?

When a column contains strings, you can use .str followed by any Python string method. The method runs on every value in the column at once -- no loops needed.

Basic string cleaning with .str
Loading editor...

You can chain .str methods together just like regular Python string methods. Each step produces a new Series, so the original data stays untouched.

How Do You Search and Replace in String Columns?

The .str.contains() method checks if each string matches a pattern. It returns a boolean Series that you can use as a filter. For replacing text, .str.replace() works like Python's str.replace() but on the whole column.

Searching and replacing in string columns
Loading editor...

How Do You Split Strings and Extract Parts?

.str.split() splits each string into a list. You can expand it into separate columns with expand=True. For more precise extraction, .str.extract() uses regex capture groups to pull out specific patterns.

Splitting strings and extracting patterns
Loading editor...
Extracting numbers with regex
Loading editor...

How Do You Convert Strings to Dates?

The pd.to_datetime() function is your go-to tool for turning messy date strings into proper datetime objects. It is surprisingly smart and can automatically parse many common formats.

Automatic date parsing with pd.to_datetime()
Loading editor...

When the automatic parser can't figure out your format, you can specify it explicitly using format=. This is also faster because Pandas doesn't have to guess.

Specifying a date format explicitly
Loading editor...

How Do You Extract Date Components?

Once you have a datetime column, the .dt accessor lets you pull out individual components like year, month, day, day of the week, and more. This is incredibly useful for time-based analysis.

Extracting date components with .dt
Loading editor...

How Do You Do Math with Dates?

You can subtract dates to get a timedelta (duration), add timedeltas to shift dates, and create date ranges with pd.date_range(). This is the foundation of time series analysis.

Date math and date ranges
Loading editor...

Practice Exercises

Clean Messy Names
Write Code

Clean the names Series by stripping whitespace and converting to title case. Print the cleaned names as a list.

Loading editor...
Filter with str.contains()
Write Code

Filter the DataFrame to keep only rows where the email column contains "gmail". Print the name column of the filtered rows as a list.

Loading editor...
Predict the String Operation Output
Predict Output

What will this code print?

import pandas as pd
s = pd.Series(['hello world', 'foo bar', 'python pandas'])
print(s.str.split(' ').str.get(0).str.upper().tolist())
Loading editor...
Parse Dates and Extract Month
Write Code

Convert the date_str column to datetime, then add a month column with the month number. Print the month column as a list.

Loading editor...
Calculate Days Between Dates
Write Code

Add a column days_to_deliver that shows how many days passed between order_date and delivery_date. Print the column as a list of integers.

Loading editor...
Fix the String Extraction Bug
Fix the Bug

This code tries to extract area codes from phone numbers, but the regex pattern is wrong. Fix the str.extract() call so it correctly pulls out the 3-digit area code from the parentheses.

Print the extracted area codes as a list.

Loading editor...