Python Regular Expressions: Pattern Matching from Basics to Advanced

Intermediate | 30 min | 7 exercises | 110 XP

Imagine you're working at a help desk and need to find every email address buried in thousands of support tickets. You could scan each line character by character, but that would take forever. Regular expressions let you describe the pattern of an email address, and Python finds every match for you.

Regular expressions (often called regex or regexp) are a mini-language for describing text patterns. Python's built-in re module gives you functions to search, extract, replace, and split text using these patterns.

In this tutorial, you'll learn to use re.search(), re.findall(), re.sub(), character classes, quantifiers, groups, and compiled patterns. By the end, you'll be able to validate input, extract data, and clean messy strings with confidence.

Why Do Regex Patterns Use Raw Strings?

Before diving into patterns, there's one important habit: always write regex patterns as raw strings by putting an r before the opening quote, like r'\d+'. Without the r, Python interprets backslashes as escape characters (like \n for newline) before the regex engine ever sees them.

Without raw string (risky)

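import re
# Without the r prefix, Python turns \b into a backspace character (\x08)
# before the regex engine sees it, so the pattern never matches.
pattern = '\bcat\b'
print(re.search(pattern, 'the cat sat'))  # None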

With raw string (correct)

import re
# The r prefix keeps backslashes literal, so the regex engine sees \b (word boundary)
pattern = r'\bcat\b'
print(re.search(pattern, 'the cat sat'))  # <re.Match object; span=(4, 7), match='cat'>
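To see what the r prefix changes, it can help to inspect the strings themselves. The following is a minimal sketch; the variable names (plain, raw, and so on) are illustrative and not part of the tutorial.

```python
import re

plain = '\bcat\b'   # each \b collapses into one backspace character (\x08)
raw = r'\bcat\b'    # backslash and b stay as two separate characters

print(len(plain))   # 5
print(len(raw))     # 7
print(repr(plain))  # '\x08cat\x08'

# Only the raw-string pattern reaches the regex engine as a word boundary.
plain_match = re.search(plain, 'the cat sat')
raw_match = re.search(raw, 'the cat sat')
print(plain_match)        # None
print(raw_match.group())  # cat
```

The length difference is the whole story: the plain string has already lost its backslashes by the time re.search() receives it.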

How Do re.search() and re.match() Work?

re.search(pattern, string) scans the entire string and returns the first match it finds (or None if there's no match). re.match(pattern, string) only checks at the beginning of the string. Most of the time, re.search() is what you want.

search() vs match()
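The difference between the two functions described above can be sketched as follows. The sample text and the helper first_number() are assumptions made up for illustration, not code from the tutorial.

```python
import re

text = 'Order 66 shipped on day 12'

# re.search() scans the whole string and stops at the first match.
found = re.search(r'\d+', text)
print(found.group())  # 66
print(found.span())   # (6, 8)

# re.match() only tries at position 0, and this text starts with a letter.
at_start = re.match(r'\d+', text)
print(at_start)       # None


def first_number(s):
    """Return the first run of digits in s, or None if there is none."""
    m = re.search(r'(\d+)', s)
    # Check for None before calling .group() to avoid an AttributeError.
    return m.group(1) if m else None


print(first_number(text))         # 66
print(first_number('no digits'))  # None
```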

When re.search() finds a match, it returns a Match object. Call .group() to get the full match, or .group(1) to get the first captured group (the part inside parentheses). If there's no match, it returns None, so always check before calling .group().

What Are Character Classes and Quantifiers?

Character classes define which characters to match. Quantifiers define how many of them to match. Together, they form the backbone of every regex pattern.

Character classes and quantifiers in action
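As a rough illustration of how classes and quantifiers combine, the sketch below uses a made-up sentence. It relies on the standard quantifiers + (one or more), ? (zero or one), {n} (exactly n), and {m,n} (between m and n), plus square brackets for a custom character set.

```python
import re

text = 'Room B12 seats 40 people; room C7 seats 125.'

# \d+ : one or more digits
numbers = re.findall(r'\d+', text)
print(numbers)       # ['12', '40', '7', '125']

# [A-Z]\d{1,2} : one uppercase letter followed by one or two digits
room_codes = re.findall(r'[A-Z]\d{1,2}', text)
print(room_codes)    # ['B12', 'C7']

# \d{3} : exactly three digits in a row
three_digits = re.findall(r'\d{3}', text)
print(three_digits)  # ['125']

# [Rr]oom : a custom set that accepts either capitalization
rooms = re.findall(r'[Rr]oom', text)
print(rooms)         # ['Room', 'room']

# colou?r : the ? makes the preceding u optional
spellings = re.findall(r'colou?r', 'color colour')
print(spellings)     # ['color', 'colour']
```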

Here's a quick reference for the most useful shorthand classes:

\d -- any digit (0-9)

\D -- any non-digit

\w -- any word character (letters, digits, underscore)

\W -- any non-word character

\s -- any whitespace (space, tab, newline)

\S -- any non-whitespace

. -- any character except newline (unless the re.DOTALL flag is set)

How Do You Find All Matches or Replace Text?

re.findall() returns a list of all non-overlapping matches. re.sub() replaces every match with a new string. These two functions handle the most common regex tasks: extracting data and cleaning text.

findall() and sub() examples
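Here is one way the two functions might be used together. The sample text and the deliberately loose email pattern are assumptions for illustration only; the pattern is not a full email validator.

```python
import re

text = 'Contact ann@example.com or bob@example.org by 2024-05-01.'

# No groups: findall() returns each full match.
emails = re.findall(r'[\w.-]+@[\w.-]+', text)
print(emails)  # ['ann@example.com', 'bob@example.org']

# One group: findall() returns only the captured part.
users = re.findall(r'([\w.-]+)@[\w.-]+', text)
print(users)   # ['ann', 'bob']

# Two groups: findall() returns one tuple per match.
pairs = re.findall(r'([\w.-]+)@([\w.-]+)', text)
print(pairs)   # [('ann', 'example.com'), ('bob', 'example.org')]

# sub() replaces every match with the replacement string.
masked = re.sub(r'[\w.-]+@[\w.-]+', '<email>', text)
print(masked)  # Contact <email> or <email> by 2024-05-01.

# Backreferences (\1, \2, \3) reuse captured groups in the replacement.
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', text)
print(swapped)  # Contact ann@example.com or bob@example.org by 01/05/2024.
```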

When the pattern contains a capturing group (parentheses), findall() returns only the captured part, not the full match; with two or more groups, it returns a list of tuples, one per match. Without groups, it returns the full match. This is a common source of confusion.

How Do Groups Let You Extract Specific Parts?

Parentheses () create capturing groups that let you extract specific pieces of a match. Think of them as highlighting the parts you care about inside a larger pattern.

Capturing groups and named groups
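A minimal sketch of numbered and named groups follows, assuming a made-up sentence that contains a date. The group names year, month, and day are illustrative choices; named groups use Python's (?P<name>...) syntax.

```python
import re

sentence = 'Released on 2023-10-02.'

# Numbered groups: each pair of parentheses captures one piece.
m = re.search(r'(\d{4})-(\d{2})-(\d{2})', sentence)
print(m.group())   # 2023-10-02  (the full match)
print(m.group(1))  # 2023        (first group)
print(m.groups())  # ('2023', '10', '02')

# Named groups: (?P<name>...) lets you refer to a group by name.
named = re.search(
    r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', sentence
)
print(named.group('year'))  # 2023
print(named.groupdict())    # {'year': '2023', 'month': '10', 'day': '02'}
```

Named groups cost a few extra characters but keep the extraction code readable when a pattern has more than two or three groups.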

How Do You Split Strings with re.split()?

Python's built-in str.split() only splits on a fixed separator. re.split() splits on a pattern, which is much more flexible. Need to split on commas, semicolons, and pipes all at once? Regex makes that easy.

re.split() with patterns
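The sketch below shows one possible way to split on several separators at once and to limit the number of splits. Both input strings are invented for illustration.

```python
import re

# Split on commas, semicolons, or pipes, swallowing any spaces around them.
fields = re.split(r'\s*[,;|]\s*', 'red, green;blue | yellow')
print(fields)   # ['red', 'green', 'blue', 'yellow']

# maxsplit=1 stops after the first separator, so later colons stay in the message.
log = 'ERROR: disk full: /dev/sda1'
level, message = re.split(r':\s*', log, maxsplit=1)
print(level)    # ERROR
print(message)  # disk full: /dev/sda1
```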

The maxsplit parameter is handy when you only want to split at the first occurrence. For example, splitting a log line only at its first colon keeps the rest of the message intact, even if the message itself contains more colons.

What Are Some Common Regex Patterns?

Here are patterns you'll use over and over. Each one solves a common real-world task:

Common real-world regex patterns
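The patterns below are simplified sketches of a few everyday tasks, not the tutorial's original examples and not production-grade validators. The constant names and the sample text are assumptions. Each pattern is precompiled with re.compile() so it can be reused without being re-parsed.

```python
import re

# Simplified patterns for illustration; real validation usually needs more care.
EMAIL = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')
US_PHONE = re.compile(r'\b\d{3}-\d{3}-\d{4}\b')
ISO_DATE = re.compile(r'\b\d{4}-\d{2}-\d{2}\b')
HEX_COLOR = re.compile(r'#[0-9a-fA-F]{6}\b')

text = ('Mail help@example.com or call 555-867-5309 '
        'before 2025-01-31. Theme: #1a2b3c.')

# A compiled pattern exposes the same methods as the module-level functions.
emails = EMAIL.findall(text)
phones = US_PHONE.findall(text)
dates = ISO_DATE.findall(text)
colors = HEX_COLOR.findall(text)

print(emails)  # ['help@example.com']
print(phones)  # ['555-867-5309']
print(dates)   # ['2025-01-31']
print(colors)  # ['#1a2b3c']
```

The (?:...) in EMAIL is a non-capturing group, which lets the domain part repeat without changing what findall() returns.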

Practice Exercises

Find a Phone Number

Write Code

Write a function find_phone(text) that uses re.search() to find the first phone number in the format XXX-XXX-XXXX in the given text. Return the phone number as a string, or "Not found" if no phone number exists.

Extract All Hashtags

Write Code

Write a function extract_hashtags(text) that returns a list of all hashtags in the text. A hashtag starts with # followed by one or more word characters (letters, digits, or underscores). Return just the tag names without the # symbol.

Predict the Regex Output

Predict Output

What will this code print? Think carefully about greedy vs lazy matching and what findall returns with groups.

Censor Credit Card Numbers

Write Code

Write a function censor_cards(text) that replaces all sequences of 4 groups of 4 digits separated by dashes (like 1234-5678-9012-3456) with ****-****-****-XXXX, where XXXX is the last 4 digits. Use re.sub() with a function or groups.

Fix the Email Validator

Fix the Bug

This email validator has two bugs. Find and fix them so it correctly validates that a string looks like an email address (one or more word characters/dots/hyphens, then @, then a domain, then a dot, then 2-4 letters).

Parse a Log Entry

Write Code

Write a function parse_log(entry) that extracts the timestamp, level, and message from a log entry formatted as "[HH:MM:SS] LEVEL: message". Return a dictionary with keys "time", "level", and "message". Use named groups. If the entry doesn't match, return None.

Clean and Normalize Whitespace

Refactor

Refactor this messy string-cleaning code to use re.sub(). The function should: (1) replace all runs of multiple spaces/tabs with a single space, and (2) strip leading and trailing whitespace. The current code uses multiple chained .replace() calls -- make it cleaner with one regex substitution plus .strip().
