Build a Log File Parser That Finds Errors and Patterns
Server logs are the black box of software. When something goes wrong at 3 AM, logs are the first thing an engineer checks. But log files can be thousands (or millions) of lines long. You need a parser that can cut through the noise and find what matters.
In this project, you'll build a log parser from scratch using Python's re module. You'll parse structured log lines, extract timestamps and severity levels, count errors by type, find patterns, and build a complete parser class that generates summary reports.
Step 1: Parse Log Lines with Regex
Log files typically follow a consistent format. Each line has a timestamp, a severity level (INFO, WARNING, ERROR, CRITICAL), and a message. Here's what these lines look like:
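(The sample data itself isn't reproduced here; the lines below are illustrative entries in the same format, not the provided dataset.)

```
2024-03-10 14:02:07 INFO Server started on port 8080
2024-03-10 14:02:31 WARNING Response time above threshold
2024-03-10 14:03:12 ERROR Failed to connect to database
2024-03-10 14:03:15 CRITICAL Disk space below 5% on /var/log
```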
Each line follows the pattern: DATE TIME LEVEL MESSAGE. We can use a regular expression to extract each piece. The regex pattern r'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.+)' captures four groups: date, time, level, and message.
Write a function parse_log_line(line) that:
1. Uses regex to parse a log line into its components
2. Returns a dictionary with keys: date, time, level, message
3. Returns None if the line doesn't match the expected format
The log format is: YYYY-MM-DD HH:MM:SS LEVEL Message text
Then parse the provided sample lines and print each parsed result.
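Here's a minimal sketch of parse_log_line using the regex above (the line in the usage example is illustrative, not from the provided data):

```python
import re

LOG_PATTERN = re.compile(r'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.+)')

def parse_log_line(line):
    """Parse one log line into its parts; return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    date, time, level, message = match.groups()
    return {"date": date, "time": time, "level": level, "message": message}

# Illustrative usage -- loop over the provided sample lines in the same way:
print(parse_log_line("2024-03-10 14:03:12 ERROR Failed to connect to database"))
# {'date': '2024-03-10', 'time': '14:03:12', 'level': 'ERROR', 'message': 'Failed to connect to database'}
```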
Step 2: Extract Timestamps and Count by Level
Now that we can parse individual lines, let's process an entire log. We want to parse all lines, count how many of each severity level we have, and identify the time range covered by the log.
Write a function count_by_level(log_text) that:
1. Splits the log text into lines and parses each one
2. Counts the number of entries per severity level
3. Returns a dictionary with level names as keys and counts as values
4. Skips lines that don't match the log format
Print the counts sorted by count descending.
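A minimal sketch, assuming the parse_log_line function from Step 1 is already defined:

```python
from collections import Counter

def count_by_level(log_text):
    """Count entries per severity level, skipping lines that don't parse."""
    counts = Counter()
    for line in log_text.splitlines():
        entry = parse_log_line(line)   # from Step 1
        if entry is not None:
            counts[entry["level"]] += 1
    return dict(counts)

# Print the counts sorted by count, descending (log_text holds the provided log):
# for level, count in sorted(count_by_level(log_text).items(), key=lambda kv: kv[1], reverse=True):
#     print(f"{level}: {count}")
```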
Step 3: Count Errors by Type
Knowing you have 5 errors is useful. Knowing that 3 of them are "connection refused" and 2 are "timeout" is much more useful. Let's classify error messages into types by extracting key phrases.
Error messages often follow patterns. We can use keyword matching or regex to group similar errors together. For example, "Failed to connect to cache server" and "Failed to connect to database" are both connection errors.
Write a function classify_errors(log_text) that:
1. Parses the log and filters for ERROR and CRITICAL entries only
2. Classifies each error into a type based on keywords in the message:
- Connection: message contains "connect" or "connection"
- Timeout: message contains "timeout" or "timed out"
- Authentication: message contains "auth" or "login"
- Disk: message contains "disk" or "space"
- Other: anything that doesn't match
3. Returns a dictionary mapping error types to their count
Print the error types sorted by count.
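A possible sketch of classify_errors, again assuming parse_log_line from Step 1; the keyword checks mirror the rules listed above:

```python
from collections import Counter

def classify_errors(log_text):
    """Group ERROR and CRITICAL messages into types by keyword."""
    counts = Counter()
    for line in log_text.splitlines():
        entry = parse_log_line(line)   # from Step 1
        if entry is None or entry["level"] not in ("ERROR", "CRITICAL"):
            continue
        message = entry["message"].lower()
        if "connect" in message:                        # also covers "connection"
            counts["Connection"] += 1
        elif "timeout" in message or "timed out" in message:
            counts["Timeout"] += 1
        elif "auth" in message or "login" in message:
            counts["Authentication"] += 1
        elif "disk" in message or "space" in message:
            counts["Disk"] += 1
        else:
            counts["Other"] += 1
    return dict(counts)
```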
Step 4: Find Time-Based Patterns
Patterns in log data often reveal the root cause of problems. If errors cluster around the same time, it usually means a single event caused multiple failures. Let's analyze error timing to find these clusters.
Write a function find_error_bursts(log_text, window_minutes=5) that:
1. Parses all ERROR and CRITICAL entries with their timestamps
2. Groups errors that occur within window_minutes of each other into "bursts"
3. Returns a list of bursts, where each burst is a dict with:
- start_time: timestamp of the first error in the burst
- end_time: timestamp of the last error in the burst
- count: number of errors in the burst
- messages: list of error messages
Print each burst found. A burst is a group where each consecutive error is within the window of the previous one.
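Here's one way to sketch the burst detector, assuming parse_log_line from Step 1 and using datetime to compare timestamps; isolated errors come back as bursts of one, which you can filter out if you prefer:

```python
from datetime import datetime, timedelta

def find_error_bursts(log_text, window_minutes=5):
    """Group ERROR/CRITICAL entries that occur close together in time."""
    window = timedelta(minutes=window_minutes)
    errors = []
    for line in log_text.splitlines():
        entry = parse_log_line(line)   # from Step 1
        if entry and entry["level"] in ("ERROR", "CRITICAL"):
            ts = datetime.strptime(f"{entry['date']} {entry['time']}", "%Y-%m-%d %H:%M:%S")
            errors.append((ts, entry["message"]))
    errors.sort()

    bursts = []
    for ts, message in errors:
        # Extend the current burst if this error is within the window of the previous one.
        if bursts and ts - bursts[-1]["end_time"] <= window:
            bursts[-1]["end_time"] = ts
            bursts[-1]["count"] += 1
            bursts[-1]["messages"].append(message)
        else:
            bursts.append({"start_time": ts, "end_time": ts, "count": 1, "messages": [message]})
    return bursts
```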
Step 5: Generate an Error Summary
Before we build the full parser class, let's create a summary function that gives a quick overview of what happened in the log. This is the kind of output an on-call engineer wants to see: how many entries in total, how many of each severity level, which error message appears most often, and when errors peaked.
Write a function error_summary(log_text) that prints:
1. Total log entries
2. Count of each severity level
3. The most frequent error message (the exact message that appears most often)
4. The hour with the most errors (format: HH:00)
Format:
=== ERROR SUMMARY ===
Total entries: X
INFO: X | WARNING: X | ERROR: X | CRITICAL: X
Most frequent error: MESSAGE (X occurrences)
Peak error hour: HH:00
Step 6: Build the Complete LogParser Class
Now let's wrap everything into a reusable class. A LogParser class gives us a clean interface: load the logs once, then call different analysis methods as needed. This is how professional log analysis tools are structured.
Build a LogParser class with:
`__init__(self, log_text)`: Parse all lines and store the entries as a list of dicts.
`level_counts(self)`: Return a dict of level -> count.
`errors_only(self)`: Return a list of only ERROR and CRITICAL entries.
`most_common_error(self)`: Return a tuple of (message, count) for the most frequent error.
`report(self)`: Print a formatted summary report showing total entries, level breakdown, error count, and most common error.
Test it with the provided log data.
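A sketch of what the class could look like, reusing parse_log_line from Step 1 (log_text below stands in for the provided log data):

```python
from collections import Counter

class LogParser:
    def __init__(self, log_text):
        # Parse every line up front; lines that don't match the format are dropped.
        parsed = (parse_log_line(line) for line in log_text.splitlines())
        self.entries = [entry for entry in parsed if entry is not None]

    def level_counts(self):
        return dict(Counter(entry["level"] for entry in self.entries))

    def errors_only(self):
        return [e for e in self.entries if e["level"] in ("ERROR", "CRITICAL")]

    def most_common_error(self):
        counts = Counter(e["message"] for e in self.errors_only())
        return counts.most_common(1)[0] if counts else None   # (message, count)

    def report(self):
        print("=== LOG REPORT ===")
        print(f"Total entries: {len(self.entries)}")
        for level, count in sorted(self.level_counts().items(), key=lambda kv: kv[1], reverse=True):
            print(f"  {level}: {count}")
        print(f"Errors (ERROR + CRITICAL): {len(self.errors_only())}")
        top = self.most_common_error()
        if top:
            print(f"Most common error: {top[0]} ({top[1]} occurrences)")

# parser = LogParser(log_text)
# parser.report()
```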
What You Built
You built a complete log file parser that turns unstructured text into actionable insights:
| Component | Purpose | Technique |
| --- | --- | --- |
| Line parser | Extract structured data from log lines | re.match() with capture groups |
| Level counter | Count entries by severity | Counter from collections |
| Error classifier | Group errors by type | Keyword matching |
| Burst detector | Find clusters of errors in time | Sliding time window |
| Summary report | Present key findings | Formatted output |
| Parser class | Wrap everything in a reusable package | OOP with methods |
This parser handles the most common log analysis tasks. Professional tools like Splunk, ELK Stack, and Datadog work on the same principles, just at massive scale with distributed processing and real-time streaming.