Time Series: Detect Trends and Anomalies in Business Data
Business data is rarely static. Sales go up in December. Website traffic spikes on Monday mornings. Server loads follow predictable patterns until something breaks. Understanding these patterns — trends, seasonality, and anomalies — is what time series analysis is all about.
In this project, you'll analyze daily sales data for a retail business. You'll calculate moving averages to smooth out noise, detect upward or downward trends, find day-of-week patterns, and flag anomalous days where sales were unusually high or low. These are the same techniques used by finance teams, operations managers, and data analysts every day.
Step 1: Create a Time Series Dataset
Time series data has a special structure: each row represents a point in time. The index (or a date column) tells pandas the temporal order. We'll generate 90 days of daily sales data with a built-in upward trend, weekly seasonality, and some random noise.
Creating synthetic data that mimics real patterns is a valuable skill. It lets you test your analysis pipeline before you even have real data.
Create a pandas DataFrame with 90 days of synthetic daily sales data:
1. Use pd.date_range('2024-01-01', periods=90, freq='D') for dates.
2. Generate sales with a base of 1000, a daily increase of 5 (trend), a weekend boost of 200, and random noise using np.random.seed(42) and np.random.normal(0, 80, 90).
3. Set the date as the index.
4. Print the first 5 rows, the shape, and the mean sales formatted as "Mean daily sales: $X.XX".
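Here's one possible construction, as a sketch rather than the only valid solution. It assumes the weekend boost applies where dayofweek is 5 or 6 (Saturday and Sunday), and the exact noise values depend on drawing them in this order after seeding:

```python
import numpy as np
import pandas as pd

np.random.seed(42)  # make the noise reproducible

dates = pd.date_range('2024-01-01', periods=90, freq='D')
trend = 5 * np.arange(90)                         # +$5 per day
weekend = np.where(dates.dayofweek >= 5, 200, 0)  # Saturday/Sunday boost
noise = np.random.normal(0, 80, 90)

df = pd.DataFrame({'sales': 1000 + trend + weekend + noise}, index=dates)

print(df.head())
print(df.shape)
print(f"Mean daily sales: ${df['sales'].mean():.2f}")
```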
Step 2: Calculate Moving Averages
Raw daily data is noisy. A moving average smooths the data by averaging each point with its neighbors. A 7-day moving average is especially useful for business data because it evens out the weekly cycle, revealing the underlying trend.
Pandas makes this trivial with the rolling() method. You specify the window size and the aggregation function. The first few values will be NaN because there aren't enough previous data points to fill the window.
Using the DataFrame from Step 1, add two new columns:
1. ma_7 — 7-day rolling mean of sales.
2. ma_30 — 30-day rolling mean of sales.
Then print: (1) the first row where ma_7 is not NaN (use df.dropna(subset=['ma_7']).head(1)), (2) the last 5 rows showing sales, ma_7, and ma_30, and (3) "7-day MA range: $X.XX to $Y.XX" using min/max of ma_7 (excluding NaN).
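A minimal sketch, assuming the df built in Step 1:

```python
# rolling(7) needs 7 observations, so rows 0-5 of ma_7 are NaN;
# likewise rows 0-28 of ma_30.
df['ma_7'] = df['sales'].rolling(window=7).mean()
df['ma_30'] = df['sales'].rolling(window=30).mean()

print(df.dropna(subset=['ma_7']).head(1))
print(df[['sales', 'ma_7', 'ma_30']].tail())

# Series.min()/max() skip NaN by default
print(f"7-day MA range: ${df['ma_7'].min():.2f} to ${df['ma_7'].max():.2f}")
```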
Step 3: Detect Trends
Is the business growing or shrinking? We can answer this by comparing the moving average at the start of the data with the moving average at the end. We can also take a simple linear regression approach: calculate the slope of a line fitted to sales over time.
Another useful technique is comparing the first and last 7-day windows. If the recent window is significantly higher than the early one, we have an upward trend.
Analyze the trend in the sales data:
1. Calculate the mean sales for the first 7 days and the last 7 days.
2. Calculate the percentage change: (last_week - first_week) / first_week * 100.
3. Use numpy's polyfit to fit a line to the sales: slope, intercept = np.polyfit(range(len(df)), df['sales'], 1).
4. Print "First week avg: $X.XX", "Last week avg: $X.XX", "Change: X.X%", and "Daily growth rate: $X.XX/day" (the slope).
5. Print "Trend: UPWARD" if slope > 0, else "Trend: DOWNWARD".
Step 4: Find Seasonality Patterns
Seasonality means repeating patterns at fixed intervals. In our data, we built in a weekend boost. But in real data, you'd need to discover these patterns. Grouping by day of week and by month reveals when sales peak and dip.
The dayofweek property gives 0 (Monday) through 6 (Sunday). Grouping sales by this value and taking the mean shows the average sales for each day of the week.
Analyze seasonality in the sales data:
1. Add a day_of_week column using the index's dayofweek property.
2. Add a day_name column using the index's day_name() method.
3. Calculate mean sales by day of week, sorted by the day_of_week number.
4. Find the best and worst days (by mean sales).
5. Print "=== WEEKLY PATTERN ===" then each day with its mean: "{day_name}: ${mean:.2f}".
6. Print "Best day: {name} (${mean:.2f})" and "Worst day: {name} (${mean:.2f})".
Step 5: Detect Anomalies
An anomaly is a data point that doesn't fit the expected pattern. Maybe a marketing campaign caused a sales spike, or a server outage caused a drop. Detecting anomalies automatically lets you investigate unusual events without manually scanning hundreds of data points.
The Z-score method is one of the simplest approaches. For each data point, calculate how many standard deviations it is from the mean. Points with a Z-score above 2 or below -2 are flagged as anomalies. This works well when data is roughly normally distributed.
Implement Z-score anomaly detection:
1. Calculate the mean and standard deviation of sales.
2. Add a z_score column: (sales - mean) / std.
3. Add an is_anomaly column: True if abs(z_score) > 2.
4. Print "=== ANOMALY DETECTION ===" as header.
5. Print the total number of anomalies found.
6. For each anomaly, print the date, sales value, and z-score: "{date}: ${sales:.2f} (z={z_score:.2f})".
7. If no anomalies are found at threshold 2, retry with threshold 1.5 and note how the count changes.
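A minimal sketch of the Z-score approach, assuming the df from the earlier steps:

```python
mean, std = df['sales'].mean(), df['sales'].std()
df['z_score'] = (df['sales'] - mean) / std
df['is_anomaly'] = df['z_score'].abs() > 2

anomalies = df[df['is_anomaly']]
print("=== ANOMALY DETECTION ===")
print(f"Anomalies found: {len(anomalies)}")
for date, row in anomalies.iterrows():
    print(f"{date.date()}: ${row['sales']:.2f} (z={row['z_score']:.2f})")

if anomalies.empty:
    # Nothing at |z| > 2: loosen the threshold and note the difference
    loose = df[df['z_score'].abs() > 1.5]
    print(f"At threshold 1.5: {len(loose)} anomalies")
```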
Step 6: Generate a Complete Trend Report
A good analysis ends with a clear summary. Decision-makers don't want to see raw DataFrames — they want insights. Let's pull together everything we've built into a single comprehensive trend report.
Write a function generate_report(df) that takes the sales DataFrame and prints a complete analysis report:
1. "=== SALES TREND REPORT ===" header
2. Overview: date range, total days, total revenue, mean daily sales
3. Trend: first vs last week average, percentage change, direction (UPWARD/DOWNWARD)
4. Weekly Pattern: best and worst day of week
5. Anomalies: count of days with |z-score| > 1.5
6. A final "Recommendation:" line based on the trend direction
Call it with the generated DataFrame.
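One way to assemble the function from the pieces above; the recommendation wording here is just placeholder text you should tailor to your own business context:

```python
def generate_report(df):
    """Print a trend report for a daily sales DataFrame with a DatetimeIndex."""
    sales = df['sales']

    print("=== SALES TREND REPORT ===")
    print(f"Period: {df.index.min().date()} to {df.index.max().date()} ({len(df)} days)")
    print(f"Total revenue: ${sales.sum():.2f}")
    print(f"Mean daily sales: ${sales.mean():.2f}")

    # Trend: week-over-week comparison plus fitted slope
    first_week, last_week = sales.iloc[:7].mean(), sales.iloc[-7:].mean()
    change = (last_week - first_week) / first_week * 100
    slope, _ = np.polyfit(range(len(df)), sales, 1)
    direction = "UPWARD" if slope > 0 else "DOWNWARD"
    print(f"First week avg ${first_week:.2f} vs last week avg ${last_week:.2f} "
          f"({change:.1f}%), trend: {direction}")

    # Weekly pattern: best and worst day by mean sales
    weekly = df.groupby(df.index.day_name())['sales'].mean()
    print(f"Best day: {weekly.idxmax()} (${weekly.max():.2f}), "
          f"worst day: {weekly.idxmin()} (${weekly.min():.2f})")

    # Anomalies at the looser threshold from Step 5
    z = (sales - sales.mean()) / sales.std()
    print(f"Anomalies (|z| > 1.5): {(z.abs() > 1.5).sum()}")

    if direction == "UPWARD":
        print("Recommendation: sales are trending up; sustain what's working.")
    else:
        print("Recommendation: sales are trending down; investigate recent changes.")

generate_report(df)
```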
Project Complete!
You've built a complete time series analysis toolkit. Here's what you accomplished:
1. Generated a synthetic daily sales dataset with trend, seasonality, and noise.
2. Smoothed noisy data with 7-day and 30-day moving averages.
3. Quantified the trend with a week-over-week comparison and a fitted slope.
4. Uncovered weekly seasonality by grouping on day of week.
5. Flagged anomalous days with Z-score thresholds.
6. Pulled it all together into a reusable trend report.