Build a Sales Performance Dashboard with KPI Analysis
Every business runs on sales data. Managers want to know which products are selling, which regions are performing, and whether things are getting better or worse. In this project, you'll build a complete sales analysis pipeline that answers these questions.
You'll start with raw transaction data and build up to a formatted report with KPIs, breakdowns, and growth metrics. This is the kind of work a data analyst does every week.
Step 1: Create the Sales Dataset
Our dataset represents three months of sales transactions for a small electronics company. Each row is a single sale with the date, product, salesperson, region, quantity, and unit price.
Notice that we immediately calculate a revenue column by multiplying quantity by unit price. This is a derived column that we'll use throughout the analysis.
Create a sales DataFrame from the provided data dictionary. Then:
1. Convert the date column to datetime using pd.to_datetime()
2. Add a revenue column (quantity * unit_price)
3. Add a month column that extracts the month name from the date (use .dt.strftime('%B') to get full month names like "January")
4. Print the total number of transactions and the columns list.
Step 2: Calculate Key Performance Indicators (KPIs)
KPIs are the numbers that tell you how the business is doing at a glance. For a sales team, the most important ones are total revenue, number of transactions, average order value, and total units sold. Let's calculate them all.
Write a function calculate_kpis(df) that takes a sales DataFrame (with a revenue and quantity column) and returns a dictionary with these keys:
total_revenue: sum of all revenuetransaction_count: number of rowsavg_order_value: mean revenue per transactiontotal_units: sum of all quantitiesavg_unit_price: total revenue divided by total unitsPrint each KPI formatted with 2 decimal places.
Step 3: Break Down Performance by Product and Region
Overall numbers are useful, but managers always want to drill down. Which product brings in the most revenue? Which region is underperforming? The groupby() method is how you answer these questions.
Write a function breakdown_by(df, column) that:
1. Groups the DataFrame by the given column
2. Calculates total revenue, total units sold, and transaction count per group
3. Sorts by total revenue descending
4. Returns the result DataFrame
The returned DataFrame should have columns: total_revenue, total_units, transactions.
Call it for both 'product' and 'region' and print the results.
Step 4: Find Top Performers
Sales managers love rankings. Who's the top salesperson? What's the single biggest sale? Let's write code that answers both.
Write a function find_top_performers(df) that prints:
1. The top salesperson by total revenue (print: Top salesperson: NAME ($REVENUE))
2. The top product by total units sold (print: Top product: NAME (UNITS units))
3. The largest single sale by revenue (print: Largest sale: $REVENUE by NAME)
Format all dollar amounts with 2 decimal places.
Step 5: Calculate Month-over-Month Growth
Trends matter more than snapshots. A $50,000 month sounds great, but if last month was $80,000, there's a problem. Month-over-month growth shows whether the business is improving or declining.
Write a function monthly_growth(df) that:
1. Groups the DataFrame by the month column
2. Calculates total revenue per month
3. Calculates the percentage change from one month to the next
4. Prints each month's revenue and growth rate
Format: MONTH: $REVENUE (GROWTH%) where growth shows +/- sign.
For the first month (no previous month), print: MONTH: $REVENUE
The months should appear in order: January, February, March.
Step 6: Create a Summary Report Function
Time to combine everything into a single report function. A good report function runs all the analysis and returns organized results. This makes it easy to generate reports for different time periods or datasets.
Write a function create_report(df) that returns a dictionary with:
kpis: dict with total_revenue, transaction_count, avg_order_valuetop_product: name of the product with highest total revenuetop_salesperson: name of the salesperson with highest total revenuebest_region: name of the region with highest total revenueThen print the report in a readable format.
Step 7: Generate a Formatted Report
Data analysis is only useful if people can read it. The final step is turning raw numbers into a nicely formatted text report. Good formatting uses alignment, separators, and clear labels to make the data scannable.
Write a function format_report(df) that prints a complete formatted sales report. The report should include:
1. A header with the title "SALES PERFORMANCE REPORT"
2. KPI section showing total revenue, transaction count, and average order value
3. Top performers section showing the top product and top salesperson
4. Use = and - characters as separators
Match this exact format:
================================
SALES PERFORMANCE REPORT
================================
Total Revenue: $8849.86
Transactions: 5
Avg Order Value: $1769.97
--------------------------------
Top Product: Phone
Top Salesperson: Bob
================================What You Built
| You built a complete sales analysis pipeline that transforms raw transaction data into actionable insights: | ||
|---|---|---|
| --- | --- | --- |
| Data prep | Convert types, add derived columns | pd.to_datetime(), arithmetic |
| KPIs | Calculate summary metrics | .sum(), .mean(), len() |
| Breakdowns | Group by category | .groupby().agg() |
| Rankings | Find top performers | .idxmax() |
| Growth | Track trends over time | .pct_change() |
| Reporting | Present results clearly | f-string formatting |
This is the exact workflow data analysts follow in real companies. The dataset gets bigger and the analysis gets more sophisticated, but the pattern stays the same: load, prepare, analyze, report.