Data Visualization: Choose the Right Chart and Tell a Story
A chart that looks pretty but tells the wrong story is worse than no chart at all. In January 1986, engineers at NASA contractor Morton Thiokol showed NASA managers charts arguing that cold temperatures could damage the Space Shuttle Challenger's O-ring seals. The charts were so poorly designed that the managers couldn't see the pattern. The shuttle launched the next day and broke apart 73 seconds after liftoff.
Good data visualization isn't just about making things look nice. It's about choosing the right chart type, avoiding misleading scales, making data accessible to everyone, and ultimately telling a clear story. This tutorial teaches you the principles that separate effective visualizations from confusing ones.
Which Chart Type Should You Use?
The most common mistake in data visualization is picking the wrong chart type. A pie chart for 20 categories? Unreadable. A line chart for categorical data? Misleading. The right chart depends on what question you're answering.
Here's a decision framework:
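One way to sketch such a framework in Python is a lookup from the kind of question to a chart type. The category names, the `CHART_FOR_QUESTION` table, and the `suggest_chart` helper are illustrative assumptions; the mapping follows the rules of thumb used in this tutorial's exercises, not a fixed standard.

```python
# Hypothetical lookup: kind of question -> suggested chart type.
# The category names are illustrative, not a standard taxonomy.
CHART_FOR_QUESTION = {
    "trend over time": "line",
    "comparison across categories": "bar",
    "relationship between two variables": "scatter",
    "distribution of one variable": "histogram",
    # Bars are usually easier to compare than pie slices.
    "proportion of a whole": "bar",
}


def suggest_chart(question_kind: str) -> str:
    """Return a suggested chart type, falling back to a bar chart."""
    return CHART_FOR_QUESTION.get(question_kind, "bar")


print(suggest_chart("trend over time"))
print(suggest_chart("proportion of a whole"))
```

Starting from the question rather than from the data keeps the choice tied to what the reader needs to learn from the chart.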
What Makes a Chart Misleading?
A chart can be technically accurate but visually deceptive. News outlets, advertisers, and even well-meaning analysts create misleading charts all the time. Learning to spot these tricks makes you a better data consumer and a more honest data presenter.
The Truncated Y-Axis Trick
The most common trick is starting the y-axis at a number other than zero. On a bar chart, where the length of each bar encodes its value, this magnifies small differences and makes them look enormous: a 2% change can look like a 200% change.
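The arithmetic behind this effect is simple enough to check by hand. The sketch below compares how tall two bars are drawn relative to each other when the axis starts at zero and when it starts just below the smaller value. The `visual_ratio` helper and the numbers are made up for illustration.

```python
def visual_ratio(a: float, b: float, axis_min: float = 0.0) -> float:
    """Ratio of drawn bar heights for values a and b when the axis starts at axis_min."""
    if min(a, b) <= axis_min:
        raise ValueError("both values must lie above the axis minimum")
    return (b - axis_min) / (a - axis_min)


# Made-up values: a true 2% increase.
last_year, this_year = 100.0, 102.0

honest = visual_ratio(last_year, this_year, axis_min=0.0)
truncated = visual_ratio(last_year, this_year, axis_min=99.0)

print(f"axis from 0:  second bar is {honest:.2f}x the first")
print(f"axis from 99: second bar is {truncated:.2f}x the first")
```

With the axis starting at 99, the second bar is drawn three times as tall as the first, so it looks 200% taller even though the underlying increase is 2%.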
Cherry-Picking Time Ranges
Another deceptive technique is choosing a time range that supports your narrative while hiding the bigger picture. A stock that's up 5% this month might be down 40% this year.
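A quick way to guard against this is to compute the same change over several windows before deciding which one to show. The sketch below uses invented month-end prices and two hypothetical helpers, `pct_change` and `change_over_window`.

```python
def pct_change(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


def change_over_window(prices: list, periods: int) -> float:
    """Percentage change over the last `periods` steps of the series."""
    return pct_change(prices[-1 - periods], prices[-1])


# Invented month-end prices, oldest first: 12 months ago through now.
prices = [700, 650, 610, 580, 540, 500, 470, 440, 420, 410, 400, 400, 420]

last_month = change_over_window(prices, 1)
last_year = change_over_window(prices, 12)

print(f"last month: {last_month:+.1f}%")
print(f"last year:  {last_year:+.1f}%")
```

The same series is up 5% over the last month and down 40% over the last year, so reporting only the shorter window would hide the larger decline.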
How Do You Make Charts Accessible to Everyone?
About 8% of men and 0.5% of women have some form of color vision deficiency. If your chart relies solely on color to distinguish data series, these readers may not be able to tell the series apart. Accessibility isn't just nice to have; in many organizations, it's a legal requirement.
Here are the key principles for accessible visualizations:
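The point about not relying on color alone can be checked mechanically. The sketch below flags pairs of series that share both marker and line style, which means color is the only thing separating them. The `color_only_pairs` helper and the dictionary keys are assumptions for illustration; the hex colors are the ones used in this tutorial's exercises.

```python
def color_only_pairs(styles: list) -> list:
    """Return index pairs of series that differ from each other by color only.

    Each style is a dict with 'color', 'marker', and 'linestyle' keys
    (the key names are an assumption for this sketch).
    """
    clashes = []
    for i in range(len(styles)):
        for j in range(i + 1, len(styles)):
            same_marker = styles[i]["marker"] == styles[j]["marker"]
            same_line = styles[i]["linestyle"] == styles[j]["linestyle"]
            if same_marker and same_line:
                clashes.append((i, j))
    return clashes


styles = [
    {"color": "#0072B2", "marker": "o", "linestyle": "-"},
    # Same marker and line style as series 0: only the color differs.
    {"color": "#E69F00", "marker": "o", "linestyle": "-"},
    {"color": "#009E73", "marker": "^", "linestyle": "-."},
]

print(color_only_pairs(styles))
```

Here series 0 and 1 are flagged. Giving one of them a different marker or line style would let readers tell them apart in grayscale or with a color vision deficiency.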
How Do You Tell a Story with Data?
The difference between a chart and a data story is context. A chart shows numbers. A story explains what those numbers mean, why they matter, and what should happen next.
Every good data story has three parts:
What Are the Most Common Visualization Mistakes?
Even experienced data scientists make these mistakes. Here's a checklist of things to watch for before sharing any visualization.
Practice Exercises
Write a function choose_chart(data_description) that takes a string describing the data and returns the best chart type. Handle these cases: if the description contains "over time" or "trend", return 'line'. If it contains "compare" or "categories", return 'bar'. If it contains "relationship" or "correlation", return 'scatter'. If it contains "distribution", return 'histogram'. If it contains "proportion" or "percentage", return 'bar' (not pie -- bar is almost always better). Otherwise return 'bar' as the safe default. Test with the provided descriptions.
Write a function find_issues(config) that takes a dictionary with chart configuration and returns a list of issue strings. Check for these problems: (1) If y_min is not 0 and chart_type is 'bar', add 'truncated axis'. (2) If categories > 6 and chart_type is 'pie', add 'too many pie slices'. (3) If is_3d is True, add 'unnecessary 3d'. (4) If has_title is False, add 'missing title'. Print the sorted issues for the test config.
Write a function make_accessible(series_count) that returns a list of dictionaries, each with keys 'color', 'marker', and 'linestyle' for making chart lines distinguishable without relying on color alone. Use these colors in order: '#0072B2', '#E69F00', '#009E73', '#D55E00'. Use these markers: 'o', 's', '^', 'D'. Use these linestyles: '-', '--', '-.', ':'. Return configs for the first series_count series. Print each config for series_count=3.
What will this code print? It calculates how much a truncated y-axis exaggerates the visual difference between two bars.
Write a function build_narrative(data, events) where data is a dict of {month: value} and events is a dict of {month: event_description}. The function should return a list of strings, one per event, in the format 'MONTH: VALUE - EVENT'. Only include months that appear in both dicts. Print each narrative line for the test data.