11. Misleading People with Data

 
Overview

This lesson will discuss how data can be misused, either intentionally or accidentally, and how to avoid and deal with this misuse. The objective is to prevent viewers from being misled themselves.

Summary

Lesson Goal

The goal of this lesson is to see how poor visualizations can mislead people.

Altered Axes

The numeric axis on a bar chart or column chart usually starts at zero. By changing this, you can make the size differences between bars or columns look larger than they actually are. For bar and column charts, the best practice is to start the axis at zero.
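The size of this distortion can be estimated with simple arithmetic. The sketch below is only an illustration: the revenue figures, the second fee earner's name, and the helper functions `drawn_height` and `apparent_ratio` are hypothetical. Only the $90,000 axis start is taken from the lesson's example.

```python
def drawn_height(value, axis_start=0.0):
    """Length of a bar as drawn when the numeric axis begins at axis_start."""
    return max(value - axis_start, 0.0)


def apparent_ratio(larger, smaller, axis_start=0.0):
    """How many times taller the larger bar looks than the smaller one.

    Assumes the smaller value lies above axis_start; otherwise its bar
    has zero height and the ratio is undefined.
    """
    return drawn_height(larger, axis_start) / drawn_height(smaller, axis_start)


# Hypothetical revenue figures, chosen only for illustration.
revenue = {"Lenny": 95_000, "Earner B": 110_000}

# Axis starting at zero: the bars reflect the real proportions.
true_ratio = apparent_ratio(revenue["Earner B"], revenue["Lenny"])

# Axis starting at $90,000: the first $90,000 of every bar is cut off.
truncated_ratio = apparent_ratio(revenue["Earner B"], revenue["Lenny"], 90_000)

print(f"zero baseline: {true_ratio:.2f}x, truncated axis: {truncated_ratio:.2f}x")
```

With these assumed figures, a real difference of roughly 16% is drawn as a bar four times taller, which is why a bar or column axis should start at zero.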

This principle does not apply to line charts, where the focus is on showing a trend over time rather than on absolute size. With a line chart, starting the numeric axis at zero can make trends look much smaller than they actually are. With line charts, it is therefore acceptable to start the numeric axis above zero.
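One rough way to see why a zero baseline flattens a line chart is to measure how much of the vertical axis the data actually occupies. The sketch below is an assumption-laden illustration: the monthly revenue values, the axis limits, and the `visible_fraction` helper are all hypothetical and do not come from the lesson.

```python
def visible_fraction(values, axis_start, axis_end):
    """Share of the vertical axis range that the data's own range covers."""
    return (max(values) - min(values)) / (axis_end - axis_start)


# Hypothetical monthly revenue with a drop toward the end of the period.
monthly_revenue = [100_000, 102_000, 101_000, 98_000, 92_000, 90_000]
axis_top = 105_000  # assumed upper axis limit

from_zero = visible_fraction(monthly_revenue, 0, axis_top)
from_85k = visible_fraction(monthly_revenue, 85_000, axis_top)

print(f"axis from 0: {from_zero:.0%} of plot height")
print(f"axis from 85,000: {from_85k:.0%} of plot height")
```

Under these assumptions, the whole trend is squeezed into about a ninth of the plot height when the axis starts at zero, compared with more than half when the axis starts just below the data.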

Irregular Intervals

On a time-based chart, irregular gaps between columns or line points can distort the appearance of a trend over time. This can happen where data is not collected at consistent intervals. Charts like this can mislead because we don’t know what happened during the periods when data was not collected.
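Before plotting a time series, it can help to check whether any periods are missing. The sketch below is a minimal example of such a check. The `missing_periods` helper and the list of years are hypothetical. Only the two gaps, 1999 and 2002, match the lesson's internet usage example.

```python
def missing_periods(periods, step=1):
    """Return the periods absent from a series expected every `step` units."""
    present = set(periods)
    expected = range(min(periods), max(periods) + 1, step)
    return [p for p in expected if p not in present]


# Hypothetical years on the X axis; 1999 and 2002 are absent.
years = [1997, 1998, 2000, 2001, 2003, 2004]

gaps = missing_periods(years)
print(gaps)
```

An empty result means the intervals are regular. Any other result is a cue to show the gaps explicitly or to choose a different kind of chart.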

Labeling Charts

Although charts are mostly visual, labels and text can add context and meaning to a chart. Leaving these out means that users only get a general, vague overview of the chart’s message. Therefore, you should include labels in charts if you want to provide a detailed message to readers.

Defying Conventions

People have inbuilt assumptions about charts that you should not defy. In the lesson, we see a line chart with an inverted y-axis. On this chart, a rise in the line means the series is falling. Defying conventions like this can seriously mislead chart readers.

Transcript

Data is a powerful tool that is capable of doing a lot of good for many businesses.

However, if used inappropriately, data has the potential to mislead, which can have negative consequences for individuals or companies. In this lesson, we'll learn how poor visualizations can mislead people, by looking at some examples and characteristics of bad charts.

We'll see charts with altered axes, missing values, a lack of detail, and broken conventions.

One of the more common ways to mislead people is to alter the axes of a chart.

For example, here we can see the revenue figures for a company with three fee earners. At a glance, it seems like Lenny is way behind the other two fee earners. However, if you look at the numeric axis, you'll see that the lowest value is $90,000. This means that each fee earner is generating $90,000 of revenue that's been ignored by this chart.

When we start the Y axis at zero and include that revenue, the chart looks very different. We can see the gap between the three fee earners is actually not that large. You should always start a numeric axis at zero when drawing a bar or column chart.

By contrast, line charts don't have to start their axes at zero. This is because line charts are designed to show trends over time, and absolute amounts at any point in time are less important.

Consider this line chart showing a company's revenue over the course of a year. Revenue seems to drop fairly significantly towards the end of the year. When we force the Y axis to start at zero, the chart changes. The drop in revenue is still noticeable, but the line is flatter overall, so the drop doesn't seem so bad. This chart overemphasizes the amount of revenue each month, when a line chart should be focusing on the general trend. In the case of a line chart, an axis starting at zero without any data near zero could be an attempt to mislead by disguising a change that the creator of the graph doesn't want you to see.

Another thing to watch out for on line charts is irregular intervals on the X axis. On a time-based chart, this can distort the appearance of trends or other changes. For example, here we see a chart that tracks internet usage over time.

Initially, you'd probably assume that the data is annual. But if you look closely, you can see that there is no data for 1999 or 2002.

As a result, the time intervals are uneven, which means we can't be certain that the trend shown is accurate. For all we know, there might have been declines in 1999 and 2002 that we're not aware of.

If visualizing data like this, you should use a different type of chart.

Next, let's consider the impact of labeling charts.

It's important to remember that text and annotations are vital components of a good chart.

Here we can see a chart showing the revenue generated by salespeople across four different regions.

This chart is fine if we just want a general idea of which regions and salespeople are responsible for the most revenue, but it doesn't provide much detail.

Now consider the same chart, but with data labels added. With these specific numbers, it becomes much easier to compare different regions and salespeople.

While you could argue that the initial version of the chart isn't misleading, it does conceal and obscure information that could be relevant to the audience, and you should avoid it for that reason.

As a final example, let's consider this chart showing revenue trends over the course of a year. Initially, it looks as if revenue has risen towards the end of the year. However, notice that the vertical axis has been reversed. This means that revenue is actually falling towards the end of the year, even though the line is rising.

In this case, the chart is completely misleading, conveying exactly the opposite of what most people would conclude at a glance.

We've now seen just a few examples of how visualizations can be confusing or misleading.

While data can be a powerful tool, you shouldn't assume that a message communicated through data is always what it seems at first glance. In general, you should be able to cast a critical eye on charts or data. Misleading charts are one of the common ways in which people can misunderstand a piece of data analysis.

Possibly the most common is confusing correlation and causality.

We'll discuss this topic in the next and final lesson of this course.