5. Variance and Standard Deviation

Overview

These two metrics quantify how far your data is dispersed from the mean - they can be calculated manually or through in-built Excel functions.

To explore more Kubicle data literacy subjects, please refer to our full library.

Summary

  1. Calculating Variance Manually (00:04)

    The variance and the standard deviation are important metrics that we can use to measure how far our data is spread around the average. We can calculate the variance of a data set manually by following these steps:

    • Find the difference between each value and the mean of the data set.

    • Square each of these values. This ensures that the distance between each point and the mean is positive, and stops values below the mean canceling out values above the mean.

    • Find the average of these squared values.

  2. Calculating Standard Deviation Manually (02:19)

    It’s difficult to interpret variance intuitively. Because we square the deviations as part of the calculation, the variance is measured in square units. For example, the variance of revenues for the sports bar is measured in dollars squared. 

    We can solve this problem by computing the standard deviation, which is simply the square root of the variance. This is measured in the same units as the data set. A smaller standard deviation indicates the numbers are closer to the mean of the data set.

  3. Using Excel Functions (03:14)

    Excel provides inbuilt functions for calculating the variance and standard deviation. The function VAR.P can be used to find variance, and STDEV.P can be used to find standard deviation. Both these functions take an array of cells as their input, and return the variance or standard deviation of those values. 

    The P in both functions stands for population. When we’re performing descriptive statistics, we want to use the population versions of these two functions.

Transcript

In the previous lesson, we learned how to create a frequency distribution for our sports bar.

In this lesson, we'll learn how to calculate variance and standard deviation in Excel. Variance and standard deviations are two important metrics that quantify how far your data is dispersed from the mean. These two metrics can be calculated with built-in Excel functions, but in this lesson, we're going to first calculate them manually on a simple example to make sure we fully understand how they work.

The example in question has five different dollar amounts with an average of $6.

We'd like to calculate the variance and standard deviation of this dataset. The first step is to simply subtract the average value from each data point. We'll write equals $2, then subtract the average, which is at J3, and then anchor the cell with F4.

We'll then copy this formula for the remaining four cells.

This leaves us with a set of numbers, some positive and some negative.

Variance and standard deviation don't care if a number is above or below the average, they just care how far it is. Since the distance is always positive, we must find a way to convert the numbers from negative to positive so they don't cancel each other out later in the calculation.

To do this, we simply square each number.

So we'll write equals, select the first data point, write a caret, which on my keyboard is shift six, and put it to the power of two.

We'll then copy this for the remaining four cells.

Now this gives us a set of numbers that highlights the distance each point is away from the mean. The variance is simply the average of these numbers, so we'll add them together with a sum function and then divide them by five.

We'll write equals sum, select the cells, and then divide by five, which is the number of data points.

This gives us a variance of eight.

Now, unfortunately, variance isn't a very intuitive number, as the unit of variance in this case is dollars squared, because we squared the difference between the data points and the mean earlier in the equation.

A much more intuitive number is the standard deviation, which is simply the square root of the variance. We'll write equals SQRT, open parentheses, and find the variance. We'll then press enter, and this gives us a standard deviation of $2 and 83 cents.

This means that, on average, our data points are $2 and 83 cents away from the mean value.

I'll change these numbers off camera by moving the numbers closer together and keeping the mean the same. We can now see that the standard deviation decreases, because, on average, our numbers are closer to the mean.

In this way, standard deviation intuitively quantifies how far the data is dispersed from the mean or average value.

Thankfully, we don't have to complete these separate steps when we're calculating standard deviation and variance. Excel has some handy built in functions we'll use for the sports bar. For variance, we'll writes equals VAR select the P option, which is for the entire population.

We'll then simply select revenues as the input.

This gives us a rather unintuitive value of over 14 million dollars, so instead, we'll focus on standard deviation. We'll write equals STDEV, select the P option, because we have the entire population, and again, the input is revenues.

When we press enter, we find that, on average, the data is $3,762 away from the mean value of $5,738.

Standard deviation is a very important metric in statistics, and it's imperative you understand how it's calculated and how to interpret it. To help you with this task, I've created an exercise on the third tab that asks you to calculate the mean, median, and standard deviation for a rival sports bar called Solo. When you've calculated your results, try to compare Solo's performance against Virtuoso.

I'll leave the answer in the after file below the video.

Data Analysis
Introducing Statistics

Contents

My Notes

You can take notes as you view lessons.

Sign in or start a free trial to avail of this feature.

Free Trial

Download our training resources while you learn.

Sign in or start a free trial to avail of this feature.

Free Trial