5. Variance and Standard Deviation

 
Subtitles Enabled

Sign up for a free trial to continue watching this lesson.

Free trial

Overview

These two metrics quantify how far your data is dispersed from the mean - they can be calculated manually or through in-built Excel functions.

Lesson Notes

Variance and standard deviation

- These two metrics measure how far our data deviates from the mean (or average)
- Standard deviation is the square root of the variance
- Standard deviation is a more intuitive metric because its unit is the same as the mean

Useful functions

=VAR.P: Calculate the variance of a population (for descriptive statistics)
=STDEV.P: Calculate the standard deviation of a population (for descriptive statistics)
=VAR.S: Calculate the variance of a sample (for inferential statistics)
=STDEV.S: Calculate the standard deviation of a sample (for inferential statistics)

Keyboard shortcuts

CTRL + →: Move to end of data region
CTRL + SHIFT + →: Select all cells within data region
SHIFT + →: Select adjacent cell
ALT + E, S, F: Paste formulas
ALT + E, S, T: Paste formats

Formula for calculating standard deviation (for descriptive statistics)

Transcript

Variance and standard deviation are two important metrics that quantify how far your data is dispersed from the mean. These two metrics can be calculated with in-built Excel functions, but in this lesson I'm going to first calculate them manually on a simple example to make sure you fully understand how they work. The example in question has five different dollar amounts with an average of $6.

I'd like to calculate the variance and standard deviation of this data set. The first step is to simply subtract the average value from each data point, so I'll write equals $2 and then subtract the average, which is at J3, and then anchor this cell with F4 and then copy this formula for the remaining four cells.

And this leaves me with a set of numbers, some positive and some negative. Variance and standard deviation don't care if a number is above or below the average, they just care how far. Since distance is always positive, we must find a way to convert the negative numbers to positive, so that they don't cancel each other out later on in the calculation. To do this, we simply square each number, so I write equals, select the first data point, write a caret, which on my keyboard is Shift + 6, and put it to the power of two. I'll then copy this for the remaining four cells.

This now gives me a set of numbers that highlight the distance each point is, away from the mean. The variance is simply the average of these numbers, so I'll add them together using the SUM function and then divide them by five. So I'll write equals SUM, select the cells, and then divide by five, which is the number of data points.

And this gives me a variance of eight. Now, unfortunately, variance isn't a very intuitive number, because the unit of variance in this case is dollar squared, because we squared the difference between the data points and the mean, earlier in the equation. A much more intuitive number is the standard deviation which is simply the square root of the variance. I'll write equals SQRT, open a bracket, and find the variance. I'll then press Enter and this gives me a standard deviation of $2.83. This means that on average our data points are $2.83 away from the mean value. If I now change these numbers off camera, moving the numbers closer together and keeping the mean the same, you can see that the standard deviation decreases, because on average our numbers are closer to the mean. In this way, standard deviation quantifies intuitively how far your data is dispersed from the mean, or average value. Thankfully, we don't have to complete these separate steps when we're calculating standard deviation and variance. Excel has some handy in-built functions, we'll use for the sports bar. So for variance, I'll write equals VAR and select the P option, which is for the entire population. I'll then simply select revenues as my input and this give me a rather unintuitive value of over 14 million dollars, so instead we'll focus on standard deviation. I'll write equals STDEV, select the P option, because I have the entire population, and again my input is revenues.

When I press Enter, I find that on average my data is $3,762 away from the mean value of 5,738.

Standard deviation is a very important metric in statistics and it's imperative that you understand how it's calculated and how to interpret it. To help you in this task, I've created an exercise on the third tab that asks you to calculate the mean, median and standard deviation for a rival sports bar called Solo. When you've calculated your results, try to compare Solo's performance against Virtuoso. I'll leave my answer in the after file below the video.