Correlation measures how two variables move in relation to each other, a useful metric for finding out what's driving revenue and profits.
- Correlation is a statistical measure of how two variables move in relation to each other
- The correlation coefficent is a number between -1 and +1 that indicates the strength of a correlation
--- -1 Perfect negative correlation
--- +1 Perfect positive correlation
--- 0 to +/- 0.3 is a weak correlation
--- +/- 0.3 to +/- 0.7 is a moderate correlation
--- +/- 0.7 to +/-1 is a strong correlation
=CORREL() Calculate the correlation between two continuous variables
Note: This formula can only be used for continuous variables. See this link to learn more about the different correlation coefficients that are used for other variable types.
CTRL + F3: Open name manager
ALT + N: Create new name
CTRL + →: Move to end of data region
CTRL + SHIFT + →: Select all cells within data region
SHIFT + →: Select adjacent cell
ALT + E, S, F: Paste formulas
ALT + O, C, A: Autofit column width
CTRL + Page Up/Page Down: Move to next sheet
In the previous lesson, we saw that there appeared to be a relationship between the day of the week and our daily revenue.
In truth, there's often a number of variables that have relationships with the revenue, and it would be great to somehow quantify the strength of these relationships.
Step forward correlation.
The correlation formula accepts two columns of data, and generates a correlation coefficient with a value between one and minus one.
A correlation of one, shown in the example on the left, is known as perfect positive correlation.
Whenever one value increases, the other increases at the exact same rate.
For a correlation of minus one, one value goes up when another goes down, and in exactly the same proportion.
In real life, you're very unlikely to find a correlation of one or minus one, but it’s useful to know what the extremes look like.
The final chart on the right shows a correlation of zero, where there is absolutely no relationship between the two variables.
In this lesson, we’re going to check two correlations for Virtuoso Sports Bar.
First we’re going to check if a correlation exists between the revenue and the total number of transactions, and then we're going to check if a correlation exists between daily revenue and the external temperature.
As a rule of thumb, correlations between 0 and plus or minus 0.3 are considered weak.
Between 0.3 and 0.7, moderate, and correlations of 0.7 and above are considered strong.
Let's start off by finding the correlation between the number of transactions on each day and the total daily revenue.
Our relationship, we’d reasonably expect to have a strong correlation.
I'll start off by creating a named range with Ctrl + F3 and Alt + N, and this will be called Transactions.
And in the Refers to box I'll simply select my column.
I'll then Escape and go to my output cell.
And to calculate a correlation we'll write “=correl”, open a bracket, and the first array will be revenues.
And the second array will be transactions.
I'll close the bracket and press Enter.
And this gives me a correlation of 0.98, which we can see is a very strong correlation.
Let's now see if a correlation exists between revenue and daily temperature.
The Sports Bar has quite a big beer garden, so we might expect that on warmer days we get more customers and hence more revenues.
Let's start by creating a named range for temperature.
So I'll press Ctrl + F3, Alt + N for a new name, and I'll call it Temperature.
In the “Refers to” box, I'll simply select my temperature column.
Then press OK and Escape.
I'll now go to my output cell and write =correl, revenues and then temperature.
Close the bracket and press Enter.
And here we find that the correlation between temperature and revenue is actually quite weak.
This means that the company might not be getting any additional customers on hotter days, which is what the beer garden is actually for.
Given this result, the company might want to consider using the beer garden space for a different purpose.
Here we calculated two correlations in two separate steps.
But the analysis tool pack allows us to perform multiple correlations at the same time.
Let's go to the Data tab in the ribbon and click on Data Analysis.
And we'll select Correlation and press OK.
We'll then scroll up to the top of the page and select our input range.
And we'll include the labels in the first row, and the output will be in a new worksheet.
I'll press OK to complete.
If I adjust the column widths very quickly with Alt + O C A, you can see the correlations between the three variables.
First we have the correlation between revenue and revenue, which is obviously 1.
Here is the correlation between revenue and transactions, which we saw earlier was 0.98.
And here is the correlation between temperature and revenue, and temperature and the number of transactions.
When you need to complete multiple correlations on a single data set, you can see how the Analysis ToolPak can save you quite a lot of time.
Overall, the correlation function in Excel is quite useful when quantifying the relationship between two variables.
However, it does come with a lot of limitations, and if you’re not aware of these, it’s very easy to make mistakes.
The primary constraint regards the type of data the correlation function can accept.
There are different types of correlation coefficients for different types of data, and the CORREL function represents only one of these.
Unless both data columns are continuous variables, i.e. they are numerical and they can be measured along a continuum, then you shouldn't use the CORREL function.
So for example, we couldn't use the CORREL function when finding the correlation between the day of the week and revenue.
We need to calculate a different correlation coefficient for this data.
In the show notes, I'll link to some articles that discuss other types of correlation coefficients, and I'll show you how to calculate some of these in a later lesson.