Sign in or start a free trial to avail of this feature.
Correlation measures how two variables move in relation to each other, a useful metric for finding out what's driving revenue and profits.
Correlation Examples (00:04)
Correlation is a statistic that aims to measure the strength of the relationship between two variables. It takes two columns of data, and returns a correlation coefficient, which has a value between 1 and -1.
A correlation of 1 is perfect positive correlation. In this case, if one variable increases, the other increases at the exact same rate. A correlation of -1 is perfect negative correlation. In this case, if one variable increases, the other decreases in the same proportion. A correlation of 0 indicates there is no relationship at all between the two variables.
The size of the correlation coefficient indicates the strength of the relationship, as follows:
0 to 0.3 (or 0 to -0.3) is a weak correlation
0.3 to 0.7 (or -0.3 to -0.7) is a moderate correlation
0.7 to 1 (or -0.7 to -1) is a strong correlation
Calculating Correlations (01:44)
In Excel, we calculate correlation using the CORREL function. This takes two data ranges, and returns the correlation coefficient between their values.
In our sports bar dataset, we calculate the correlation between daily revenue and the number of daily transactions, and between daily revenue and the daily temperature. We expect both of these correlations to be positive.
We find the correlation between revenue and transactions is 0.98, which is very strong. The correlation between revenue and temperature is 0.25, which is weak, and suggests the bar is not generating revenue from its beer garden.
Multiple Correlations (03:39)
Using the Analysis Toolpak, we can calculate multiple correlations at once. From the Data Analysis window, we select correlation, and select a range of cells containing multiple variables. The tool produces a matrix which calculates the correlation between each pair of variables in the range.
Possible Issues (04:43)
The CORREL function has some important constraints you should be aware of. If you’re measuring the correlation between two columns of data, both columns need to be continuous variables, meaning they should be numeric, and they can be measured on a continuum. For example, you couldn’t use CORREL to measure the correlation between revenue and days of the week, as days are not numeric.
There are other correlation functions that can be used for different types of data. You can learn more about these functions in the resources below.
http://hosted.jalt.org/test/bro_12.htm - Learn about different types of correlation coefficient
https://statistics.laerd.com/statistical-guides/types-of-variable.php - Learn about different variable types that help you decide which correlation coefficient to use
In the previous lesson, we saw that there appeared to be a relationship between the day of the week and our daily revenue. In truth, there's often a number of variables that have relationships with the revenue, and it will be great to somehow quantify the strength of these relationships. Step forward correlation. The correlation formula accepts two columns of data and generates a correlation coefficient with the value between one and minus one. A correlation of one, shown on the example on the left, is known as perfect positive correlation. Whenever one value increases, the other increases at the exact same rate. For a correlation of minus one, one value goes up when another goes down, and in exactly the same proportion. In real life, you're very unlikely to find a correlation of one or minus one, but it's useful to know what the extremes look like. The final chart on the right shows a correlation of zero, where there is absolutely no relationship between the two variables. In this lesson, we're going to check two correlations for Virtuoso Sports Bar. First we're going to check if a correlation exists between the revenue and the total number of transactions, and then we're going to check if a correlation exists between daily revenue and the external temperature. As a rule of thumb, correlations between zero and plus or minus 0.3 are considered weak, between 0.3 and 0.7, moderate, and correlations of 0.7 and above are considered strong. Let's start off by finding the correlation between the number of transactions on each day and the total daily revenue. Our relationship would reasonably expect to have a strong correlation. I'll start off by creating a named range with control F3 and alt N, and this will be called transactions, and in the refers to box, I'll simply select my column.
I'll then escape, and go to my output cell, and to calculate a correlation, we'll write equals correl, open a bracket, and the first array will be revenues, and the second array will be transactions.
I'll close the bracket and press enter, and this gives me a correlation 0.98, which we can see is a very strong correlation. Let's now see if a correlation exists between revenue and daily temperature. The sports bar has quite a big beer garden, so we might expect that on warmer days, we get more customers, and hence, more revenues. Let's start by creating a named range for temperature, so I'll press control F3, alt N for a new name, and I'll call it temperature.
In the refers to box, I'll simply select my temperature column.
Then press OK and escape.
I'll now go to my output cell and write equals correl, revenues and then temperature.
Close the bracket and press enter, and here we find that the correlation between temperature and revenue is actually quite weak. This means that the company might not be getting any additional customers on hotter days, which is what the beer garden is actually for. Given this result, the company might want to consider using the beer garden space for a different purpose.
Here we calculated two correlations in two separate steps, but the analysis tool pack allows us to perform multiple correlations at the same time. Let's go to the data tab in the ribbon and click on data analysis.
And we'll select correlation and press OK.
We'll then scroll up to the top of the page and select our input range, and we'll include the labels in the first row, and the output will be in a new worksheet. I'll press OK to complete. If I adjust the column widths very quickly with Alt+O+C+A, you can see the correlations between the three variables. First we have the correlation between revenue and revenue, which is obviously one.
Here is the correlation between revenue and transactions, which we saw earlier was 0.98, and here is the correlation between temperature and revenue, and temperature and the number of transactions. When you need to complete multiple correlations on a single data set, you can see how the analysis tool pack can save you quite a lot of time. Overall, the correlation function in Excel is quite useful when quantifying the relationship between two variables; however, it does come with a lot of limitations, and if you're not aware of these, it's very easy to make mistakes. The primary constraint regards the type of data the correlation function can accept. There are different types of correlation coefficients for different types of data, and the correl function represents only one of these. Unless both data columns are continuous variables, IE, they are numerical, and they can be measured along a continuum, then you shouldn't use the correl function, so for example, we couldn't use the correl function when finding the correlation between the day of the week and revenue. We'd need to calculate a different correlation coefficient for this data. In the show notes, I'll link to some articles that discuss the other types of correlation coefficients, and I'll show you how to calculate some of these in a later lesson.