3. Interpreting Percentiles
Percentiles can quickly tell us how our data is distributed but interpreting them can be tricky at the start.
- The distance between percentiles and the median can tell you how your data is distributed
- To visually see this, sort the data by size and chart the data
- Percentiles can be considered 'good', 'bad' or 'indifferent' depending on the situation
=PERCENTILE.EXC(array,k): Calculates the kth percentile for an array of data
=PERCENTRANK.EXC(array,x): Calculates the percentile rank of a single value (x)
and keyboard shortcuts
ALT + A, S, S: Open Sort dialog box
CTRL + →: Move to end of data region
CTRL + SHIFT + →: Select all cells within data region
In my course on formulas and finance functions I showed you how to calculate percentiles and quartiles.
In this lesson, I’m going to spend more time on how to interpret the results you get from percentile calculations.
Let's start by calculating the percentiles for the following five levels.
I'll type “=percentile”, I'll select the .EXC option, the array will be revenues, and the K value will simply be 10%.
This gives me a result of $2,282 and tells me that any revenue day below this figure is in the bottom 10% of my full data set.
To find the remaining percentiles in the list, I'll just copy this formula and paste with Alt + E S F.
Let's now take a closer look at these five results.
The first thing that I notice is that the 10% and the 25% are far closer to the median than the 75% and 90%.
This leads me to believe that a lot of data is tightly bunched in the $2,000 to $5,000 range.
We can officially check this by sorting the data by size and then plotting the results.
Let's quickly do this by selecting our data set, and then press Alt + A S S to bring up the Sort dialog box.
I'll then sort by revenues, press OK and this changes the order of our revenue data.
I'll now select the revenue data and then insert column chart.
And then we'll move this chart up to the top of the page by cutting with Ctrl + X and then pasting with Ctrl + V.
Let's now scroll across so we can see both the chart and the output panel.
And as expected, our conclusion from looking at the percentiles was correct.
A large concentration of data exists between the $2,000 and $5,000 mark, with a smaller number of big revenue days dragging the average up to $5,738.
Although the difference between the median and the average pointed to this fact, percentiles offer a more conclusive and insightful look at how our data is distributed.
It also gives you some values to sense check when you are interpreting charts like this.
Before you begin to complete any detailed data analysis, it’s always worth quickly calculating the average median and a few percentiles, just to get a quick sense of the data distribution.
A calculated percentile can be considered a good or a bad result, but only in the context of the situation to which the data applies.
In some situations, a low percentile will be considered good, and in other contexts, a high percentile might be considered good.
In many situations, there is no value judgment that applies.
The percentiles simply inform you on how data is distributed, and not if this is a good or a bad figure.
Let's wrap up this lesson with a reversal of the percentile calculation.
That is choosing a single data point and finding its percentage rank.
Say you are browsing through some data and you find an interesting number you’d like to check.
Let's say B16.
So I'll write “= %rank”, select the .EXC option, the array will be revenues and X will be B16.
I'll then close the bracket and press Enter.
And this tells me that the B16 value is the 16th percentile.
This is a very useful command when you want to sense check one data point within a large data set.
Percentiles are just one way of understanding how data is distributed.
In the next lesson, I'll show you a different way using a frequency distribution.