Sign in or start a free trial to avail of this feature.
2. Aggregate Functions in Tableau
Aggregate functions such as SUM, MEDIAN and AVG are the most common functions used in Tableau. In this lesson, we'll learn how to use these functions in some simple but effective formulas.
To explore more Kubicle data literacy subjects, please refer to our full library.
Popular aggregate functions
--- SUM(): Add all the values in a field
--- MEDIAN(): Find the middle value in a field
--- AVG(): Find the average of all values in a field
--- MAX(): Find the largest of all values in a field
--- MIN(): Find the smallest of all values in a field
--- COUNT(): Count the number of values in a field
--- COUNTD(): Count the number of distinct values in a field
Off camera I have loaded our dataset into Tableau and as you can see, we have 7 columns of data.
The date, product name, latitude, longitude which are geographic elements sales, quantity and the customer ID.
With raw data such as this, we can't really do much analysis without performing some calculations.
And to perform calculations we use functions such as SUM, MAX, MIN, MEDIAN, etcetera.
In Tableau, the most common functions are called aggregate functions and we use them all the time.
Let's now go to a sheet, and explore the most popular aggregate functions.
When I drag sales into the columns shelf Tableau will actually perform a default aggregate function of sum.
The syntax for aggregate functions is pretty straight forward and similar to Excel.
When I double-click on this particular pill I can actually see the formula.
So, the aggregate function wraps a particular field within brackets.
And the field name is always shown within square brackets.
If we wanted to write this formula, we can type sum and an autocomplete option appears I try to always use the autocomplete option as it helps with creating brackets, so I'll press Tab to autocomplete.
If I then want to add the sales field, I'll types sales again, the autocomplete appears hit tab to ensure that the brackets are used correctly.
Then I'll click away to perform this calculation.
The most common aggregate functions for analysis are obtainable using the dropdown button When I go to measure, I can see options such as average, median, count, count (distinct), min, max percentile, standard deviation, and variance.
Let's now take a look at these in some more detail.
But before I do this, I am going to add a dimension to the rows shelf which is product name.
And I'll sort by sales.
Let's change sum to average. So I'll hit the drop down go to measure and select average.
And again, when I sort, I get my products listed from largest to smallest.
When we think about average we need to understand what metric average is being applied to In our dataset, each row corresponds to the monthly sales of a product for an individual customer.
So, in the case of Byresta, this data point means that the average monthly sales per customer is 36.8 whereas for Vitalume, it's only 5.73 When you are using aggregate functions such as average and that are not sum, you need to understand what data the function is returning.
This goes for median, max, min and count as well.
Let's now compare median, to the average So, let's drag sales into columns as well And I'll flip sum to median.
And interestingly, median doesn't necessarily follow average.
So, for a product like Carmacare, we actually have quite a high average but quite a low median which means Carmacare more than likely has a couple of customers that it is very reliant on that are dragging the average well above the median.
We could also compare average to max.
So, I just simply change to max.
In comparison to max, min, median average and sum count is a little different.
Because it's counting the number of sales rather than using the sales values within each row.
Let's count the number of customers that each product has with the count function.
And this tells me that Byresta, actually has 7168 customers.
Or does it? When you use count, what you are actually counting is each row that Byresta is included on.
And each row is the monthly sales to an individual customer.
So, what this figure is actually telling me is that we have 7168 monthly sales to different customers.
If I wanted to calculate the number of customers I need to switch this to count (distinct) With count (distinct) now in place we might assume that we actually have the correct answer for the number of customers that each product is selling to.
But in actual fact, I know that in the data set we only have 122 customers in total.
So that these values, for example Mellosa and Sycalone are actually incorrect.
And this is a danger with using some of Tableau's aggregate functions.
Because so much of what happens is behind the scenes it's very easy to make mistakes such as this by selecting some preformatted aggregate functions.
I'll show you how to create a distinct count of the number of customers in a later lesson.
In addition to the aggregate functions available in the dropdown we also have quick table calculations that allow us to create some common formulas such as percent difference, percent of total rank and moving average.
In the next lesson, we are going to use some of these quick table calculations in some simple charts.