Sign in or start a free trial to avail of this feature.
4. Methods of Aggregation
When we add values to a chart, Power BI automatically aggregates the data by performing a calculation on it. We’ll learn more about this process, and how to change the calculations used, in this lesson.
- When we add a field to a chart in Power BI, an aggregated version of the data is usually added to the chart
- Aggregation performs a calculation on a field
- These calculations can include Sum, Average, Min, Max and others
Count and Count (Distinct)
- Non-numeric fields, such as text data, are often aggregated using count or Count (Distinct)
- The count aggregation counts the number of times a value appears in the dataset
- Count (Distinct) ignores duplicated values, counting only the number of distinct values that appear
Up to now, we've assumed that Power BI knows what to do with each field in our visualizations. This is because we've been using the default methods of aggregation.
In this lesson, we'll learn exactly what that means and how we can change the way Power BI uses the various fields we put into our visuals.
We'll start by creating a bar chart of revenue by state.
This chart shows us the total revenue by each state. However, you may remember that our data did not contain any totals.
It only contained individual states by company. So you might ask, "why did Power BI sum all the revenue figures for each state?" The answer is that Power BI automatically aggregates data when you place it in the values well.
It automatically sums all numeric data, such as revenue, giving us a value representing total revenue. We can see this by clicking the arrow next to 'Revenue.' The tick next to 'Sum' indicates that Power BI is summing the field.
Note that there are several alternative methods for aggregation, including average, maximum, minimum, median, variance, standard deviation, and count.
For example, if we click 'Average,' we can now see the average revenue per company by state.
This clearly follows a different distribution from our previous graph. Let's consider the count aggregation method in more detail, focusing on a chart with non-numeric values.
Let's say we want to find out how many states each sales person is operating in.
To that end, we'll create a bar chart, drag 'Sales person' to the axis, and 'State' to 'Value.' Since state, our value, is a text field, Power BI automatically chooses the count aggregation.
Therefore, this graph should be showing the number of states that each sales person made a sale in.
Upon closer inspection, we can see that this isn't quite right. For example, Moran has made sales in 96 states, however, there are only 50 states in the USA. In reality, count simply finds the number of sales made by Moran where the state field has a value.
As such, if Moran makes multiple sales to the same state, the state is counted multiple times.
Effectively, this chart is simply counting the number of sales made by Moran.
To fix this, we'll click the arrow next to 'Count of State,' and change our aggregation to 'Count (Distinct).' 'Count (Distinct)' counts each state only once, regardless of the number of sales made in that state.
Our chart now shows more realistic numbers. For example, it now shows that Moran is making sales in 28 states.
It can be easy to mix up 'Count' and 'Count (Distinct),' particularly if you use 'Count' as an automatic aggregation created by Power BI.
As such, it makes sense to double-check your results. In this lesson, we've seen the various methods of aggregation available in Power BI. Familiarity with these options allows you to easily analyze data in multiple ways without any formulas or coding.
In the next lesson, we'll look at how we can use color saturation to further analyze our data.