Sign in or start a free trial to avail of this feature.
1. Course Introduction
In this lesson, we learn about the concepts of data granularity and data aggregation. We also get a brief summary of what we cover in this course.
To explore more Kubicle data literacy subjects, please refer to our full library.
Data Granularity (00:28)
Before you can begin aggregating data, it’s important to understand the concept of granularity. Every subset of data has a degree of granularity. In essence, the granularity of the data is what each row of data represents. In our product table, each row represents an individual product.
However, through aggregation, we can change the granularity of our data. Our data will always start in its most granular, or most detailed form, but we can perform calculations to make the data less granular. This is called data aggregation.
For example, taking our products dataset, we could calculate the average price of all products. That would result in a single figure changing our data from its most granular form to its least granular form since we’re left with just one single value.
However, there are many layers of granularity between our full data and a single row of data. For example, we could calculate the average price of products for each manufacturer. Instead of a single value, we’d get a separate value for each manufacturer in the dataset. Since there are fewer manufacturers than actual products, the granularity of this table would be somewhere in between the most granular full dataset and least granular single value.
Course outline (03:06)
Throughout this course, we’ll learn how to aggregate our data to different degrees using different calculations, all in SQL code.
We’ll start by exploring different aggregate functions which are in effect, different calculations we can use to aggregate our data, such as sum, average, count, etc.
We’ll then look at the GROUP BY clause which can be used to control the level of granularity. For example, the GROUP BY clause would be used in our previous example where we calculated the average price per manufacturer.
Finally, we’ll explore methods for adding filters to aggregated data. This includes the use of standard filters using the WHERE clause as well as filters based on aggregated data using the HAVING clause.
In this course, we'll focus on the concepts of data granularity and data aggregation. We'll use the same bicycle store database we worked with in previous SQL courses. You can find a diagram of this database in the lesson files tab.
Before we begin to explore data aggregation, we first need to understand the concept of data granularity. The granularity of data is what each row of data represents.
For example, if we run a simple query to output all columns in the products and manufacturers tables in our bicycle store dataset, we can see that each row in the output represents a single product and each column represents separate characteristics of each product.
Meanwhile, if we run another simple query to output all the columns in the sales table, we can see that the granularity of the data is a single sales transaction.
One important feature of granularity is that it's rarely fixed. We can use data aggregation to change the granularity to a different level of detail.
Data aggregation refers to the use of calculations to change the granularity of the data.
For example, we could simplify all this data down to a single value that represents the average price of all products using data aggregation.
Before we calculated the average price, our data contained the most amount of information we had available.
This meant it had a very fine or low level of granularity.
After performing data aggregation, we modified it to have the highest possible level of granularity, which results in just a single value.
However, there are lots of degrees of granularity in between.
For example, we could break down the average price per manufacturer. This is known as grouping the aggregation.
We can group the data by one or more columns.
In this case, we grouped by the manufacturer column. We first separated the prices out into separate groups based on their manufacturer, and then calculated the average for each group.
The ability to change the granularity in your data can be very useful, particularly within SQL. If we're looking for average quarterly sales for a company that makes millions of transactions, it's far more efficient to make that aggregation in our SQL code than to output a dataset with millions of rows and then perform the aggregation in a separate piece of software. This is because the database can perform aggregate functions millions of rows much quicker than it can simply output them.
Throughout this course, we'll learn about a number of different methods for aggregating our data.
We'll start by exploring different types of aggregate functions, such as average, sum, and count. We'll also learn how to perform some basic calculations in SQL. We'll then learn how to use the group by clause to aggregate data according to categories.
Finally, we'll learn how to combine filters and aggregated data by using the having clause and subqueries.
We'll get started in the next lesson where we'll look at the most basic aggregate functions.