Sign in or start a free trial to avail of this feature.
2. Optimizing Extracts
By altering the extracts that connect to your Tableau workbook, significant performance improvements can be made. Find out how in this lesson.
- Extracts typically provide a performance boost over live connections
- Unless your dataset is extremely large, try to always use extracts
- Reducing the size of your extracts is the quickest way in Tableau to boost performance
Reducing the size of an extract
- To reduce the size of an extract, you can hide the columns you don't use
- Extract filters can also be used to reduce the size of an extract
- Aggregate roll-ups (e.g. converting dates to monthly totals) also reduces the extract size
- Note: Aggregate roll-ups change the extract data and may affect your calculations
Now that we understand how to record our performance in Tableau, let's look at some optimization techniques while loading the data source into our Tableau workbook and I'll start by looking at extracts. Unless your dataset is incredibly large, you will normally get a performance boost by using extracts instead of live connections. In particular, extracts allow you to cherry-pick from a live data source only the data that's of interest to you thereby reducing the size of the data source that's loaded into Tableau.
And as a result, your loading times will improve and the files that you circulate to management or to your client afterward are smaller. Let's start by looking at some of the ways to reduce the size of our extract. In the current example, I have a number of columns shown from left to right that are either calculated or that exist within a dataset on a live connection. Let's now flip this to extract.
And one of the ways to improve the performance of an extract is to reduce the number of columns in the table. To do this, we can select columns manually or we can allow Tableau to help us. To select a column manually, I simply hit the dropdown and hit hide.
And once my dataset updates, you can see that this column is grayed out and hidden from view. If I'd like Tableau to help me find all the columns that are currently unused in my workbook, I can hit edit and hide all unused fields. When I press okay, not only is latitude now hidden, we also have longitude, quantity and a number of other columns as I scroll across.
If I want to include these columns again, I simply hit the dropdown and unhide.
Often a live connection will have a lot more data than we need. For example, it may have five years of data when all we need is six months. To apply a filter to a field such as date, we can hit edit, add and then select the column we'd like to filter.
Let's say we want to filter by years. You can hit next and then select the years that we'd like to filter.
I just want to see 2013 to 2016.
As you can see, we now start at 2013.
Our next option in reducing extract size is creating an aggregated extract. If I go back to edit, I'll see this option below filters.
And what aggregation does is actually change the data table itself to a more simplified version that suits your needs. For example, let's say I have point of sale data for a particular store that gives me time series data for every minute of the day that a transaction takes place. Say that all of the calculations I perform are rolled up to a month level. Well then reducing this dataset to just months in the date column will dramatically reduce the dataset size. To perform this monthly rollup, I can simply hit aggregate data and roll up dates to month.
Alternatively, I can roll up dates to quarter, to day or even to year.
When I perform this change and press okay, I get a warning to say that the data extract may change the data displayed in the workbook and the reason for this is that some of the calculations that we perform may not be possible once we've performed this rollup. If I hit okay and perform the rollup anyway, I can see that the date field has now changed and as I scroll down, I only have yearly data.
To remove this rollup, I can simply uncheck the boxes. If I simply check aggregate data for visible dimensions, this simply creates a table that aggregates typically a sum calculation for a dimension such as product name pending on whether it's visible or not. I don't like to use this aggregation option too much because it actually rules out an incremental refresh. For larger datasets in particular, it's important to do an incremental refresh when updating your dataset so that when a daily update is performed, you don't update the previous three years of data also. So when I click away from aggregate data, I get the option of an incremental refresh. When I hit the incremental refresh option, I need to select which column identifies new rows and this is typically the date column because refreshes often depend on date. My last option is to select the top end rows in my dataset.
I find that this option isn't used very often. As a result, I tend to avoid it. However, the incremental refresh, filters and hiding all unused fields are great ways of reducing the size of your extracts and making the performance of your dashboards that bit faster.