Sign in or start a free trial to avail of this feature.
1. Adding Trend Lines to Your Data
For some data sets, we may want to create a model to establish relationships between different variables. In this lesson, we'll go through how to do this with Tableau's trend lines feature.
- Trend lines can establish relationships between different variables in your dataset
- Trend lines tend to work best with x-y scatter plots - The strength of the relationship is determined by the r-squared value
- The closer r-squared is to 1, the stronger the relationship
4 trend lines available in Tableau
- Linear - Logarithmic - Exponential - Polynomial
In previous lessons we have used the Analytics tab to generate average lines, constant lines and even distribution bands, however, we haven't looked at trying to create a model from the data using this part of the tab.
We'll begin our analysis by looking at trend lines. Trend lines are used to see if we can establish a relationship between the variables shown on the chart, in this case between sales and quantity.
The relationship between the variables can be of four different types. When I drag the trend line, I can see there may be a linear, logarithmic, exponential or polynomial relationship.
Let's start off with linear. When I drag the linear trend line onto the chart, it creates a large gray line that represents the best fit given all of the data points included. The quality of the fit is determined by a value called R squared which I can see when I hover over the trend line. The closer R squared it to one, the more accurate my trend line is at predicting sales given quantity using the equation shown on screen. In this example my R squared is 0.896 which is quite high and so, in a linear model the total quantity sold is a reasonably good estimate for predicting sales.
With this model I can now take a quantity and then using this equation calculate sales with a reasonable level of predictability. Let's now try another model, say logarithmic.
So, I'll drag my trend off the line and then put in logarithmic. When I now hover over logarithmic, I can see that R squared is much lower than for linear and so, the linear model is a better option than logarithmic and for my data represents a better model for the relationship between quantity and sales.
So, let's go back to linear. Around the trend line you might have noticed two adjacent lines which represent a 95% confidence band.
The only time you'll need to use the confidence bands is when your data represents a sample and is not the total data population. Say that this data was a sample of a much larger dataset.
The confidence band indicates a 95% level of confidence that the best fit trend line for the total population exists within this entry. Obviously the wider apart these lines, the less likely your calculation from the sample represents the best trend line for the total population. When your dataset represents the total population, you can simply remove the lines.
We're going to edit and show confidence bands removed and press OK.
And this now leaves you with your trend line. Sometimes you'll want to include multiple trend lines on a particular chart. To do this, first add a color code.
And in this example I'm going to color code by location and in particular town.
Unfortunately there are simply too many towns in my dataset for this to make any sense, so let's remove town and instead I'll go with county.
And with county I now have five different trend lines representing my data. Unfortunately three of these counties have very few data points, so you can see that they simply have horizontal lines in the chart.
However, if I removed these lines from the chart by right clicking on the legend and excluding, I can now see I have two different trend lines for the two different counties and so, when I'm estimating sales from quantity, my equation for Surrey is different to my equation for Cornwall. And using this analysis we can see different relationships emerging for different variables. In the next lesson, we'll go back to the Analytics tab and explore clustering and forecasting.