Sign in or start a free trial to avail of this feature.
6. Possible Issues with Data Blending
Data blending can create some tricky problems for users, especially beginners. In this lesson, we explore some of these problems that can occur and explain why joins are often a better option.
Data blending problems - blank cells
- All values in the shared column of the primary data source need a match in the secondary data source
- If this is not the case, blank values can appear in the view
- Unfortunately these blank (or null) values are often not known to users and can cause errors
Data blending problems - asterisks
- In the secondary data source, the shared column needs a distinct match
- If this is not the case, cells will be filled by '*' values, due the ATTR() function
- Needless to say, these can also cause errors if not picked up by the user
The first thing to remember with relationships is that they are built off the primary data source. All the data in the primary data source is included. Only the corresponding matches in the secondary data source are included. In this simple example I have two datasets, sheet one, the primary data source and sheet two, the secondary data source.
The datasets are joined on a column called level. Let's now look at these data sources in more detail.
So this is the first data source and next to it I'll show you the secondary data source.
And as you can see, level is common to both data sources. As I mentioned previously, Tableau will take all the primary source data, but just the secondary source data that matches with the primary data. On our screen we can see that there's no salesperson associated with level seven and therefore when I create a data table using these values, we may run into a problem.
Let's see what happens when I do this. So on my sheet I've created a series of columns and for level seven because there's no match, the data from the primary source is included, but the salesperson column is blank. Unlike joined datasets, these blanks are not very easy to uncover in relationships because we normally don't access the raw table. We just drag dimensions and measures into the view. For larger datasets in particular, it can be difficult to find these blanks. To avoid this problem, make sure that all the values in the relationship column in the primary data source have a match in the secondary data source. What's more, you need to make sure that it's a single match. Otherwise you run in to the asterisks problem.
Asterisks problems occur when there are more than one matching value in the secondary dataset. Let's return to our data sources again where I have the primary and the secondary and again I'll move to make this easier to view.
For level three we have two salespeople, Wendy C. and John B.
Now when we create our sheet, level three provides us with an asterisks because the join doesn't know which salesperson to return.
When Tableau creates a relationship, it does so using the attribute function. You might remember that the attribute function is a test for uniqueness. It returns a value if only one value exists. Otherwise it returns an asterisks. In this scenario there are two salespeople in level three. Hence Tableau returns an asterisks. For all the other levels there's just one salesperson and no problem occurs.
So to reiterate, if you are going to use relationships, make sure that every value in the primary data source shared column has a single match in the secondary data source. Otherwise you will run into these problems. In addition to problems with blanks and asterisks, relationships can run into more difficulties with non-additive calculations such as count distinct and median. These functions require temporary tables and only some data sources allow temporary tables. In addition, they won't be available if your primary data source exists on a live connection. As a result of these issues, I almost always default to using joins instead of relationships outside of the handful of situations I articulated at the beginning of this lesson. It's definitely worthwhile becoming adept at both joins and relationships so take the time to experiment with both techniques ideally with datasets that are not perfect so you can identify and try fix some of the challenges and problems I've shown in this lesson.