Sign in or start a free trial to avail of this feature.
6. Splitting Fields
In this lesson, we continue to format our data by making use of the Cleaning Step's more advanced functionality
This goal of this lesson is to restructure the address data by splitting it into multiple fields.
Why split fields
In the dataset for this course, the address field contains 4 types of address data. Street, town, county, and postal code. Tableau Desktop can make use of this hierarchy of data for analysis. But only when each element contains its own field. This means we must split the address field into 4 new fields.
For the first method, we can use to split fields is the automatic split. This can automatically detect the delimiters and number of fields. It can also detect multiple delimiters. This method is not always reliable.
When automatic split does not produce the output we’d like, we can rely on custom split. This lets us choose the delimiter. We can also choose how many and which elements we’d like to create into new fields. For example, we can make a new field of just the first element.
In the previous lesson, we cleaned the text in the address field of the customer addresses dataset. However, this field still needs to be restructured before we can import it into Tableau Desktop.
In this lesson, we'll restructure the address data by splitting it into multiple fields.
Hen we look at the Address field, we can see that it contains four elements separated by commas, street, town, county, and postal code.
Tableau won't be able to do anything with this data as it is, so we must split it into four new fields.
Separating fields is important. Otherwise, Tableau will read each address as a unique data point, which is not useful for analysis.
Without separating this field into multiple new fields, we can't summarize our data.
If we want to visualize our address data in a bar chart without splitting it, we'd get a chart like this with over 100 bars, not very useful.
Let's try again and summarize by county.
Tableau Prep's primary method for splitting values is a function aptly named split values. Split values simply separates text into multiple fields.
It uses separators such as commas as a cue to determine how many new fields to create.
It has two methods for splitting fields, custom split and automatic split.
Custom split asks for the user to choose the appropriate separator. Automatic split will guess the appropriate separator and can handle multiple separator types. However, there are some limitations with the performance of this function. Let's see how automatic split can be problematic by using it on the address field.
We'll select Automatic Split and immediately see three new fields.
Note that our address has four elements.
It seems that automatic split has not created a field for county.
It's not clear why this happened, so the clearly the automatic split function is not foolproof.
We'll remove these new fields and start again with custom split.
When we select Custom Split, we're given two options. First, we must choose a separator.
We'll enter comma.
Next, we must choose which elements to convert into new fields.
If we just wanted street name and town, we could choose First and then two because these are the first two fields. If we just wanted postal code, we'd choose Last and one.
We want all elements so we'll choose All and click Split.
Each of the four address elements now have their own field.
We'll quickly clean our data.
We'll rename Address-Split 1 as Street, Address-Split 2 as Town, Address-Split 3 as County, and Address-Split 4 as Postal Code.
At this point, we no longer need the Address field, so we'll remove it.
There are a few more steps we need to take before proceeding to Tableau Desktop. We'll cover these in the next lesson with the group and replace feature.