Sign in or start a free trial to avail of this feature.
5. Cleaning Data
In this lesson, we deploy some initial data cleaning steps to prepare our data for further manipulation
This goal of this lesson is to tidy data with the cleaning step.
Simple Cleaning Methods
Tableau Prep has 6 simple data cleaning options. You can access these from the data cleaning step. These include:
- Make Uppercase
- Make Lowercase
- Remove Letters
- Remove Numbers
- Remove Punctuation
- Trim Spaces
The changes pane allows the user to make changes to the Cleaning step at any stage. You can remove any changes without affecting any of the changes that come before or afterward.
In the previous lesson, we took a look at the Tableau Prep user interface and investigated a complete flow. In this lesson, we'll go back to our sales dataset and tidy up our data with a cleaning step.
As you may recall from Lesson Three, we've imported the sales data, and performed a few preliminary cleaning actions.
We'll start this lesson by clicking the Plus icon next to the Input step, and select Add Step.
To access our cleaning options, we'll click the arrow at the top right of the address card. We now have a range of options. We'll start by hovering over Clean, which provides a few basic text cleaning options. One of the issues that we identified with the address field is that it has some unnecessary spaces.
We'll remove this redundant information by selecting Trim Spaces.
This removes spaces from the beginning and end of our text, as well as excess spaces between words.
Next, we'll go back to Cleaning Options, and select Remove Punctuation to demonstrate further cleaning functionality.
In this case, we can see it has removed the commas.
We'll now remove numbers.
I prefer uppercase addresses over lowercase, so let's change the text to all uppercase.
We now have manipulated address data that isn't particularly useful, as numbers and punctuation are obviously essential information. Fortunately, Tableau Prep uses the Changes pane to track any changes we've made in the current cleaning step.
Here we can see the four changes we've made so far.
We can click any one of these to revert the Profile pane back to that point in time.
For example, when we click Trim Spaces, the address card changes back to lowercase and the punctuation and numbers return.
A major advantage of using the Changes pane over Undo is that you can remove progress without affecting changes which occurred afterward.
Let's delete Remove Numbers and Remove Punctuation.
Notice that we were able to remove those two changes without affecting the Make Uppercase step which occurred afterward.
It's important to keep in mind that removing dependent steps can break your flow. Fo example, if the Make Uppercase step was dependent on either of the two steps we removed, then it would have broken our flow.
Note that changes in the cleaning step are also summarized in the Flow pane. The icon representing the cleaning step will feature small annotations to represent changes covered in that step.
At present, we can see an icon representing a data cleaning operation.
When we hover above it, we're given a quick summary of the two cleaning actions we took.
To recap, we performed basic data cleaning steps by removing unnecessary characters, capitalizing all address text, and then reviewing changes.
In the next lesson, we'll move on to restructuring our data with more advanced data cleaning tools.