2. Introducing the Dataset
This lesson introduces the datasets that we will be using in this course. All datasets relate to the fees generated by a pharmaceutical distribution company.
The goal of this lesson is to examine a dataset and see what we need to clean.
We need to check a dataset for a pharmaceutical company. They have bought some Tableau Desktop licenses, but their data is poorly structured. Much needs to be done in Tableau Prep before they can use any of their data in Tableau Desktop.
Data Cleaning Checklist
We identified four errors in this dataset, which we'll need to fix in the following lessons.
- Leading spaces in the address field
- Empty field
- Misspelled field name
- Combined address field
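Outside Tableau Prep, the same four checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the misspelled header `Adress`, the semicolon separator inside the address, and the helper `find_issues` are all hypothetical and are not taken from the course dataset.

```python
import csv
import io

# Hypothetical sample that mimics the four issues; not the real course data.
SAMPLE = (
    "Customer,Adress,\n"
    "Acme Pharmacy,  12 Main St; Springfield; IL,\n"
    "Bell Chemist, 4 High Rd; Dublin; D02,\n"
)

def find_issues(text, expected_fields=("Customer", "Address")):
    """Return a list of human-readable data-quality findings."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    issues = []
    for i, name in enumerate(header):
        column = [r[i] for r in body if i < len(r)]
        # Empty field: no header name and no value in any row.
        if not name.strip() and not any(v.strip() for v in column):
            issues.append(f"column {i}: empty field")
            continue
        # Misspelled field name: header is not one of the expected names.
        if name not in expected_fields:
            issues.append(f"column {i}: unexpected field name {name!r}")
        # Leading spaces: at least one value starts with whitespace.
        if any(v != v.lstrip() for v in column):
            issues.append(f"column {i}: leading spaces")
        # Combined field: every value holds several separated parts.
        if column and all(v.count(";") >= 2 for v in column):
            issues.append(f"column {i}: combined values")
    return issues

issues = find_issues(SAMPLE)
for line in issues:
    print(line)
```

On this sample the function reports three findings for the address column and one for the empty trailing column, matching the checklist above.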
In the previous lesson, we briefly introduced Tableau Prep and looked at how it compares to Excel and Alteryx.
In this lesson, we're going to examine a dataset and determine the required data cleaning steps.
The pharmaceutical company Altavica has recently purchased several Tableau Desktop licenses and is hoping to gain greater insight into its sales performance.
However, there's one roadblock to rolling out Tableau. Prior to the decision to purchase Tableau, the company didn't enforce a consistent format for data storage.
Let's start by looking at the customer address dataset to examine what exactly causes this data to be problematic for Tableau Desktop. Several bad data collection practices led to a messy dataset. First of all, the addresses in the address field have leading spaces.
Second, there's an empty field after the address field.
Third, the address field name has been misspelled.
Last, the address data was poorly managed, and each customer's entire address is contained in this single field.
Tableau Desktop expects address information to be spread across multiple columns.
This will be demonstrated in a later lesson.
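As a rough picture of what "spread across multiple columns" means, here is a minimal Python sketch. It assumes the parts of an address are separated by semicolons and that there are three parts; the course dataset may use a different layout, and the column names `street`, `city`, and `region` and the helper `split_address` are made up for illustration.

```python
def split_address(raw, parts=("street", "city", "region"), sep=";"):
    """Split one combined address string into named columns."""
    # Split at most len(parts) - 1 times so extra separators stay in the last part.
    pieces = [p.strip() for p in raw.split(sep, len(parts) - 1)]
    # Pad with empty strings so short addresses still yield every column.
    pieces += [""] * (len(parts) - len(pieces))
    return dict(zip(parts, pieces))

row = split_address("  12 Main St; Springfield; IL")
print(row)
```

Stripping each piece also removes the leading spaces noted earlier, so the two problems can be handled in one pass in a script like this.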
As you can see, there are several issues for Tableau Prep to take care of before we can use this dataset in Tableau Desktop.
We'll start by inputting the customer address dataset into Tableau Prep. We'll cover how to do this in the next lesson.