Sign in or start a free trial to avail of this feature.
3. Inputting the Data
In this lesson, we learn how to connect raw data to Tableau Prep and make initial adjustments
This goal of this lesson is to input that dataset into Tableau Prep.
Compatible Data Types
Tableau Prep can read a variety of data files types including
- Tableau extracts
- Excel files
Tableau Prep can already read data directly from a SQL server
When inputting data with Tableau Prep you can do some quick pre-cleaning. These include
- Removing fields
- Changing data types
- Changing field names
- Adding filters
In this lesson, we removed the empty field. We also demonstrated a data filter but did not keep it.
Tableau Prep also has a variety of data sampling techniques. These allow for quick performance while also working with large datasets.
The default setting takes into account rows and columns when sampling data. This means that datasets with many columns and few rows will likely keep most of their rows. Users can also choose to keep all their data or use a fixed number of rows.
For sampling method, you can choose Quick select or Random sample. Quick select is faster to build but can be less accurate. Random sample takes longer to build but is more accurate.
In the previous lesson we looked at a dataset containing sales information for a pharmaceutical firm called Alta Vee-kah. In this lesson, we'll input the dataset into Tableau Prep.
We'll start by connecting to our data source. We'll then perform some basic data cleaning steps. To finish, we'll determine which type of data sampling method to use.
When we open up Tableau Prep, we an see an option to connect to data in the top left corner. We'll click it and see a wide range of possible data sources, which Tableau Prep can connect to. Tableau Prep can read Excel files as well as Tableau extracts. It can also read a wide variety of text file types, such as TXT and CSV files. Additionally, if your data primarily comes from a server, Tableau Prep can draw data directly from that server.
We'll start off with the Customer Addresses dataset, which is in an Excel file.
We'll select Microsoft Excel, navigate to the correct directory, and select Customer Addresses.
On the left pane, we can see the name of the file connected to Tableau Prep.
Below, we have a list of Tables or Sheets within this file.
Customer Addresses contains just one sheet, called Sheet 1.
Before we bring on any tools, let's perform some initial data cleaning steps. Performing these actions early on can save time down the road.
To the right, we have an overview of our dataset. Each row shows us a different field, including a brief view of some sample values from the dataset. This view can be a little confusing, as we're used to seeing these fields represented as columns. Instead, each column represents a type of data manipulation action.
In the first column, we see the option to uncheck certain fields, which will remove them from our dataset. We'll uncheck F3, as this field only contains null values.
The next column allows us to change the data types. Tableau Prep reads three main data types, Numbers, Text, and Date Time. Within Numbers, we can choose whole numbers or decimals.
For Date Time, we can choose either Date Time or just Date.
Tableau Prep is quite good at guessing the appropriate data type for our fields, so we don't need to make any changes here. Next, we have Field Name and Original Field Name.
We can use these columns to manually edit the names of our fields.
Let's correct the spelling mistake in the Adress field.
After this change, the old spelling remains in the Original Field name column. Finally, we'll perform some quick filters on our data. In our Latitude field, we can see sample values of around 50. To demonstrate this functionality, let's temporarily remove all values of 50 and above.
We'll click the Add Filter button next to Latitude and select Calculation.
We now see a dialog box with a field name already included.
We'll write less than 50 and click Save.
We now have just a single value of less than 50.
We'll undo this change by clicking the filter and then Remove.
The sample values for Latitude once again show values for 50 and above.
Note that there are further data manipulation options in the Input section in the center of the screen. We'll cover the Wildcard union in a later lesson, so we'll move on by selecting the Data Sample tab.
One advantage of using Tableau Prep over Excel is that it can read much larger datasets. However, if the dataset is too large, it will significantly impact the speed and performance of Tableau Prep.
We can limit this problem by taking a sample of our dataset. This will boost performance by temporarily cutting down the size of the dataset while we're working on it. It will restore the dataset to its original size when we're ready to output our data to Tableau Desktop. When sampling data, there are two methods we could use, Random sample and Quick select.
A completely random sample is more likely to consist of data points from all parts of our dataset, however this sample takes longer to create. Quick Select is created at a greater speed, but may not include data from all parts of the dataset. This can be an issue if the sample overlooks problematic data, as we won't be able to clean this data if the sample doesn't show it to us.
We can also control how much data we want to sample.
If we select the Default sample amount, Tableau Prep will automatically determine the sample size by accounting for the number of columns, as well as the number of rows in the dataset.
This is useful when we have a dataset with many columns, but not many rows. If that's the case, Tableau Prep will understand that in a dataset with so many columns, just a few rows can contain lots of data.
If columns were not considered, the sample would likely include more rows, meaning that it could become too large to be an effective sample.
If we do not want the sample to consider the columns, we can enter the number of rows we'd like to sample in Fixed number of rows.
This figure must not be greater than one million.
Tableau Prep enforces this limit because a sample larger than this will not improve performance and would defeat the purpose of creating a sample. We can also force Tableau Prep to bypass the data sample entirely by selecting Use all data. At just 103 rows, our dataset is actually too small for sampling, so we'll select Use all data.
And with that, we've inputted our data and performed some initial data cleaning steps.
We'll stop the lesson here.
In the next lesson, we'll take a closer look at the user interface.