6. Data Selection and Organization Part 2

 
Subtitles Enabled

Sign up for a free trial to access more free content.

Free trial

Overview

In this lesson, we ensure that our data is correctly interpreted by Alteryx using a number of tools, before proceeding with our analysis.

Lesson Notes

Assigning data types to Excel files

  • Alteryx does a better job of automatically assigning types to data from Excel files, generally matching the correct broad category
  • Users may still need to adjust the specific data type - refer to this table for guidance
  • As a rule, users should always connect the Select tool to verify data types regardless of the source

Organization options in the Select tool

  • Fields can be reordered by using the arrows, or using the Sort or Move functions available through the Options menu
  • Both the arrows and the Options menu are located at the top of the configuration window
  • Fields that are Removed will be ignored by any additional functions connected to the workflow
  • The Rename field will allow users to change the field names that appear in the output window
  • Field descriptions are only visible in the configuration window for the Select tool

Transcript

In the previous two lessons, we learned how to import a CSV file and assign different data types to the various columns in that CSV file.

Upon import, Alteryx automatically assigns each column a data type of variable string.

However, this is not always the case.

Let's see what happens if we import the same file in the Excel format.

So I'll bring down the input data tool, click the down arrow in the configuration window, click file and navigate to my Excel file.

Notice that I need to select the sheet that I'm pulling data from.

Just a reminder that importing data from various sources can be slightly different.

We'll now go to the preparation palette, bring down the select tool, connect it to our workflow and press CTRL+R to run the tool so that we can view the data in the output window.

As we can see in the configuration window, Alteryx has a much easier time assigning the correct data types to each column.

For example, we can see the date field has been assigned the date type and the quantity field has been assigned a numeric type.

While Alteryx can be quite good at predicting the most appropriate type for some data sources, we'll still have to reassign other data types.

As we did in the previous lesson, I will sign the string type to the product name, customer ID, street, town, county and postcode.

Also, I know that the values in a quantity field are whole numbers.

So I reassign that field to integer 16.

Now that we have the correct data types assigned to each column, let's make our dataset easier to navigate.

First, we'll re-order the fields, so that the most frequently used fields are at the top.

We can do this by either highlighting a field and using the arrows at the top of the configuration window, or alternatively by clicking on the options button and using the move or sort options offered.

For this example, I will sort by field type in ascending order.

Moving on, we may want to remove fields that are not involved in our analysis.

Removed fields will not be carried forward in a workflow making the dataset easier to navigate while also saving space and speeding up processing time.

Only fields marked with a check to the left of the field column in the configuration window are carried forward in the workflow.

To remove a field, we'll simply need to uncheck the corresponding box.

In this case, I will uncheck the unknown field, since it has no known data and it's not relevant to our analysis.

The last edit we'll perform on our dataset is renaming fields.

In some databases columns can have obscure names and it may be worthwhile to rename the field to something more relevant.

The configuration window for the select tool gives you the opportunity to rename your fields and even add descriptive information which can be especially useful when collaborating with others.

In this case, I will rename customer ID to customer number.

I will also write a short description in the description field.

We'll now run the workflow again and we can see that the fields have been reordered and the name for the customer ID field has been changed in the output.

The previous three lessons have shown you the datasets required quite a bit of preparation before we begin our analysis.

We'll often need to assign data types, sort the different fields, rename some of those fields, or even remove fields from our dataset.

However, with these tasks now complete, we'll move on to some new organizational tools in the next lesson beginning with the record ID tool.