5. Data Selection and Organization Part 1

 
Subtitles Enabled

Sign up for a free trial to access more free content.

Free trial

Overview

Continuing from the previous lesson, you will learn how Alteryx treats data from Excel files and how the Select tool can help you organize data for further analysis.

Lesson Notes

Selecting Data and Assigning Data Types

  • Alteryx classifies data into various types that let the program know the characteristics of the data, for example number or text
  • Users can view or change the data type by connecting the Select tool to their dataset
  • There are roughly 15 different data types, however users should only be concerned with 5 broad categories when thinking about their data: Boolean, Numeric, String, Date/Time, and Spatial Object
  • To understand the difference between specific data types, refer to this table 

Transcript

When importing a new data set into Alteryx, one of the most important things to watch for is the data type.

Alteryx will automatically assign a data type to each field or column in your dataset.

However, you should always take a moment to double check that your data is correctly assigned.

This is especially important if your data comes from a database or a simple csv file.

As Alteryx may not have sufficient information to assign multiple data types and may assign the same data type to all fields.

To check the data types for the csv file that we previously imported, we are going to add the select tool to our workflow.

This shows all the fields in the sample dataset which includes date, product name, sales etc.

As you can see, Alteryx is assigned the type V_string to each column.

To see what each column looks like, lets run the workflow with CTRL+R and view the output.

In the output window, we can see the data that occurs in each column.

In short, these columns do not all contain the same type of data, so we'll probably need to change the data type for some of them.

When you click the type dropdown in the configuration window, you can see lots of different data types to choose from, which can be intimidating at first glance.

However, all you need to remember at this stage is that Alteryx has 5 broad data types to choose from, which we'll quickly run through now.

At a high level, these data types are Boolean, numeric, string, date time and spatial object.

Lets now look at each of these broad categories.

First up is Boolean.

A Boolean data type can contain only 2 values.

In other words, data that is simply true or false or, in many cases 1 or 0.

Next, are numeric data types.

Alteryx contains lots of different numeric data types which we'll explore in the coming courses.

However, for now, its important to be aware that numeric data types are whole numbers or decimals.

If we look at the list of data types in the dropdown, we have various integer values which should be used for whole numbers.

We also have decimal types which include fixed decimal, float and double.

So why does Alteryx have so many different options for integers and decimals? Well, the answer lies in a trade of the programmers have to make between spaces and accuracy.

Numeric types that are more accurate require more space and thus, more processing power for any calculations.

In the shown notes, you will find a link with definitions for all the different data types.

But for now, the most important thing to remember is, why Alteryx has different options, and the trade-offs associated with these options.

In this current example, we can see that the quantity field contains whole numbers so we'll change the field type to integer 16.

Lets move on to text strings.

A string will hold information such as text or combinations of numbers and text while numeric calculations are not performed.

For example, product name, customer ID and street are all strings.

When it comes to different data types for strings, Alteryx likes to distinguish between strings of a fixed length and strings of variable length.

Variable length strings default to 256 characters long; sSlightly longer than a tweet.

Clearly, if you can use a fixed length, you have the potential to cut down a lot of wasted space.

Its a trade-off of flexibility versus processing speed.

Similar to the numeric data types.

In this case, the flexibility is not necessary, so I will assign the string type to product name, customer ID, street, town, county and post code.

The WString and V_WString types should be applied if your data contains non-western characters such as Japanese or Chinese.

The next group of data types are the date and time choices.

Date refers to year, month and day, whereas time refers to hours, minutes and seconds.

DateTime combines the two.

Dates must be in year, month, date format for Alteryx to correctly interpret them, hwever, we'll cover this in detail in a later lesson.

For now, we'll apply the date data type to the date column.

The final type choice is spatial object.

Spatial object types include centroids and polygons, used to convert data into recognize mapping information.

We'll cover this again in a series of lessons on spatial data mapping, so if you don't know what those items are, no need to panic.

There is no doubt that there are a lot of options for assigning data types in Alteryx.

While this may seem a bit daunting, users just need to remember the 5 broad categories discussed in this lesson.

Boolean or true and false options, numeric or whole numbers and decimals, string or text characters, date and time, and finally spatial objects or geographic data.

These individual data types will be covered again in future lessons and users can always refer back to the link in the show notes when trying to determine which specific data type to use.

Now that we've covered data types we'll look at the other ways the select tool can help users organize data in the next lesson.