1. The Text to Columns Tool

 
Subtitles Enabled
Replay Lesson

Next lesson: The Unique and Find and Replace Tools

Watch next lesson

Data Manipulation

13 lessons , 2 exercises , 1 exam

Start Course
100%

Overview

In this lesson, we learn how to use the very popular text-to-columns tool to separate combined fields in Alteryx.

Lesson Notes

Text to Columns Tool

  • The Text to Columns tool splits data from one column into a specified number of columns
  • Users must specify the delimiter, or marker, that separates the data is to be split
  • Users must specify the number of new columns for the split

Transcript

It is not unusual to receive data as a single block.

In other words, not already separated into distinct fields.

If you're familiar with Microsoft Excel, you may have used the text to columns function to help break up such data.

Alteryx offers a similar tool designed to break up blocks of data into separate columns.

Let's look at an example.

First, we'll import the mobile sales data set onto our canvas.

Let's imagine that this data comes from an online sales platform, specifically in-app purchases.

Looking at the preview in the configuration window, we can see that each line of data contains information such as source, name, email, gender etc.

But our data is not separated into individual columns.

This is where the text to columns tool comes in.

We'll navigate to the parse tab on the tools palette and connect the text to columns tool to our workflow.

In the configuration window, we can see that Alteryx automatically identifies the top row as the header field.

Normally, we would need to choose the column to split.

But since our data is currently one single column, that choice is not relevant.

We'll now need to specify the delimiter, or the marker that is separating our data.

If we look at the data set again, we can see that the data is separated by commas.

We should therefore go back to the text to columns tool and specify the comma as a delimiter.

We now need to specify the number of columns.

Going back to the data set, and looking at the header row, we can see that there are 9 individual categories.

We'll head back to the text to columns tool and enter 9 in the number of columns box.

There are further choices which give you more refined separation rules but in this case, we are done.

So we'll run the workflow.

In the results window, we can see that our data has been separated into 9 distinct columns along with the original column.

You may have noticed that the column headers have not been separately defined.

We'll fix this issue with a select tool.

We'll navigate to the preparation tab, and connect the select tool to the workflow.

This will allow us to choose the relevant fields to carry forward and give these fields appropriate names.

First, we'll deselect field 1, since we just split the data from that field into separate columns.

Next, we'll rename each field.

Source, name, e-mail, gender, IP address, latitude, longitude, receipt number and finally, revenue.

Finally, notice that the revenue data contains a currency symbol.

We'll want to remove that symbol from the field, so we can manipulate that data.

But we still want to retain the currency information.

We can connect a formula tool to help us accomplish this.

Our first step will be to display the currency information of each revenue entry.

Accordingly, we'll name the output field currency.

For this formula, we'll use a simple if function.

Our data is in dollars, so our formula will output USD for any entries that contain a dollar symbol and other for any future entries that may contain another currency symbol.

We'll replace the C variable with a contains function, replace the first variable of that contains function with a revenue field and replace the second variable with a dollar symbol in quotes.

Next, we'll replace the T variable with USD in quotes and the F variable with other in quotes.

We'll now create a second formula to remove both the dollar symbol, and convert the revenue amount to a number.

Let's call this field Revenue#.

We'll start by adding a trim function and changing the first variable to direct the formula to search through entries in the revenue column.

We'll replace the second variable with a dollar symbol in quotes since that's the value we'd like to remove.

Finally, we'll wrap a tonumber expression around the trim expression to convert the revenue data from a string to a number.

We'll then change the data type to a numeric value and run the workflow.

As we can see, we now have separate columns for the currency and revenue value.

Let's quickly run through what we have done in this lesson.

First, we used the text to columns tool to separate data into various fields.

Then, we renamed those fields with the select tool.

Finally, we used the formula tool to separate the revenue data into currency information and revenue values.

In the next lesson, we'll look at how Alteryx can help us separate out duplicate entries as well as replace specific values.