Sign in or start a free trial to avail of this feature.
1. Finding and Replacing Data
Learn how to use the Unique tool to filter out duplicate records, and how the Find and Replace tool can transform specific fields directly.
To explore more Kubicle data literacy subjects, please refer to our full library.
Course Overview (00:10)
We’ll start the course by applying the unique and find and replace tools to a mobile sales dataset.
We’ll then learn how to use the Fuzzy Match tool to combine similar company name entries generated from survey results.
Finally, we’ll introduce regular expressions and use the regex tool to find specific values in postal code data.
Lesson Goal (00:40)
In this lesson, we’ll learn how to use the unique and find and replace tools.
Removing Duplicate Values (00:48)
We want to remove any duplicate entries from the sales dataset. We’ll connect a Unique tool, which can be found in the Preparation tab on the Tools Palette. In the configuration window, we must specify which fields to consider when deciding on uniqueness. We’ll specify all fields, which means we’ll be removing records that contain the same values in all fields as other records.
The duplicate records flow to the D output node, and unique records flow to the U output node.
Replacing Values (02:48)
We would now like to replace all XYZ entries to Facebook. We’ll connect a Find and Replace tool to the Unique tool, connecting the U output to the F input. This tool can be found in the Join tab on the Tools Palette. The F input should have the dataset with the text we are looking to replace. The R input node needs to be connected to a dataset with both the text value we’re looking for and the replacement text.
We’ll create a dataset with our replacement text by connecting a Text Input tool. This tool can be found in the In/Out tab on the Tools Palette. In the configuration window, we’ll create two columns: one labeled Find, and one labeled Replace. In the first row, we’ll enter the text XYZ in the Find column and facebook in the Replace column.
We’ll now adjust the configuration window for the Find and Replace tool. We’ll choose Source as the Find Within Field, Find as the Find Value, and keep the Any Part of Field radio selected. Other options here include Beginning of Field, and Entire Field. We also have the option to make the find criteria case insensitive and choose to match the whole word only. We’ll select this last option.
In the replace section of the configuration window, we’ll choose to replace the found text with the value in the Replace column.
In this course, we'll look at the various options for finding and replacing data in Alteryx.
We'll start by applying the Unique and Find and Replace tools to our mobile sales dataset.
We'll then learn how to use the Fuzzy Match tool and use it to combine similar company name entries generated from survey results.
Finally, we'll introduce regular expressions and use the RegEx tool to find specific values in postal code data.
In this lesson, we'll learn how to use the Unique and Find and Replace tools.
Over the course of your analysis, you may find duplicate values in your datasets.
In some cases, these values may be valid.
For example, a sales dataset may contain multiple different sales entries to the same customer.
In other cases, these duplicate values may be invalid and must be removed.
In these situations, we can remove the invalid entries with the Unique tool.
We'll use this tool to identify unique entries in a dataset containing in-app purchases from an online sales platform.
This dataset contains information on the original source website for each sale, several fields of customer data, receipt information, and the amount of each sale.
Unfortunately, due to previous manipulation, this dataset contains exact duplicates of many records.
We'll navigate to the Preparation tab on the Tools pallette and connect a Unique tool to the workflow.
In the Configuration window, we must specify which fields Alteryx needs to consider when deciding on uniqueness.
In this case, we want to specify all fields because we only want completely unique sales records.
This means we'll be removing records that contain the same values in every field as other records.
We'll now run the workflow.
Duplicate records have now been separated into the D output node.
Conversely, unique records can be accessed through the U output node.
We want to focus on unique values going forward.
As we mentioned before, the Source field tells us the original source that directed the customer to our sales platform.
If we look at line 47, we can see the reference XYZ.
Let's imagine this is an online marketing reference for a price comparison site.
However, in this example, XYZ has recently been acquired by Facebook, and any future XYZ entries will instead read as Facebook.
We'll want to replace all XYZ entries to Facebook, so that our data is consistent going forward.
We can do this using the Find and Replace tool.
We'll navigate to the Join tab on the Tools pallette, bring the Find and Replace tool onto the canvas, and connect the F input node to U output node of the existing workflow.
The F node denotes the dataset that we'll be performing the Find and Replace on.
The other input node of the Find and Replace tool needs to connect to a dataset containing our replace parameters.
That is the text we're searching for as well as the replacement.
In this case, we'll need to create a new dataset.
We can do this with the Text Input tool.
We'll navigate to the In/Out tab, and connect a Text Input tool to the R node of the Find and Replace tool.
We'll then navigate to the Configuration window to label our fields.
We'll label the top-left cell, Find, and the cell directly to the right, Replace.
In the first row of the Find column, we'll specify the text we're looking for. In this case, XYZ.
In the first row of the Replace column, we'll specify the replacement text or Facebook.
We've now created our replacement dataset, so we can move on with our Find and Replace.
We'll select the Find and Replace tool and change the settings in the Configuration window.
Remember, we want to replace the text, XYZ, whenever it appears in the Source column of our original dataset.
As such, we'll select Source in the Find Within Field dropdown.
We'll then click on the Find Value dropdown and select the Find field as this contains the search criteria we specified in our new dataset.
The text XYZ could appear anywhere in the Source field, so we'll keep the Any Part of the Field radio selected.
We want Alteryx to find and replace only exact matches, so we'll check the Match Whole Word Only box.
Finally, we need to specify what to replace the text with.
We'll click the dropdown and select Replace from our new dataset.
There are more options which allow you to refine your search further, but, in this case, we're done, so we'll run the workflow.
We'll view the Results window again, scroll down to record 47, and see that XYZ has been replaced with Facebook.
Let's quickly recap what we've done in this lesson. First, we connected a Unique tool to separate out our unique and duplicate entries in the mobile sales dataset.
We then used the Find and Replace tool to replace all XYZ entries with Facebook in the Source field.
Let's stop the lesson here.
In the next lesson, we'll learn how to use the Fuzzy Match tool to combine duplicate or similar records that contain subtle differences.