10. Using the RegEx Tool

Overview

In this lesson, we learn how to use the RegEx tool and apply it to a common data manipulation task.

To explore more Kubicle data literacy subjects, please refer to our full library.

Summary

  1. Lesson Goal (00:09)

    In this lesson, we'll apply a regular expression to our postal code data set.

  2. Extract Area and District Portion of the Postal Code (00:17)

    We’ll connect a RegEx tool and choose the Address field as the column to parse. We’ll then enter the code we developed in the previous lesson:

    ([A-Z]([0-9]{1,2}[A-Z]{0,1}|[A-Z]([0-9]{1,2}[A-Z]{0,1})))

    We’ll then leave the case insensitive box checked.

    Next, we need to choose the Output method. We’ll select Parse, since we are just trying to pull out the area and district information. This results in three columns – one for the entire code, and separate columns for each subexpression.

  3. Combine Address Info with Postcode File (02:39)

    We’ll bring down an Input Data tool and connect it to the Postcode district look up file. We’ll then use a Join tool to merge the workstreams, joining on the RegExOut1 and Postcode district fields. We’ll connect a Select tool to remove unnecessary fields, and a Browse tool to view the results.

  4. View Results (03:33)

    All of the address from our original dataset have now been matched up with district and area information.

Transcript

In the previous lesson, we looked at the format of a Regular Expression.

In this lesson, we'll apply that expression to our postal code data set.

In the preview window, we can see that each customer has a one line address and postcode.

The area and district, is the first part of the postcode.

However, it could be three or four characters long and not necessarily indicated by a space.

Using a simple filter or formula will not work here, we need to use RegEx.

From the parse tab on the Tools palette, will connect a RegEx Tool to our workflow.

First, we need to identify the field that we wish to parse, in this case, the address field.

If we click the plus symbol next to the regular expression box, we can see there are several choices for more standard expressions.

However, we're looking for something quite specific.

We'll paste in the expression that we created in the previous lesson.

This expression can be found in the lesson summary below the video.

We haven't differentiated between upper and lowercase characters in our expression, so we'll leave the case insensitive toggle selected.

As you become more adapt at coding regular expressions, you may realize that the expression that we've proposed could be improved.

For example, as a rule, the first letter of a UK postcode will not include Q, V, or X.

If this was a professional assignment, we might want to amend our expression so that those letters are excluded from our search. Similar rules apply to other parts of the postcode, so the expression could be more refined.

For now however, we'll continue with our simplified version.

For the output method, we would like to parse the data.

You can see that there are other options available, which we'll discuss in the next lesson.

Will now run the workflow.

In the results window, we can see that there are three new columns.

This is because ultrix generates a new column for our entire expression and any sub elements.

In this case, we're only concerned with the output from the entire expression, so we'll focus on that going forward.

Now that we've separated out the area and district part of the postcode, we'll cross reference this data with our postcode district file.

We'll bring a new input data tool onto the canvas, and import the postcode district file.

We'll use a joint tool to connect these two data sets.

We'll match the postcode districts from the postcode district file with the principal output from the regular expression RegEx1 Will connect a select tool to the J node, and choose which fields to bring forward.

We'll deselect the RegEx outputs.

We'll also connect a browse tool, and run the workflow.

We can see that each of the addresses has been matched up with a district and area information and is now ready for our sales team.

We'll quickly run through what we've accomplished in this lesson. First, we use a RegEx Tool to extract the area and district portion of the postcode.

We then use the join tool to combine our address information with our postcode district file.

Finally, we attached select and browse tools to view our information including area, district and country name.

In the next lesson, we'll look at some of the other functions available in the RegEx Tool.

Data Manipulation
Finding and Replacing Data

Contents

My Notes

You can take notes as you view lessons.

Sign in or start a free trial to avail of this feature.

Free Trial

Download our training resources while you learn.

Sign in or start a free trial to avail of this feature.

Free Trial