9. Cleaning Address Data

Overview

Address data often causes the most difficulties for users manipulating text. In this lesson, we will learn how to join and split address data using a number of different Tableau functions.

To explore more Kubicle data literacy subjects, please refer to our full library.

Summary

Functions to change the lettercase of words

LOWER(): - changes all words to lowercase
UPPER(): - changes all words to uppercase
- No function in Tableau capitalizes the first letter in each word
- If you would like to create a formula to do this, please follow this link

Joining and splitting strings of text

- To join strings of text together we simply use "+"
- To split strings of text, we use the SPLIT() function
- SPLIT() accepts a delimiter and the position of the substring you want to obtain
- SPLIT() counts the position from the left with positive numbers
- SPLIT() counts the position from the right with negative numbers

Transcript

In our data source, I've added a couple of address columns to the dataset for each pharmaceutical customer. Column headings are street, town, county and post code.

Unfortunately, the formatting for these columns is not ideal in that they are all in uppercase. To fix this, let's try a text function. So I'm going to go to my sheet and I'm going to start with Street. So I'll add Street to my Text table. To remove uppercase, I can use a function called LOWER, which will switch all of the letters to lowercase. And to do this, I'll double click.

I'll type LOWER and wrap the brackets around the field.

And when I do this, you can see that all of my text is now lowercase. In Excel, there's a function called PROPER and what PROPER does is capitalize the first letter of each word. Unfortunately, this is not something that Tableau has out of the box and instead, you need to write a very complex formula to create this PROPER functionality. In the show notes, I'll include a link that shows you how to do this. When you have lowercase text, you may want to change this to uppercase. And to do this is very simple. Instead of LOWER, we use UPPER.

So I'll simply type UPPER and as you can see, I get the uppercase back in my Street field. While UPPER and LOWER can be somewhat useful, the main way we clean address data is splitting and joining different address fields. Let's start off with joining. I'm going to create a single column called Address that combines street, town, county and post code, all in the one field. And to do this, I'll go to Analysis and Create Calculated Field.

I'll call this column Address.

And I'll start off with Street.

When I want to chain different columns of text together, I use the plus sign and this allows me to add Town and then County, and then Post code.

However, I'd like to put some formatting between each of these fields and what I'd like to do is put in a comma and a space. All I do is put a plus sign between this formatting and each column.

And I make sure that I use double quotes to contain this comma and this space.

And now my address field is complete. When I now add Address to Text and I remove UPPER and show Fit Width, I can see that I have a single column that includes my full address. And this is particularly useful when we want to do some geo coding.

Now, what happens if we want to split the address, say extract two or county from our string? Well, to do this, we use a function called split. In this example, I'm going to use the split function to extract out the town from the address. So let's create some more room for my cards, and I'll double click into Address.

And write SPLIT.

And the first argument SPLIT takes is the string and the next argument it takes is the delimiter, which will be a comma. And then SPLIT takes a token number. And the number is the position of the particular sub string within the larger string. In this example, the position of the town is two.

So I'll write two and close the bracket.

And as you can see, this now strips the address field of everything but the town. Let's now put this in a calculated field. So I'll go to Analysis, Calculated Field and I'll call it Split town, and this will simply equal to SPLIT Address, the delimiter and the token number, which is two, and then press OK.

And I remove it from my sheet and include the calculated field.

Now let's compare split town with the actual town field.

And as you can see, split town has a space included before the town name. We're going to fix this in the next lesson with the trim function.

>
Formulas in Tableau
Further Formulas in Tableau

Contents

My Notes

You can take notes as you view lessons.

Sign in or start a free trial to avail of this feature.

Free Trial

Download our training resources while you learn.

Sign in or start a free trial to avail of this feature.

Free Trial