4. Chop Lines of Text Part 1
LEFT, RIGHT and MID are functions that allow us to chop a cell of text into smaller parts. They are used regularly for cleaning text in datasets.
LEFT: Chop characters from the left of a string
RIGHT: Chop characters from the right of a string
MID: Chop characters from the middle of a string
ALT + I, C: Insert column
One of the more common cleaning techniques you'll perform in Excel will be chopping strings of text into smaller pieces.
To help us do this, Excel provides three useful functions: left, right and mid.
Left, as you can imagine, allows us to chop a number of characters from the left hand side of a string.
Say for example I wanted to only return the first three characters of each month entry rather than the full word.
Then I'll use left.
So I'll create a new column with Alt + I C.
I'll entitle it “Month”, and then I'll write “=left” and open a bracket.
The first argument will simply be the month entry.
I'll then write a comma, and then I'll enter the number of characters I wish to chop, which is three, and then close the bracket.
When I press enter, you can see that January has been converted to Jan.
We can autofill for the remaining lines by double clicking in the right hand corner.
The right function, as you can probably guess, simply allows us to chop characters from the right of a string.
And to show this in an example, I'm going to write Post Code and then chop the last five characters from the address field to return just the Post Code for each of our users.
So I'll write “=right” for the first argument. Again, I'll paste in the string of text.
I'll write a comma, and then the next argument will be the number of characters I want to chop, which is five, close the bracket and press Enter.
And again when I autofill for the remaining entries, you can see that we've now isolated the Post Code for each of our users.
Post Codes are valuable data for many businesses, because geo-location software can use this data to perform analysis of your customer locations.
As a result, chopping this value from an address field is a common task for many companies.
The last function we'll use in this lesson will be mid, which allows us to chop text from the middle of a string.
And we'll use mid to find out what Email provider our users have.
This can help us improve our customer profiling and our future marketing efforts.
So from firstname.lastname@example.org, I'll try and extract Facebook.
And from the next entry I'll extract Gmail, and so on.
I'll start by creating a new column with Alt + IC and I'll call this Email provider.
And then I'll write “=mid” and open a bracket.
And again, the first argument to be entered is simply the string, which in this case is the email address.
The next entry is the start num, and this is the position from where we want the chop to begin.
If I count the number of characters from the left of the cell to the F of Facebook, I get 10, so I'll write 10.
Then I'll write another comma, and now I need to enter the number of characters I wish to chop.
For Facebook, that's going to be eight.
So I'll close the bracket and press Enter.
And you can see this chop has been executed successfully, and we've isolated Facebook from the email address.
However, we still have a big problem.
If I autofill for the remaining entries, you can see that we have errors for almost every user.
And this is because the position of the email provider within the string can vary and the length of the email provider can also vary.
This is actually a common problem you'll find with the mid function, because we rarely need to extract strings in the exact same location and for the exact same length in each entry.
To solve this problem, we must combine mid with some additional functions that can help us identify the position of the @ sign and the .com in each of our entries.
I'll show you how to do this in the next lesson.