5. Chop Lines of Text Part 2
Combining MID with a new function, FIND, allows to extract words from the middle of a text string much more effectively. Here how I show you how to extract the email provider (e.g. gmail) from an email address.
MID: Chop characters from the middle of a string
FIND: Identifies the position of a specific characters
ALT + I , C: Insert column
In the previous lesson, you might remember that we had trouble extracting the email provider from an email address using the mid function.
It turns out that we need another function that helps us identify the position of where the email provider string starts and finishes.
This function is called Find.
Let's start by creating a couple of new columns with Alt+IC.
I'll entitle these columns "start_num" and "num_chars", which are the two inputs to our mid function.
Let's start with "start_num", and this represents the position of the first character of the email provider within our email address.
And so for Facebook, represents F.
The F is obviously one position after the "@" symbol, and the Find function allows us to calculate the position of the "@" symbol.
So we'll do this and then add one to the Find function.
So I'll write "=find", open a bracket, and the first argument we need to enter is the symbol that we want to find, which is the "@" symbol.
And we just need to wrap this in double quotes.
The next input is the text string.
So I'll simply select the email address, and then I'll close the bracket because the last argument is optional.
I'll then add one and press Enter.
And this gives me a position of 10, which you might recall from the previous lesson, is the correct answer.
I'll now autofill for the remaining entries by double clicking in the bottom right hand corner.
Now let's move on to the "num_chars" calculation.
Which is simply the number of characters in our email provider string, And this is the distance between the period after the email provider and the "@" symbol.
So I'll use a Find function, define the position of the period, so "=find", open double quotes, write a period, write a comma, and then pass in the email address.
The optional argument at the end allows us to select a start num after which the period will be identified, and here I'll use a start num of 10 which we calculated earlier.
If there happened to be a period in the string before the "@" symbol, this will now protect us from identifying that position.
I'll then close the brackets to complete the Find function.
To calculate the number of characters between this Find function and the position of the "@" symbol, I'll simply subtract the "start_num" value from our find function.
I'll then press Enter to complete.
And as you can see, eight characters is the length of the Facebook string.
Again I'll double click in the right hand corner to complete for the remaining values.
With "start_num" and "num_chars" calculated, we can now redo our mid function.
So I'll write "=mid", the first argument will be the email address, the next argument will be "start_num", and the final argument mid number of characters.
Close the bracket and press Enter.
I'll then autofill for the remaining entries.
And as you can see, we've now pulled the email provider successfully from the email address string.
While it would be possible to write one long formula to calculate the email provider string, you can see that when using a new function for the first time, creating additional columns such as "start_num" and "num_chars", simplifies the calculation dramatically.
For practice, try and calculate the email provider string using the mid function and two Find functions, all within one formula.
I'll leave my answer in the after file below this video.
Obviously, there are many business applications of chopping strings using the mid and Find functions.
And you'll probably have to tweak this solution slightly when cleaning different types of data.
The Find function is very powerful, because you can enter any content you want in the double quotes, not just symbols such as period and the "@" symbol.
In this example, if we wanted, we could have entered ".com" in the double quotes instead of the period, because all our email providers use that extension.
Be sure to experiment with the Find function so that you're comfortable using it when cleaning data.
It's a big time saver when utilized correctly.