1. Calculating Normal Probabilities

Overview

An important skill in hypothesis testing is being able to identify probabilities from a distribution. In this lesson, we’ll learn how to calculate probabilities using a normal distribution, which is used in many hypothesis tests.


Summary

  1. Lesson Goal (00:19)

    The goal of this lesson is to calculate probabilities using the normal distribution.

  2. Overview of the Problem (00:26)

    The problem in this lesson relates to scores on an exam taken by many students, where the population mean is 240 and the standard deviation is 50. Our aim is to calculate probabilities for various ranges of scores from this distribution, and to find the score that cuts off a given probability.

  3. Calculating a Probability from the Distribution (00:39)

    The first problem requires us to calculate the probability of observing a value above 300. This is equivalent to the area to the right of 300 under the normal distribution. The area to the left of 300 represents the probability of observing a value below 300.

     

    To find this area, we first calculate the Z-Score for this observation, which transforms it into a point on the standard normal distribution. Next, we use a Z-table to identify the area below this Z-Score. The table we use for this purpose can be found here. We use the row and column headings to locate the Z-Score of interest, then read the corresponding probability from the body of the table. The table gives us the probability of observing a value below 300, and we subtract this probability from one to find the probability of a score above 300.

     

    Note that we can also use statistical software to find these probabilities instead of a Z-table, but we use the table in this lesson so that you fully understand how the process works.

  4. Calculating the Probability Within a Range (02:39)

    We can find the probability between two values by finding the probability to the left of each value and then taking the difference. For example, to find the probability of observing a value between 200 and 280, we use the same technique as before to find the probability of a value less than 280 and the probability of a value less than 200. Subtracting the smaller probability from the larger gives the probability of observing a value between 200 and 280.

  5. Finding the Value for a Probability (06:05)

    Using the normal distribution, we can also work in reverse and find the value that has a given probability below it (and therefore the complementary probability above it). For example, we find the test score that has a probability of 0.9 of a score falling below it.

     

    To do this, we find the appropriate Z-Score by locating the value closest to 0.9 in the body of the Z-table and reading off the corresponding Z-Score from the row and column headings. We then rearrange the Z-Score formula to find the test score that corresponds to this Z-Score.
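    The three calculations summarized above can be written compactly as formulas. In this sketch, Φ(z) is notation introduced here for the area to the left of z under the standard normal curve (the value a left-area Z-table reports). The numbers are the table values used in the transcript.

```latex
\begin{aligned}
Z &= \frac{X - \mu}{\sigma}, \qquad \mu = 240,\ \sigma = 50 \\
P(X > 300) &= 1 - \Phi\!\left(\tfrac{300 - 240}{50}\right) = 1 - \Phi(1.2) \approx 1 - 0.8849 = 0.1151 \\
P(200 < X < 280) &= \Phi(0.8) - \Phi(-0.8) \approx 0.7881 - 0.2119 = 0.5762 \\
\Phi(z) \approx 0.9 &\;\Rightarrow\; z \approx 1.28, \qquad X = \mu + z\sigma = 240 + 1.28 \times 50 = 304
\end{aligned}
```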

Transcript

In this course, we'll learn about the statistical technique of hypothesis testing. We'll learn about the basic principles of hypothesis testing, and we'll see how to conduct a wide variety of different hypothesis tests.

Our goal in this lesson is to calculate probabilities using the normal distribution.

We'll consider the example of a school test taken by a very large number of students each year.

Because such a large number of students take the test, we believe the scores follow a normal distribution.

The maximum score on the test is 400 points.

We've analyzed data for all the students that took the test and found that the average score is 240 and the standard deviation is 50. We want to use this information to solve various probability problems.

We'll first consider the score of a randomly selected student, which we'll represent as the random variable X. This student hopes to score 300 points or more.

We can use the normal distribution to find the probability that this student scores 300 or more.

This will be equivalent to the area under the normal distribution to the right of 300.

To find this area, we'll find the Z-score for an observation of 300 points.

We'll then use a standard normal table to find the probability of observing a value less than this Z-score, and subtract this value from one to find the probability of observing a higher value.

Let's start by calculating the Z-score for a test score of 300.

We'll subtract the mean of 240 from 300 and divide by the standard deviation of 50 to get a Z-score of 1.2.
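As a small illustration, the Z-score calculation can be written as a function. The name `z_score` is hypothetical and chosen for this sketch; the mean and standard deviation are the values from this lesson.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd

# Exam scores in this lesson: mean 240, standard deviation 50
print(z_score(300, 240, 50))  # 1.2
```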

If we look at a standard normal distribution, our new objective is to find the area to the left of 1.2 on this distribution. This will be the same as the area to the left of 300 on the previous normal distribution.

To find this area, we have two options.

We can use statistical software like Excel, or we can look up a standard normal table.

Using software is generally easier, but in this lesson, we'll demonstrate how to use a table.

It's worth noting there are two types of standard normal table.

The first type gives the area to the left of any value of interest, while the second gives the area between zero and the value of interest.

In this lesson, we'll consider the first type of table as it's generally easier to use.
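The two table types are related by a simple offset. Half of the area under the standard normal curve lies to the left of zero, so the "zero to z" area equals the left area minus 0.5 (taking the absolute value for negative z). The lesson mentions Excel. As an alternative, this sketch uses Python's standard-library `statistics.NormalDist` (Python 3.8 or later). The function names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_left(z: float) -> float:
    """Type 1 table entry: area under the standard normal curve to the left of z."""
    return std_normal.cdf(z)

def area_zero_to(z: float) -> float:
    """Type 2 table entry: area between 0 and z (works for either sign of z)."""
    return abs(std_normal.cdf(z) - 0.5)

z = 1.2
print(round(area_left(z), 4))     # 0.8849
print(round(area_zero_to(z), 4))  # 0.3849
```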

Let's now look up the standard normal table. This table can easily be found online, and you'll find a link to the table we're using in the summary below this lesson. We can see there are two tables, one for negative Z-scores, and one for positive Z-scores.

Each of these tables shows the area to the left of the relevant Z-score in a standard normal distribution.

In our example, we want to find the area for a Z-score of 1.2, which we read as 1.20. We'll go to the positive Z-score table, find the row for 1.2 and the column for .00.

At this intersection, we find a value of 0.8849.

This tells us that the area to the left of a Z-score of 1.2 in a standard normal distribution is 0.8849.

If we subtract this from one, we find the area above this point is 0.1151.

If we return to our original normal distribution, we can also say that the area to the left of 300 is 0.8849, and the area to the right is 0.1151.

This tells us that the probability of a randomly selected student scoring below 300 is 0.8849, and the probability of this student scoring above 300 is 0.1151.
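Statistical software can do the same calculation without standardizing by hand. This sketch assumes Python 3.8 or later and uses `statistics.NormalDist` with the lesson's mean and standard deviation. For a continuous distribution, the probability of scoring exactly 300 is zero, so "above 300" and "300 or more" have the same probability.

```python
from statistics import NormalDist

scores = NormalDist(mu=240, sigma=50)  # normal model for exam scores, as in the lesson

p_below = scores.cdf(300)   # P(X < 300): area to the left of 300
p_above = 1 - p_below       # complement rule: P(X > 300)

print(round(p_below, 4), round(p_above, 4))  # 0.8849 0.1151
```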

Let's look at a second problem. In this case, we want to identify the probability that a randomly selected student scores between 200 and 280 points. To do this, we'll find the area to the left of 280 points and then subtract the area to the left of 200 points.

Let's find the Z-scores for test scores of 200 and 280.

Using the same formula as before, we find the Z-score for 200 is negative 0.8, and the Z-score for 280 is positive 0.8.

Now let's return to our standard normal table and find the areas for these two Z-scores.

For a Z-score of negative 0.8, we go to the negative Z-score table, find the row for negative 0.8 and the column for .00, which tells us the area is 0.2119.

Next we'll go to the positive Z-score table, find the row for 0.8 and the column for .00, which tells us the area is 0.7881.
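The row-and-column lookups above can be mimicked in code. In this sketch, the hypothetical `table_entry` function combines a row heading and a column heading into a Z-score, then returns the left area rounded to four decimals, as a printed left-area table would. In the negative table, the column digit pushes the Z-score further below zero (row -0.8, column .05 means Z = -0.85). `math.copysign` is used so that a "-0.0" row is also treated as negative.

```python
import math
from statistics import NormalDist

def table_entry(row: float, column: float) -> float:
    """Mimic a left-area Z-table lookup: combine row and column headings
    into a Z-score and return the area to its left, rounded to 4 places."""
    sign = math.copysign(1.0, row)   # -1.0 for negative rows, including -0.0
    z = row + sign * column          # e.g. row -0.8, column .05 -> Z = -0.85
    return round(NormalDist().cdf(z), 4)

print(table_entry(-0.8, 0.00))  # 0.2119
print(table_entry(0.8, 0.00))   # 0.7881
```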

We can now see that the probability of a random student scoring less than 280 is 0.7881, and the probability they will score less than 200 is 0.2119.

If we subtract 0.2119 from 0.7881, we find that the probability of scoring between 200 and 280 is 0.5762.
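The same range calculation can be sketched with `statistics.NormalDist` (Python 3.8 or later). Software works with unrounded areas, so it gives about 0.5763 rather than the table's 0.5762. The small difference comes from rounding each table entry to four decimals before subtracting.

```python
from statistics import NormalDist

scores = NormalDist(mu=240, sigma=50)

# P(200 < X < 280) = area left of 280 minus area left of 200
p_between = scores.cdf(280) - scores.cdf(200)
print(round(p_between, 4))  # 0.5763
```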

Finally, let's look at a different type of problem.

Let's imagine a particular college wants to accept only students who score in the top 10%, and wants to know what score it should set as its entry requirement.

In effect, we want to find the test score where the probability of being above it is 0.1, and the probability of being below it is 0.9.

To do this, we'll return to our standard normal table.

Instead of looking in the row and column headings, we'll search in the body of the table for a probability of 0.9.

The closest value we can find is 0.8997.

This is in the 1.2 row and the .08 column, meaning it corresponds to a Z-score of 1.28.
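Searching the body of the table for the entry closest to a target probability can be sketched as a simple scan. The hypothetical `closest_table_z` function assumes a positive left-area table covering Z = 0.00 to 3.49 in steps of 0.01, which is a common layout. The Z values are generated from integers to avoid accumulating floating-point error.

```python
from statistics import NormalDist

def closest_table_z(target: float) -> float:
    """Scan a positive left-area Z-table (Z = 0.00 to 3.49) and return the
    Z-score whose 4-decimal entry is closest to the target probability."""
    std_normal = NormalDist()
    best_z, best_gap = 0.0, float("inf")
    for k in range(350):                      # k = 0..349 -> Z = 0.00..3.49
        z = k / 100
        entry = round(std_normal.cdf(z), 4)   # value printed in the table body
        gap = abs(entry - target)
        if gap < best_gap:                    # keep the first closest entry
            best_z, best_gap = z, gap
    return best_z

print(closest_table_z(0.9))  # 1.28
```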

This tells us that the area to the left of 1.28 on a standard normal distribution is approximately 0.9.

Next, we need to find what value this equates to on a distribution of test scores.

We can do this using the Z-score formula we've used before.

As you can see, here we know the value of Z, and we want to find the value of X.

If we rearrange the formula and complete the math, we find the value for X is 304.
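Rearranging Z = (X − mean) / standard deviation gives X = mean + Z × standard deviation. This sketch compares the table-based cutoff with the value software returns. `NormalDist.inv_cdf` (Python 3.8 or later) computes the exact 90th percentile from the unrounded Z-score of about 1.2816, which is why it lands slightly above 304.

```python
from statistics import NormalDist

mean, sd = 240, 50

# Rearranged Z-score formula: X = mean + Z * sd
z_table = 1.28                                # Z-score read from the table
x_table = mean + z_table * sd                 # cutoff using the table's Z
x_exact = NormalDist(mean, sd).inv_cdf(0.9)   # software: exact 90th percentile

print(round(x_table, 2), round(x_exact, 2))   # 304.0 304.08
```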

This tells us that the probability of a random student scoring less than 304 is approximately 0.9, and the probability of a student scoring more than 304 is approximately 0.1.

This completes our look at calculating probabilities from the normal distribution.

As we've seen, we can generate a lot of insights from a normal distribution simply by knowing the mean and standard deviation and having access to a standard normal table.

In the rest of this course, we'll apply some of the skills we've acquired in this lesson to a variety of different hypothesis tests.

In the next lesson, we'll start by learning how to set up a hypothesis test.