5. Applying Bayes' Theorem

Overview

Bayes’ Theorem provides a formula that we can use to relate conditional probabilities. Learn how to use Bayes’ Theorem in this lesson.


Summary

  1. Lesson Goal (00:11)

    The goal of this lesson is to learn how to use Bayes’ Theorem to calculate conditional probabilities.

  2. Bayes’ Theorem Formula (00:16)

    Bayes’ Theorem is a formula for calculating conditional probabilities using several other simple probabilities. 

    To derive the formula, note that the probability of two events A and B both occurring can be written in two ways: as the probability of A times the conditional probability of B given A, or as the probability of B times the conditional probability of A given B. Setting these two expressions equal and dividing by the probability of B gives a formula for the conditional probability of A given B in terms of the conditional probability of B given A and the individual probabilities of A and B. This formula can be used in a wide variety of situations.

  3. Understanding the Problem (01:07)

    In this lesson, we consider a newly developed cancer test. We know the probability that a person tests positive given that they actually have cancer, but we want to find the probability that a person actually has cancer given that they test positive. 

    We can use Bayes’ theorem to calculate this probability if we know the probability a person actually has cancer, the probability a person tests positive, and the conditional probability that a person tests positive given that they actually have cancer.

  4. Calculating the Probabilities (02:36)

    In our problem, we need to calculate the probability that a person tests positive for cancer. To do this, we multiply the probability that someone has cancer by the probability that they test positive given that they have cancer. We then add the probability that someone does not have cancer multiplied by the probability that they test positive given that they do not have cancer.

    Once we know the probability of someone testing positive, we can calculate the probability that a person actually has cancer given that they test positive. We multiply the probability of a person having cancer by the probability that they test positive given they have cancer, then divide by the probability of testing positive. 

    The result of this calculation depends on the parameters of the problem. In our case, the test is 99% accurate for people who have cancer, and 97% accurate for people who do not have cancer. However, we find that only about 40% of people who test positive actually have cancer. This is because the form of cancer is rare, affecting 2% of the population, and as a result, the number of healthy people incorrectly diagnosed outnumbers the small number of people who actually have cancer.

Transcript

In the previous lesson, we introduced the concept of conditional probability.

In this lesson, we'll learn how to use Bayes' theorem to calculate conditional probabilities.

Bayes' theorem is a formula for calculating conditional probabilities using several other simple probabilities.

In order to understand it, let's consider the formulas for the probability of A and B that we saw previously.

Either of these formulas should give the same result. So we can say the two formulas are equal to each other.

If we then divide both formulas by the probability of B, we get this formula for the probability of A, given B.

What this tells us is that the conditional probability of event A, given event B, can be calculated if we know the individual probabilities of A and B and the conditional probability of B, given A.
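The formulas referred to on screen are not reproduced in the transcript, so the derivation can be written out as the following sketch. The intersection notation for "A and B" is an assumption about how the lesson writes it, and the result requires that the probability of B is greater than zero.

```latex
\begin{align*}
P(A \cap B) &= P(A)\,P(B \mid A) \\
P(A \cap B) &= P(B)\,P(A \mid B) \\
\Rightarrow\quad P(B)\,P(A \mid B) &= P(A)\,P(B \mid A) \\
\Rightarrow\quad P(A \mid B) &= \frac{P(A)\,P(B \mid A)}{P(B)}, \qquad P(B) > 0
\end{align*}
```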

This is used in a wide variety of situations.

One situation where Bayes' theorem is commonly applied is in the medical sector.

Let's consider the example of a test for a rare form of cancer. 2% of the population have this form of cancer. The test is mostly accurate, but not perfect.

If a person has the cancer, there's a 99% probability that they will test positive for it. If a person does not have the cancer, there's a 97% probability that they will test negative.

We want to calculate the probability that a person who tests positive actually has this form of cancer.

Let's consider this in the form of events.

Let's say event C is the event that a person actually has this form of cancer.

Event T is the event that a person tests positive for the cancer.

We know that the probability of C is 0.02 as 2% of people have this cancer. The probability of T, given C, is 0.99 as 99% of people with this cancer test positive.

Finally, the probability of T, given C-dash, is 0.03, where C-dash is the event that a person does not have this form of cancer.

Since 97% of people without this cancer test negative, 3% of people without this cancer will test positive. Our objective is to find the probability of C, given T. That is, the probability that someone who tests positive actually has this cancer.

Using Bayes' theorem, we can create a formula for this probability.

We already know the probability of C and the probability of T, given C. But we don't yet know the probability of T, so we'll need to find that first.

The probability of T is the probability that a person tests positive for this cancer.

To find this, we'll add the probability of a person testing positive and having this cancer to the probability of testing positive and not having this cancer.

We know that the probability of T and C is the probability of C times the probability of T, given C.

Similarly, the probability of T and C-dash is the probability of C-dash times the probability of T, given C-dash.

We know the values for all these probabilities, so we can insert them into this formula.

When we complete the calculation, we get a value of 0.0492 for the probability of T.
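Written out with the values given earlier, the calculation looks like this. Here C' stands for C-dash, and its probability of 0.98 is taken as one minus the 0.02 probability of C.

```latex
\begin{align*}
P(T) &= P(C)\,P(T \mid C) + P(C')\,P(T \mid C') \\
     &= (0.02)(0.99) + (0.98)(0.03) \\
     &= 0.0198 + 0.0294 \\
     &= 0.0492
\end{align*}
```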

This means that the probability of a person testing positive for this cancer is just below 5%.

Now that we know this, we can calculate the probability of having this cancer, given a positive test. Again, we know the formula to use from our Bayes' theorem. We know all the probabilities needed to complete this formula, so we'll fill them in. We find the probability of C, given T, is 0.4024.
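The same two steps can be checked with a short script. This is a minimal sketch rather than anything shown in the lesson: the function name and its parameter names (prevalence, sensitivity, specificity) are illustrative labels for the probability of C, the probability of T given C, and the probability of a negative test given C-dash.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Return (P(T), P(C|T)) for a positive test result.

    prevalence  -- P(C), probability a person has the condition
    sensitivity -- P(T|C), probability of a positive test given the condition
    specificity -- probability of a negative test given no condition
    """
    false_positive_rate = 1.0 - specificity                  # P(T|C')
    p_true_positive = prevalence * sensitivity                # P(T and C)
    p_false_positive = (1.0 - prevalence) * false_positive_rate  # P(T and C')
    p_positive = p_true_positive + p_false_positive           # P(T)
    return p_positive, p_true_positive / p_positive           # Bayes' theorem


# Values from the lesson: 2% prevalence, 99% and 97% accuracy.
p_t, p_c_given_t = posterior_given_positive(0.02, 0.99, 0.97)
print(f"P(T)   = {p_t:.4f}")
print(f"P(C|T) = {p_c_given_t:.4f}")
```

Rounded to four decimal places, this gives the 0.0492 and 0.4024 quoted in the lesson.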

So what does this mean? This tells us that if a person tests positive for this cancer, the probability that they actually have this cancer is just over 40%. There's almost a 60% chance that this person doesn't actually have this cancer. This may seem counterintuitive. After all, the test is 99% accurate for people who have this cancer and 97% accurate for people who don't. The issue comes about because the proportion of the population without this cancer is so large. Even a small percentage of this group represents a reasonably large number of people. As a result, the number of healthy people who are falsely told they have this cancer outnumbers the people who have this cancer and test positive. Therefore, we would say that the cancer test needs to be even more accurate, particularly for people without the illness, if it is to be used widely.
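One way to see this is to turn the probabilities into expected head counts. The sketch below assumes a hypothetical population of 100,000 people, chosen only to give round numbers. It also compares the lesson's test with a hypothetical version that is 99.9% accurate for people without the illness. Neither figure comes from the lesson.

```python
def positive_test_breakdown(population, prevalence, sensitivity, specificity):
    """Expected true positives, false positives, and P(C|T) in a population."""
    with_cancer = population * prevalence
    without_cancer = population - with_cancer
    true_positives = with_cancer * sensitivity
    false_positives = without_cancer * (1.0 - specificity)
    share_with_cancer = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_with_cancer


POPULATION = 100_000  # hypothetical population size, not from the lesson

# 0.97 is the lesson's value; 0.999 is a hypothetical improved test.
for specificity in (0.97, 0.999):
    tp, fp, share = positive_test_breakdown(POPULATION, 0.02, 0.99, specificity)
    print(f"specificity={specificity}: {tp:.0f} true positives, "
          f"{fp:.0f} false positives, P(C|T) = {share:.3f}")
```

With the lesson's figures, roughly 1,980 people with the cancer test positive, against roughly 2,940 healthy people who also test positive. Under the hypothetical 99.9% figure the false positives fall to about 98, so most positive results would then be genuine.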

Bayes' theorem teaches us that there can be a big difference between a hypothesis and an event. In this case, a positive test gives us a hypothesis that a person has this cancer, but that may not mean that the event of the person actually having this cancer occurs.

This concludes our look at Bayes' theorem.

In the next lesson, we'll learn how to visualize probability problems using a probability tree.