9. Understanding Sampling

Overview

Sampling is a common application of the concepts of permutations and combinations. Learn how to calculate probabilities associated with sampling in this lesson, as well as the effect that replacement has on these probabilities.


Summary

  1. Lesson Goal (00:15)

    The goal of this lesson is to learn how to calculate probabilities associated with sampling.

  2. Understanding the Problem (00:20)

    Sampling is the process of selecting a permutation or combination of objects from a set. In our example, we have a company of employees spread across three offices. We want to understand the probability of selecting exactly one employee from each office if we select three employees at random.

  3. Probability for a Combination (01:16)

    Our aim is to find the probability for a combination, but we first find the probability for an individual permutation. We calculate the probability for one possible permutation where our selection of 3 people contains exactly one from each office. After we make each selection, we assume that the number of people from the relevant office and the number of people in the company as a whole both decrease by 1. This is known as sampling without replacement.

    We find that the probability for each permutation is the same, so we find the probability for the combination by multiplying the probability for a permutation by the number of permutations in the combination of interest. This gives us the probability that a selection of 3 people will contain exactly one person from each office, if we sample without replacement.

  4. Considering Replacement (03:49)

    An alternative method of sampling is sampling with replacement. Here we assume that when a person is selected from a particular office, another person replaces them. As a result, the number of people available for subsequent picks from that office and from the company as a whole does not decrease.

    When we calculate the probability of the same combination, but with replacement, we find that the probability that our selection of 3 people will all come from different offices is lower than without replacement. This is because replacement increases the probability that the second or a subsequent pick will come from an office that is already represented. This demonstrates how probabilities can change depending on whether we sample with or without replacement.

  5. Course Summary (05:52)

    In this course, we covered the following:

     

    • Introduced fundamental probability principles

    • Calculated probabilities for one or more events

    • Learned how to use Bayes’ Theorem

    • Visualized probability problems

    • Calculated permutations and combinations

Transcript

In the previous two lessons, we learned about permutations and combinations. We'll now learn about sampling, which combines these concepts with probability.

In this lesson, we'll learn how to calculate probabilities associated with sampling.

Sampling is the process of selecting a permutation or a combination of objects from a set.

Let's return to the situation we were looking at previously.

Our company now has 10 employees divided into three offices.

Five employees work at the main office, two work in a secondary office, and three are remote workers, who we'll consider as a single group.

As before, we want to randomly select a committee of three people.

This time, we want to know the probability that a randomly selected committee will contain exactly one employee from each group.

We're not interested in the order in which people are selected. So we'll be interested in the probability of the combination where one worker in the committee comes from each group.

However, let's consider the probability for a single permutation where the first person comes from the main office, the second person comes from the secondary office and the third person from the remote workers.

There are 10 workers in total and five in the main office. So the probability of the first person being from the main office is five tenths.

For the second person, there are two people in the secondary office and nine people remaining in the whole company. So the probability is two ninths that the second person is from the secondary office.

Finally, there are three remote workers and eight people left in the company, so the probability of the third person being a remote worker is three eighths.

The probability of this permutation is five tenths times two ninths times three eighths, which is 30/720, or one 24th.

Let's consider another permutation where the first person is from the main office, the second person is a remote worker and the third person is from the secondary office.

Again, the probability of the first person being from the main office is five tenths.

With three remote workers, the probability that the second person is a remote worker is three ninths.

Finally, the probability of the third person being from the secondary office is two eighths.

The probability of this permutation is five tenths times three ninths times two eighths, which is 30/720, or again, one 24th.

In fact, we would find the probability of every permutation where there is one person from each group is one 24th.
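The claim that every ordering has the same probability can be checked by listing all six orderings. The following is a minimal sketch, not part of the original lesson; the names `GROUP_SIZES` and `ordered_probability` are illustrative assumptions, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import permutations

# Group sizes from the example: main office, secondary office, remote workers.
GROUP_SIZES = {"main": 5, "secondary": 2, "remote": 3}


def ordered_probability(order, group_sizes):
    """Probability of picking one person from each group in the given order,
    sampling without replacement.

    Each group appears exactly once in `order`, so only the company-wide
    total needs to shrink between picks.
    """
    remaining = sum(group_sizes.values())
    probability = Fraction(1)
    for group in order:
        probability *= Fraction(group_sizes[group], remaining)
        remaining -= 1  # the selected person is not returned to the pool
    return probability


# Map every ordering of the three groups to its probability.
permutation_probabilities = {
    order: ordered_probability(order, GROUP_SIZES)
    for order in permutations(GROUP_SIZES)
}

for order, probability in permutation_probabilities.items():
    print(" -> ".join(order), probability)
```

Each of the six printed lines ends in 1/24, because every ordering multiplies the same numerators (5, 2 and 3) over the same denominators (10, 9 and 8).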

As a result, the probability for the combination is one 24th times the number of permutations in the combination.

We know that the number of permutations for a committee of three people is three factorial or six. So the probability for the combination is six times one 24th, which is one fourth or 0.25.
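As a sanity check, the same answer can be reached by counting committees directly. This sketch is an illustration rather than the method used in the lesson: it assumes the counting approach of dividing the number of committees with one person from each group by the total number of three-person committees, and the variable names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

# Approach from the lesson: probability of one permutation times 3! orderings.
per_permutation = Fraction(5, 10) * Fraction(2, 9) * Fraction(3, 8)
via_permutations = factorial(3) * per_permutation

# Cross-check (assumed alternative): count committees directly.
favourable = comb(5, 1) * comb(2, 1) * comb(3, 1)  # one person from each group
total = comb(10, 3)                                # any 3 of the 10 employees
via_counting = Fraction(favourable, total)

print(via_permutations, via_counting)  # prints "1/4 1/4"
```

Both routes give one fourth: 30 favourable committees out of 120 possible ones.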

This tells us that if we select committee members randomly, there's only a one quarter probability that the committee will contain one person from each group. If this is important to the company, they will probably need to impose quotas to achieve this objective, rather than just selecting randomly.

So far, we've been sampling without replacement. This means that when we select a person from an office, they are not replaced, and the number of people in both the office and the company decreases by one.

Let's now consider the same problem when we use sampling with replacement.

As before, we'll consider the permutation where the first person is from the main office, the second person is from the secondary office and the third person is a remote worker.

Again, there are five workers in the main office and 10 in total. So the probability that the first person is from the main office is five tenths.

However, this time we assume that another worker is added to the main office to replace this worker before we make our second pick.

For the second pick, there are two people in the secondary office, but there are still 10 people in total. So the probability that the second person is from the secondary office is two tenths.

Again, we'll add a replacement worker to the secondary office before making our third pick. Now there are three remote workers and 10 in total. So the probability that the third person is a remote worker is three tenths.

The probability of this permutation is five tenths times two tenths times three tenths, which works out to three hundredths.

As before, the probability for each permutation with one worker from each group will be the same. So the probability for the combination is six times three hundredths, which is 18 hundredths, or 0.18.
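The with-replacement arithmetic can be sketched in a few lines. This is an illustrative example under the lesson's assumptions (group sizes of 5, 2 and 3, and a pool that stays at 10 for every pick); the variable names are hypothetical.

```python
from fractions import Fraction
from math import factorial

group_sizes = [5, 2, 3]   # main office, secondary office, remote workers
total = sum(group_sizes)  # stays at 10 for every pick when we replace

# One permutation: the denominator never shrinks.
per_permutation_wr = Fraction(1)
for size in group_sizes:
    per_permutation_wr *= Fraction(size, total)

# Multiply by the 3! orderings of the three groups.
combination_wr = factorial(len(group_sizes)) * per_permutation_wr

print(per_permutation_wr, float(combination_wr))  # prints "3/100 0.18"
```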

As we can see, the probability of selecting one person from each office is lower when we sample with replacement. The reason for this is that the replacement people increase the probability that the second or third pick for the committee will be from an office that is already represented.
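The difference between the two sampling methods can also be estimated by simulation. The sketch below is an assumption-laden illustration rather than part of the lesson: `POOL` and `estimate` are hypothetical names, and the trial count and seed are arbitrary choices. It uses `random.sample` for picks without replacement and `random.choices` for picks with replacement.

```python
import random

# One label per employee: 5 main office, 2 secondary office, 3 remote.
POOL = ["main"] * 5 + ["secondary"] * 2 + ["remote"] * 3


def estimate(with_replacement, trials=100_000, seed=0):
    """Estimate the probability that a committee of 3 has one person per group."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        if with_replacement:
            committee = rng.choices(POOL, k=3)  # same pool of 10 for every pick
        else:
            committee = rng.sample(POOL, 3)     # pool shrinks after each pick
        if len(set(committee)) == 3:            # all three groups represented
            hits += 1
    return hits / trials


print("without replacement:", estimate(with_replacement=False))
print("with replacement:   ", estimate(with_replacement=True))
```

With this many trials the estimates should land close to the exact values of 0.25 and 0.18, with the with-replacement figure the lower of the two.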

When considering a sampling problem, you need to be sure that you understand whether the problem involves replacement or not, as it will affect the results.

This concludes our look at the principles of probability.

In this course, we initially introduced fundamental probability principles and learned how to calculate probabilities for one or more events. We've learned about conditional probabilities and used them in Bayes' Theorem. We also saw how to visualize probability problems using sample spaces and probability trees. Finally, we learned about the concepts of permutations and combinations and used them to calculate probabilities associated with sampling. The knowledge you've gained in this course can be applied across many topics and provides a strong foundation to start learning about more advanced topics, like probability distributions.