1. Understanding Random Variables

Overview

Random variables help us understand and calculate the probabilities of different events. Learn how to use random variables in this lesson.


Summary

  1. Lesson Goal (00:24)

    The goal of this lesson is to learn about random variables.

  2. What is a Random Variable? (00:29)

    A random variable is a variable that can take on different values based on the outcome of one or more chance events.

    Examples of random variables include the number of heads we get when we flip a coin a certain number of times, or the amount of rainfall in a particular city in a particular month. Random variables can be discrete or continuous. A discrete random variable can only take on a specific, countable set of values. For example, the number of heads in a series of coin flips can only be a whole number between zero and the number of flips. By contrast, a continuous random variable can take on any value within a range. Therefore, rainfall amounts would be continuous.

  3. Creating a Probability Distribution (03:36)

    A probability distribution is a statistical function that describes all the possible values for a random variable and their associated probabilities. It can be represented using a table or a chart. The probability distribution of a random variable helps us identify which values are most likely and least likely.

  4. Mean and Standard Deviation of a Distribution (04:59)

    The mean and the variance or standard deviation are generally the two most important properties of a probability distribution. The mean represents the expected value of the random variable, and is usually denoted by the Greek letter mu. The variance and standard deviation measure how spread out the possible values are around the mean. A high variance indicates values that are spread out, while a low variance indicates values clustered around the mean.

    The standard deviation is the square root of the variance, so if we know one we can always calculate the other. The standard deviation is denoted by the Greek letter sigma, and the variance is denoted by sigma squared.

Transcript

In this course, we'll learn about probability distributions. If we understand the distribution that a random process follows, we can easily calculate the probability of any events associated with that process. Before we start looking at probability distributions, we need to understand how random variables work.

In this lesson, we'll learn about random variables.

A random variable is a variable that can take on different values based on the outcome of one or more chance events. To understand this, let's consider an example of two random variables. First, we have variable X, which represents the number of times we flip heads when we flip a coin eight times. The value of this variable can be any whole number from zero to eight. If we want to understand the probability of some number of heads, using a random variable is easier than trying to consider all the possible outcomes. Next, we have variable Y representing the amount of rain falling in a specific city in the last month. Based on previous data, we'll assume this variable can have any value between zero and 160 millimeters.

When we create a random variable, we want to understand if it is discrete or continuous. A discrete variable can only take on a specific countable set of values. For example, our X variable can only be a whole number. A value such as 7.5 heads is not possible. As a result, X is a discrete random variable. By contrast, a continuous variable can take on any value within a defined range.

Our variable Y has an upper and lower limit, but any value between these boundaries is possible. As a result, we would say that Y is a continuous random variable.

Let's now consider how random variables and probability are related. We'll start with variable X, where we're interested in the number of heads from eight coin flips. Each time we flip the coin eight times, X will take on a new value between zero and eight. If we want to know the probability of some number of heads, we can express this using X. For example, if we want to know the probability of getting four heads, this is the same as the probability that X equals four. If we want to know the probability of five or more heads, this is the same as the probability that X is greater than or equal to five.
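To see X behave like a random variable, here's a minimal Python sketch that repeats the eight-flip experiment many times and estimates these two probabilities from the results. It assumes a fair coin, which the lesson implies but doesn't state outright, and the function names, trial count, and seed are just illustrative choices.

```python
import random

def count_heads(n_flips, rng):
    """One value of X: the number of heads in n_flips flips of a fair coin (assumed)."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))

def estimate_probability(event, n_flips=8, trials=100_000, seed=0):
    """Estimate P(event) as the fraction of simulated experiments where event(X) is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(event(count_heads(n_flips, rng)) for _ in range(trials))
    return hits / trials

# "Getting four heads" is the event X == 4.
p_exactly_four = estimate_probability(lambda x: x == 4)
# "Five or more heads" is the event X >= 5.
p_five_or_more = estimate_probability(lambda x: x >= 5)

print(f"P(X = 4)  is roughly {p_exactly_four:.3f}")
print(f"P(X >= 5) is roughly {p_five_or_more:.3f}")
```

These are estimates rather than exact values, so they'll shift slightly with a different seed or trial count.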

For our continuous variable, things are slightly different.

Every month Y will take on a new value based on the amount of rain in that month. Because Y is continuous, there are an infinite number of possible values. As a result, the probability of any one exact value is zero, so it isn't useful to calculate the probability of some specific value. Instead, we calculate the probability that Y will fall within some range of interest. For example, the probability that there is less than 80 millimeters of rain in a month is the same as the probability that Y is less than 80. The probability that rainfall in a month is between 60 and 120 millimeters is the same as the probability that Y is between 60 and 120. In order to calculate probabilities like this, we need to understand the probability distribution for our random variables.
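To make the idea of range probabilities concrete, here's a rough Python sketch. The lesson doesn't tell us how Y is distributed, so this assumes, purely for illustration, that every rainfall amount between 0 and 160 millimeters is equally likely (a uniform distribution). Real rainfall almost certainly doesn't behave this way, and the helper names are hypothetical.

```python
def uniform_cdf(y, low=0.0, high=160.0):
    """P(Y <= y), ASSUMING Y is uniform on [low, high] (an illustrative assumption)."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)

def prob_between(a, b, cdf=uniform_cdf):
    """P(a <= Y <= b), found by subtracting cumulative probabilities."""
    return cdf(b) - cdf(a)

p_less_than_80 = uniform_cdf(80)     # probability that Y is less than 80
p_60_to_120 = prob_between(60, 120)  # probability that Y is between 60 and 120
p_exactly_80 = prob_between(80, 80)  # a single exact value has zero width

print(p_less_than_80, p_60_to_120, p_exactly_80)
```

Under a different assumed distribution the numbers would change, but the approach of working with ranges rather than single values stays the same.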

A probability distribution is a statistical function that describes all the possible values for a random variable and their associated probabilities.

It can usually be visualized using a table or a chart. For example, here we can see a table showing the probability distribution for variable X, the number of heads from eight coin flips. The table shows us each possible value from zero to eight and the probability of each of these possible values.

Note that we use an uppercase X to denote the random variable itself and the lowercase x to denote the possible values of the variable. The probabilities in this table can be calculated by considering all the possible outcomes of eight coin flips.
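As a sketch of what "considering all the possible outcomes" can look like in practice, the Python snippet below lists every sequence of eight flips and counts how many give each number of heads. It assumes a fair coin, so that all 256 sequences are equally likely, and the variable names are only illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N_FLIPS = 8

# Every possible sequence of heads (H) and tails (T): 2**8 = 256 in total.
outcomes = list(product("HT", repeat=N_FLIPS))

# Count how many sequences produce each number of heads.
counts = Counter(seq.count("H") for seq in outcomes)

# P(X = x) = (number of sequences with x heads) / (number of sequences).
distribution = {x: Fraction(counts[x], len(outcomes)) for x in range(N_FLIPS + 1)}

print(" x  P(X = x)")
for x, p in distribution.items():
    print(f"{x:>2}  {float(p):.4f}")
```

Exact fractions are used here so that the probabilities add up to exactly one, with no rounding error.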

In later lessons, we'll identify formulas that can be used to calculate probabilities for different distributions. Below this table, we can see a chart showing the probability distribution for this random variable. This helps us easily identify which values are the most and least likely. For example, we can see that four heads has the highest probability while zero and eight heads have the lowest probability.

This is what we would expect intuitively. Finally, let's discuss the mean, variance, and standard deviation of this distribution. These are generally the most important properties of a probability distribution. The mean is usually denoted by the Greek letter mu.

It tells us the average value we can expect our random variable to have. The variance is represented by the Greek letter sigma squared. The square root of the variance is the standard deviation denoted by sigma.

The variance and standard deviation both tell us how spread out the possible values of the variable are. A low variance indicates that values usually cluster near the mean, and a high variance indicates that the possible values are spread out across a wide range. In later lessons, we'll learn how to calculate the mean and variance for several different probability distributions.
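As a brief preview, here's a Python sketch that works out mu, sigma squared, and sigma for X, the number of heads in eight flips. It uses the standard definitions for a discrete distribution, which this lesson doesn't spell out: the mean is the probability-weighted average of the values, and the variance is the probability-weighted average of the squared distance from the mean. A fair coin is assumed, and the names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

N_FLIPS = 8

# Build the distribution of X by listing all 256 equally likely sequences (fair coin assumed).
outcomes = list(product("HT", repeat=N_FLIPS))
distribution = {
    x: Fraction(sum(1 for seq in outcomes if seq.count("H") == x), len(outcomes))
    for x in range(N_FLIPS + 1)
}

# Mean (mu): each value weighted by its probability.
mean = sum(x * p for x, p in distribution.items())

# Variance (sigma squared): squared distance from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

# Standard deviation (sigma): the square root of the variance.
std_dev = sqrt(variance)

print(f"mean = {mean}, variance = {variance}, standard deviation = {std_dev:.3f}")
```

Because the standard deviation is just the square root of the variance, knowing either one gives us the other.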

This concludes our lesson on random variables.

In the next lesson, we'll start looking at specific probability distributions beginning with the discrete uniform distribution.