7. The Confusion Matrix
In this lesson, we'll take a deeper look at the different values in the confusion matrix, and how to calculate them.
- The Confusion Matrix is a 2x2 matrix that charts predicted values vs. actual values
- True Positive is a positive outcome that is correctly predicted by a model
- False Positive is a negative outcome that is predicted by a model to be positive
- True Negative is a negative outcome that is correctly predicted by a model
- False Negative is a positive outcome that is predicted by a model to be negative
- Accuracy = (True Positives+True Negatives)/All predictions
- Misclassification = (False Positives+False Negatives)/All predictions
- Recall = True Positives/Actual Positives
- Precision = True Positives/Predicted Positives
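The four formulas above can be sketched as plain Python functions. This is a minimal illustration, assuming the four cell counts (`tp`, `fp`, `tn`, `fn`) have already been tallied; the function names are illustrative, not from the lesson.

```python
def accuracy(tp, fp, tn, fn):
    """(True Positives + True Negatives) / All predictions."""
    return (tp + tn) / (tp + fp + tn + fn)

def misclassification(tp, fp, tn, fn):
    """(False Positives + False Negatives) / All predictions."""
    return (fp + fn) / (tp + fp + tn + fn)

def recall(tp, fn):
    """True Positives / Actual Positives (tp + fn)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """True Positives / Predicted Positives (tp + fp)."""
    return tp / (tp + fp)
```

Note that accuracy and misclassification always sum to 1, since every prediction is either correct or incorrect.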
When discussing model accuracy with respect to decision trees, we made reference to the confusion matrix. In this lesson, we're going to take a closer look at the confusion matrix and how you can use it to assess a predictive model's accuracy. Imagine you flip a coin 200 times and you're running a model which predicts an outcome of heads or tails. When the model predicts heads, the actual reality may be tails or it may indeed be heads.
Similarly, when the model predicts tails, the actual reality may again be heads or it may be tails. The model has two possible outcomes, and the reality has two possible outcomes. Therefore, we have a two by two matrix of results.
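One way to tally that two-by-two matrix is to count each (predicted, actual) pair. This is a small sketch with made-up data, not the lesson's 200-flip experiment:

```python
from collections import Counter

# Illustrative paired observations: what the model said vs. what happened.
predicted = ["heads", "heads", "tails", "tails", "heads"]
actual    = ["heads", "tails", "tails", "tails", "heads"]

# Each (predicted, actual) pair falls into one of the four matrix cells.
matrix = Counter(zip(predicted, actual))

print(matrix[("heads", "heads")])  # predicted heads, actually heads
print(matrix[("heads", "tails")])  # predicted heads, actually tails
print(matrix[("tails", "tails")])  # predicted tails, actually tails
print(matrix[("tails", "heads")])  # predicted tails, actually heads
```

Libraries such as scikit-learn provide the same tally via `sklearn.metrics.confusion_matrix`, but the counting itself is no more than this.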
When assessing a model's accuracy, it's typical to refer to a model's performance with reference to these metrics. Of our 200 predictions, let's say that 100 of those say that the outcome will be heads, while the other 100 say that the outcome will be tails. However, let's say that in reality the outcome was heads on 80 occasions. In this example we treat tails as the positive outcome and heads as the negative outcome. Of these 80 heads, let's say that the model correctly predicted the heads outcome on 75 occasions. These are the true negatives. Conversely, on five of those occasions the model predicted the outcome of tails, a false positive count of five. Similarly, the actual outcome was tails on 120 occasions.
On 95 of those occasions the model predicted correctly, a true positive of 95.
On the balance of occasions, 25, the model predicted incorrectly, a false negative. These results can be taken in combination to assess the model's accuracy. For example, let's look at the total percentage of true positives, 47.5%, and true negatives, 37.5%.
If we add these figures, we get the model accuracy, which is 85%.
The total of false positives and false negatives is the amount of predictions that were misclassified, in our case, 15%.
We'll also want to look at two ratio metrics. The first, recall, is the ratio of true positives to actual positives, in our case 95 out of 120, or roughly 79.2%. The other, precision, is the ratio of true positives to predicted positives, in our case, 95%.
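The whole coin-flip example can be reproduced in a few lines. The counts below are the ones worked through above, with tails treated as the positive class; the variable names are illustrative.

```python
# Confusion-matrix counts from the 200-flip example (tails = positive).
tp, fp, tn, fn = 95, 5, 75, 25
total = tp + fp + tn + fn              # 200 flips in all

accuracy = (tp + tn) / total           # (95 + 75) / 200 = 0.85
misclassification = (fp + fn) / total  # (5 + 25) / 200  = 0.15
recall = tp / (tp + fn)                # 95 / 120, roughly 0.792
precision = tp / (tp + fp)             # 95 / 100 = 0.95
```

Recall and precision deliberately ignore the true negatives, which is why they can tell a different story than accuracy when one outcome is much rarer than the other.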
These metrics may be simple to calculate, but they're essential when comparing models to determine the best fit for a data set. In the next few lessons we'll calculate these metrics for our different models and use them to compare our options.