4. Decision Tree Outputs
Having successfully deployed your first decision tree model, in this lesson you will learn how to analyse the decision tree report output, including reading a 2 x 2 confusion matrix.
Decision Tree Outputs
- The R output contains data about each of the model branches
- The I output contains the following:
- A visual representation of the decision tree
- A confusion matrix detailing true positives and negatives, as well as false positives and negatives
- A summary of the confusion matrix data
- True Positive is a positive outcome that is correctly predicted by a model
- False Positive is a negative outcome that is predicted by a model to be positive
- True Negative is a negative outcome that is correctly predicted by a model
- False Negative is a positive outcome that is predicted by a model to be negative
- Accuracy = (True Positives+True Negatives)/All predictions
- Misclassification = (False Positives+False Negatives)/All predictions
- Recall = True Positives/Actual Positives
- Precision = True Positives/Predicted Positives
In the previous lesson, we configured our first decision tree tool, the first step in streamlining our grant application process.
So far, we've targeted the variable Grant Status and specified four fields as predictor variables. In this lesson, our goal is to run the decision tree and examine its outputs so that we understand what the decision tree measures. This will allow us to compare various decision tree models in the upcoming lessons. To that end, we'll connect Browse tools to the decision tree's outputs and run the workflow.
There are three outputs from the decision tree. The O output contains information on the physical size of the model. The R output contains a report.
We'll expand it and see that it contains basic data on the variables used for the model, together with data on the model branches. You may find this report informative; however, it's not directly useful for our analysis. Instead, we're going to focus on the third output: the I output, or interactive report.
We'll expand this report and see that there are three tabs on the left side of the window. We'll start with the Tree tab at the bottom and work our way up. This tab shows a visual representation of our model's predictions. It offers an interesting look at every decision point in the model, but it isn't very useful for comparing multiple models, so we'll move on. Next, we have the Misclassifications tab, also known as the Confusion Matrix. This table summarizes how the decision tree's predictions fare against the data we used to generate the model. Remember, we're asking the model to predict whether an academic application was approved for a grant or not. The predicted outcomes are listed on the left, with the actual outcomes along the top. Looking at the top left corner of the table, we can see that there are 2,013 cases where the model correctly predicted that an application would be approved for a grant. This is known as a true positive. Conversely, there are 442 instances where our model predicted an application would be approved when it was in fact not approved. This is a false positive. Similar information is presented for the instances where our model predicted an application would not receive a grant. In this case, the model predicted correctly on 2,150 occasions.
These instances are known as true negatives. However, on 620 occasions, the model said an application would not receive a grant when in fact it did. These are false negatives. Next, we'll look at the Summary tab. This tab contains metrics that are derived from the data in the Misclassifications tab.
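The four cells of the confusion matrix are counts over paired predicted and actual outcomes. The following is a minimal Python sketch of that tally for a two-class target. The function name `tally_confusion` and the label values "Approved" and "Rejected" are hypothetical, because the lesson does not state the actual values of the Grant Status field.

```python
from collections import Counter
from typing import Dict, Sequence

def tally_confusion(predicted: Sequence[str], actual: Sequence[str],
                    positive: str) -> Dict[str, int]:
    """Count true/false positives and negatives for a two-class target."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    counts: Counter = Counter()
    for p, a in zip(predicted, actual):
        if p == positive:
            # Model said positive: a true positive only if the actual outcome agrees.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Model said negative: a false negative if the case was actually positive.
            counts["FN" if a == positive else "TN"] += 1
    return {k: counts[k] for k in ("TP", "FP", "TN", "FN")}

# Hypothetical labels, for illustration only.
predicted = ["Approved", "Approved", "Rejected", "Rejected", "Approved"]
actual    = ["Approved", "Rejected", "Rejected", "Approved", "Approved"]
print(tally_confusion(predicted, actual, positive="Approved"))
```

Every prediction lands in exactly one of the four cells, so the cell counts always add up to the number of predictions.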
The first metric is Accuracy.
This is simply the percentage of outcomes that were correctly predicted by our decision tree. We'll go back to the Misclassifications tab to confirm this result.
If we add up all the predictions, we can see that there is a total of 5,225.
On 4,163 occasions, our model predicted correctly.
This is the sum of the true positive and true negative predictions (2,013 + 2,150). Dividing 4,163 by 5,225 gives 79.7%, confirming the Accuracy value in the Summary tab. We'll move on to the other metrics in the tab, skipping over the F1 Score for now.
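The accuracy check above can be reproduced from the four counts read off the Misclassifications tab. The following Python sketch uses only the counts given in this lesson. The misclassification line applies the formula from the summary list and is shown for completeness. Alteryx's own display rounding may differ slightly.

```python
# Counts read from the Misclassifications tab in this lesson.
tp, fp, tn, fn = 2013, 442, 2150, 620

total = tp + fp + tn + fn            # every prediction in the matrix
correct = tp + tn                    # predictions that matched the actual outcome
accuracy = correct / total
misclassification = (fp + fn) / total

print(total, correct)                # 5225 4163
print(f"accuracy = {accuracy:.1%}")  # accuracy = 79.7%
print(f"misclassification = {misclassification:.1%}")
```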
The two metrics at the bottom, Precision and Recall, are also calculated from the Misclassifications tab. The Precision metric measures the proportion of true positive predictions out of all positive predictions, or the proportion of grants predicted by the model that were actually awarded. It can be expressed as the number of true positives divided by the sum of true positives and false positives. The Precision metric is also shown in the Misclassifications table as the percentage figure in the True Positive box. Moving on, the Recall metric measures the proportion of true positive predictions out of all actual positive results, or the proportion of awarded grants that the model predicted. It can be expressed as the number of true positives divided by the sum of true positives and false negatives. We'll head back to the Summary tab and look at the final metric: F1 Score. This metric, sometimes known as the F Score or F Measure, combines Precision and Recall into a single number. It is not a simple average but their harmonic mean: 2 × Precision × Recall / (Precision + Recall).
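Using the same counts from this lesson, the following Python sketch works through Precision, Recall and F1. It assumes the standard definition of F1 as the harmonic mean of precision and recall, which is not the same as their plain average. The script computes both values so you can see the difference. The printed percentages are worked arithmetic. Check them against the values shown in your own Summary tab, whose rounding may differ.

```python
from statistics import harmonic_mean

tp, fp, tn, fn = 2013, 442, 2150, 620   # counts from the Misclassifications tab

precision = tp / (tp + fp)   # share of predicted grants that were actually awarded
recall = tp / (tp + fn)      # share of awarded grants that the model predicted
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
f1_check = harmonic_mean([precision, recall])       # same value via the stdlib
arithmetic_mean = (precision + recall) / 2          # plain average, for comparison

print(f"precision = {precision:.1%}")        # 82.0%
print(f"recall    = {recall:.1%}")           # 76.5%
print(f"F1        = {f1:.1%}")               # 79.1%
print(f"mean      = {arithmetic_mean:.1%}")  # 79.2%
```

The harmonic mean is pulled toward the smaller of the two values. A model therefore cannot earn a high F1 Score by doing well on only one of Precision or Recall.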
Now that we understand the outputs of the decision tree, we can use this knowledge to compare different models. In the next lesson, we'll add two new decision tree models and compare the results.