5. Comparing Decision Tree Models

 
Overview

Learn how to simultaneously deploy a range of decision tree models with contrasting predictor variables and compare the resulting output.

Lesson Notes

Comparing Decision Tree Models

  • In this lesson, we added two new decision tree models with different predictor variable combinations
  • We compared the accuracy of these models

Transcript

In the previous lesson, we ran our workflow through a decision tree model. We found that 79.7 percent of the time, our model correctly predicted whether an academic application was awarded a grant. Our first model contained four predictor variables: Contract.Value.Band, Grant.Category.Code, With.PHD.1, and No.of.Years.in.University.at.Time.of.Grant.
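For reference, here's a minimal sketch of that first model outside Alteryx, using Python and scikit-learn rather than the Decision Tree tool from the lesson. The file name grants_estimation.csv is hypothetical, and the column names are assumed to match the fields above; the point is simply fitting a tree on the four predictors and scoring it on the estimation data.

```python
# A sketch only: the lesson builds this model with Alteryx's Decision Tree tool.
# "grants_estimation.csv" is a hypothetical file standing in for the estimation dataset.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

predictors = [
    "Contract.Value.Band",
    "Grant.Category.Code",
    "With.PHD.1",
    "No.of.Years.in.University.at.Time.of.Grant",
]

df = pd.read_csv("grants_estimation.csv")
X = pd.get_dummies(df[predictors], dummy_na=True)  # one-hot encode categorical fields
y = df["Grant.Status"]

tree_1 = DecisionTreeClassifier(random_state=0).fit(X, y)
print(f"Estimation-set accuracy: {accuracy_score(y, tree_1.predict(X)):.1%}")
```

Because the splitting algorithm and defaults here differ from Alteryx's, the accuracy from this sketch won't necessarily match the 79.7 percent reported in the lesson.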

In this lesson, our goal is to determine if we can create a more accurate decision tree model. To that end, we'll compare our current model with two new models. To accomplish this goal, we'll follow two key steps. First, we'll create two new models with different predictor variables. We'll then compare these models and see if our new models prove to be more accurate. To start this lesson, we'll bring a second Decision Tree tool onto the canvas, and connect it to the estimation dataset. We'll name this model Decision_Tree_2, and again, target Grant.Status.

This time, let's include some more predictor variables, and see how our model improves. As mentioned in our previous lesson, the rationale for selecting specific predictor variables really depends on your knowledge of the data in question, and the task it's being used for. For this model, we'll select Grant.Category, Contract.Value, Role.1, Year.of.Birth.1, Country.of.Birth, Dept.No.1, Faculty.No.1, With.PHD.1, No.of.Years.in.University.at.Time.of.Grant, Number.of.Successful.Grant.1, and Number.of.Unsuccessful.Grant.1.

Again, we'll add Browse tools to all the outputs, so that we can review our results. As running these decision tree models can be time consuming, we'll add our third model before running the workflow. We'll navigate to the Predictive tab, bring down another Decision Tree tool, and again connect it to the estimation dataset. We'll name this model Decision_Tree_3, and again, target Grant.Status. In this case, let's include the 11 predictor variables we used previously, together with Home.Language.1. We'll also include the corresponding variables for the second name on the application, if any. We'll end up with the following 22 predictor variables: Grant.Category, Contract.Value, Role.1, Year.of.Birth.1, Country.of.Birth.1, Home.Language.1, Dept.No.1, Faculty.No.1, With.PHD.1, No.of.Years.in.University.at.Time.of.Grant, Number.of.Successful.Grant.1, Number.of.Unsuccessful.Grant.1, Role.2, Year.of.Birth.2, Country.of.Birth.2, Home.Language.2, Dept.No.2, Faculty.No.2, With.PHD.2, No.of.Years.in.University.at.Time.of.Grant.2, Number.of.Successful.Grant.2, and Number.of.Unsuccessful.Grant.2.

At this point, we're ready to move on to step two and compare the accuracy of our models. To that end, we'll run the workflow.
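As a companion to the earlier sketch, and under the same assumptions (a hypothetical grants_estimation.csv, scikit-learn standing in for Alteryx's Decision Tree tool), the comparison in step two might look like this: one tree per predictor set, each scored on the estimation data. The predictor lists follow the lesson; the exact field names depend on your dataset.

```python
# A sketch of comparing several predictor sets; the lesson does this with
# three Decision Tree tools on the Alteryx canvas rather than a loop.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("grants_estimation.csv")  # hypothetical file name
y = df["Grant.Status"]

# Fields describing the first named person on the application
# (Home.Language.1 is added only in the third model, following the lesson).
person_1 = ["Role.1", "Year.of.Birth.1", "Country.of.Birth.1", "Dept.No.1",
            "Faculty.No.1", "With.PHD.1",
            "No.of.Years.in.University.at.Time.of.Grant",
            "Number.of.Successful.Grant.1", "Number.of.Unsuccessful.Grant.1"]
# The equivalent fields for the second named person, if any.
person_2 = ["Role.2", "Year.of.Birth.2", "Country.of.Birth.2", "Home.Language.2",
            "Dept.No.2", "Faculty.No.2", "With.PHD.2",
            "No.of.Years.in.University.at.Time.of.Grant.2",
            "Number.of.Successful.Grant.2", "Number.of.Unsuccessful.Grant.2"]

predictor_sets = {
    "Decision_Tree_1": ["Contract.Value.Band", "Grant.Category.Code",
                        "With.PHD.1",
                        "No.of.Years.in.University.at.Time.of.Grant"],
    "Decision_Tree_2": ["Grant.Category", "Contract.Value"] + person_1,
    "Decision_Tree_3": ["Grant.Category", "Contract.Value"] + person_1
                       + ["Home.Language.1"] + person_2,
}

# Fit one decision tree per predictor set and report estimation-set accuracy.
for name, cols in predictor_sets.items():
    X = pd.get_dummies(df[cols], dummy_na=True)
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(f"{name}: {accuracy_score(y, tree.predict(X)):.1%} on the estimation data")
```

Keeping the comparison in one loop means every model sees the same rows and the same scoring rule, so any difference in accuracy comes only from the chosen predictors.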

Note that this workflow can take some time to process, so I'll cut out the wait time in this video. We'll now go to the interactive reports for each model and compare the accuracy, starting with the first model. As you might remember from the previous lesson, the accuracy is 79.7 percent, which we can see here.

We'll now move on to the second model. In this case, the accuracy is 88.7 percent, a marked improvement.

Finally, we'll move on to the third model. Here, we can see that the accuracy is 87.9 percent, just a touch less accurate than the second model. Given the information so far, our second model is perhaps the best, but both new models are potentially better than the first. However, we must be careful not to fall into a trap here. These accuracy values are based on the estimation dataset that was used to create the models in the first place. In other words, these predictions are not based on new data. As in previous courses, we need to test these models on the validation dataset to ensure that these results hold up. We'll do so in the next lesson.
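To make that caveat concrete, here is a sketch of the kind of check the next lesson performs, again with hypothetical file and column names and with scikit-learn standing in for Alteryx: hold back a validation portion of the data, fit only on the estimation portion, and compare the two accuracy figures.

```python
# A sketch of the validation check, not the course's Alteryx workflow.
# "grants.csv" is a hypothetical file holding the full dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("grants.csv")
# For brevity this uses every available field; in practice you would pass the
# chosen model's predictor list, as in the earlier sketches.
X = pd.get_dummies(df.drop(columns=["Grant.Status"]), dummy_na=True)
y = df["Grant.Status"]

# Hold back 30 percent of the rows, fit only on the estimation portion,
# then score on both portions.
X_est, X_val, y_est, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X_est, y_est)

print(f"Estimation accuracy: {accuracy_score(y_est, tree.predict(X_est)):.1%}")
print(f"Validation accuracy: {accuracy_score(y_val, tree.predict(X_val)):.1%}")
```

If the validation accuracy drops well below the estimation accuracy, the model is likely overfitting, which is exactly the trap the transcript warns about.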