9. Boosted Model
Developing your knowledge of Alteryx predictive model tools, in this lesson you will learn how to train a boosted model on your dataset.
- This model is based on a decision tree
- It attempts to improve a decision tree through algorithms applied at each decision point
- Be wary of overfitting when using this model
In the previous lesson we prepared our decision tree workflow so that we could derive confusion matrix data as an output from our three models. Now that we've set up our three decision tree models for further analysis, we would like to compare these results with other models available in the predictive palette. In this lesson, our goal is to add the first of those models, the boosted model. We'll then add confusion matrix values for this model and analyze the results.
We'll accomplish this goal in three key steps. First, we'll add a boosted model to our workflow and configure the tool. We'll then use a score tool, formula tool, and summarize tool to calculate confusion matrix values, just as we did for our decision tree models. Finally, we'll run the workflow and analyze the results to see how this model performs.
Before we jump into the lesson, let's take a moment to explain what a boosted model is. The boosted model, also known as gradient boosting, is based on decision tree models. As we've seen previously, decision tree models are not perfect. Our best model predicted 88% of the results correctly. However, even that result is a bit misleading, as the decision tree has only a degree of certainty for each record.
The boosted model tries to account for these deficiencies and improve upon decision trees through some complex math at every decision point. As with other complex models, you'll need to be wary of overfitting when applying a boosted model.
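To make the idea of improving on a decision tree a little more concrete, here is a minimal sketch of gradient boosting in plain Python. It is not how the Alteryx boosted model tool is implemented. It assumes a single numeric predictor, one-split trees (stumps), a squared-error loss, and invented toy data, and every function name in it is hypothetical. Each round fits a small tree to the errors left by the rounds before it, then adds a scaled-down version of that tree to the running prediction.

```python
from typing import List, Tuple

# A stump is a one-split tree: (threshold, value if x <= threshold, value otherwise).
Stump = Tuple[float, float, float]
# A boosted model here is (starting prediction, learning rate, list of stumps).
BoostedModel = Tuple[float, float, List[Stump]]


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Pick the single split on x that minimises squared error on the residuals."""
    overall_mean = sum(residuals) / len(residuals)
    best: Stump = (max(xs), overall_mean, overall_mean)  # fallback: no useful split
    best_error = float("inf")
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:
            continue  # every record fell on one side, so this is not a split
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if error < best_error:
            best_error = error
            best = (threshold, left_mean, right_mean)
    return best


def predict_stump(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(
    xs: List[float], ys: List[float], n_rounds: int = 50, learning_rate: float = 0.1
) -> BoostedModel:
    """Start from the mean, then repeatedly fit a stump to the remaining errors."""
    start = sum(ys) / len(ys)
    predictions = [start] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take only a small step towards the stump's correction.
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return start, learning_rate, stumps


def predict_boosted(model: BoostedModel, x: float) -> float:
    start, learning_rate, stumps = model
    return start + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model: BoostedModel, xs: List[float], ys: List[float]) -> float:
    return sum((y - predict_boosted(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data, invented for illustration: one numeric predictor and a 0/1 target.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]

for rounds in (0, 5, 50):
    model = fit_boosted(xs, ys, n_rounds=rounds)
    print(rounds, "rounds, training error:", round(mean_squared_error(model, xs, ys), 4))
```

In this sketch, the training error keeps falling as rounds are added. That is exactly why overfitting is a concern: a lower error on the estimation data does not guarantee better predictions on the validation data.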
Let's now move on to our first step: connect a boosted model and configure it. We'll navigate to the predictive tab on the tools palette, bring a boosted model onto the canvas, and connect it to the estimation dataset from the create samples tool. We'll navigate to the configuration window, give this model the name Boosted_Model, and again select the target field Grant Status. We now need to select the predictor variables for analysis. We'll choose Grant Category Code, Contract Value Band, With PhD One, Number of Years in University at Time of Grant One, Number of Successful Grant One, and Number of Unsuccessful Grant One. We'll stick with these six variables for all new models going forward in this example. In the real world you may wish to work with a different combination of predictor variables, as we did with the decision tree models. As with many of the advanced Alteryx tools, knowledge of the dataset in question, as well as trial and error, should drive your judgment around what variables to choose.
At this point, we're ready to move on to step two and calculate the confusion matrix values. As a shortcut, we can use the score tool, formula tool, and summarize tool from one of the other models as a base.
We'll select those three tools and copy them. We'll now paste the tools and connect the score tool to both the boosted model and the validation dataset. We'll then navigate to the formula tool and change the model name field to Boosted_Model.
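For anyone who wants to see what the score, formula, and summarize steps amount to, here is a rough sketch of the same calculation outside Alteryx. It is only an illustration under assumptions: the scored records are invented, the 0.5 cutoff is assumed, and names such as classify_outcome and confusion_counts are hypothetical rather than anything the tools expose.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple, Union

# Each scored record is (actual Grant Status as 0 or 1, predicted probability of 1).
ScoredRecord = Tuple[int, float]


def classify_outcome(actual: int, score_positive: float, threshold: float = 0.5) -> str:
    """Formula step: label one scored record with its confusion matrix cell."""
    predicted = 1 if score_positive >= threshold else 0
    if predicted == 1:
        return "True Positive" if actual == 1 else "False Positive"
    return "True Negative" if actual == 0 else "False Negative"


def confusion_counts(
    records: Iterable[ScoredRecord], model_name: str, threshold: float = 0.5
) -> Dict[str, Union[str, int]]:
    """Summarize step: count the records that fall in each cell for one model."""
    counts = Counter(
        classify_outcome(actual, score, threshold) for actual, score in records
    )
    return {
        "Model": model_name,
        "True Positive": counts["True Positive"],
        "False Positive": counts["False Positive"],
        "True Negative": counts["True Negative"],
        "False Negative": counts["False Negative"],
    }


# Invented validation records standing in for the score tool's output.
scored = [(1, 0.9), (1, 0.4), (0, 0.2), (0, 0.7), (1, 0.6)]
print(confusion_counts(scored, "Boosted_Model"))
```

Keeping the model name alongside the counts mirrors the change we made in the formula tool, and it makes the results from different models easy to line up later.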
We can now run the workflow. Note that this calculation may take a few minutes, so I'll cut out much of that wait time in this video. Once the workflow finishes running, we'll be ready to move on to step three and analyze the confusion matrix values. We'll click on the output node of the summarize tool and see that there are approximately 1094 true positives.
Let's compare this with the output for decision tree three.
We'll click on the output of that summarize tool and see that it predicted 1167 true positives.
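To put the two counts side by side, a quick calculation helps. This sketch uses only the two true positive figures quoted in this lesson. The helper name relative_change is hypothetical, and because the other confusion matrix cells are not quoted here, it cannot say anything about overall accuracy.

```python
def relative_change(new_value: float, baseline: float) -> float:
    """Change of new_value relative to baseline, as a fraction of baseline."""
    return (new_value - baseline) / baseline


boosted_true_positives = 1094  # from the boosted model's summarize tool
tree3_true_positives = 1167    # from decision tree three's summarize tool

difference = boosted_true_positives - tree3_true_positives
change = relative_change(boosted_true_positives, tree3_true_positives)

print("Difference in true positives:", difference)
print("Relative change: {:.1%}".format(change))
```

A comparison on true positives alone is only a first look. The false positive, true negative, and false negative counts would be needed before drawing a firmer conclusion.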
Further analysis will be necessary, but the initial conclusion is that our boosted model does not offer a significant improvement. Before we end the lesson, we'll put the boosted model into its own container. We'll select all four tools, right-click, and select Add to New Container.
We'll name this container Boosted Model.
Let's quickly recap what we did in this lesson. First, we added a boosted model to the workflow and configured the tool with six predictor variables. We then connected a score tool, formula tool, and summarize tool to calculate confusion matrix values, just as we did for our decision tree models. To that end, we simply copied and pasted those tools and made some minor tweaks. Finally, we ran the workflow and analyzed the results to see that this model did not offer better results than our third decision tree. In our next lesson, we'll deploy two new models and see if their results are any better.