10. Forest Models and Neural Networks

 
Overview

Two further Alteryx predictive models, the Forest Model and the Neural Network, are introduced in this lesson.

Lesson Notes

Forest Model

  • The Forest Model generates hundreds of decision trees, each from a random sample of the data, then combines their predictions
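To make the random-forest recipe concrete, here is a minimal Python sketch: draw bootstrap samples (rows picked at random with replacement), fit a small tree to each sample using a random subset of the predictors, and combine the trees by majority vote. This is an illustration under stated assumptions, not how the Alteryx Forest Model tool is implemented. To keep it short, each "tree" is a single-split stump, and the function names (`train_stump`, `train_forest`, `predict_forest`) and defaults such as `n_trees=100` are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Fit a one-split tree (a "stump"): row[f] == v -> label_a, otherwise label_b.

    Tries every value of every candidate feature and keeps the split with the
    fewest misclassified rows. Equality splits suit categorical predictors.
    """
    fallback = Counter(labels).most_common(1)[0][0]  # majority label of this sample
    best = None
    for f in feature_ids:
        for v in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] == v]
            right = [y for r, y in zip(rows, labels) if r[f] != v]
            label_a = Counter(left).most_common(1)[0][0] if left else fallback
            label_b = Counter(right).most_common(1)[0][0] if right else fallback
            errors = sum(y != label_a for y in left) + sum(y != label_b for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, v, label_a, label_b)
    _, f, v, label_a, label_b = best
    return lambda row: label_a if row[f] == v else label_b


def train_forest(rows, labels, n_trees=100, n_features=2, seed=42):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        sample_rows = [rows[i] for i in picks]
        sample_labels = [labels[i] for i in picks]
        features = rng.sample(all_features, n_features)  # random subset of predictors
        forest.append(train_stump(sample_rows, sample_labels, features))
    return forest


def predict_forest(forest, row):
    """Combine the individual trees by majority vote."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]
```

Usage would be `forest = train_forest(rows, labels)` followed by `predict_forest(forest, new_row)`, where each row is a tuple of categorical predictor values. Because every tree sees a different sample and a different subset of predictors, the trees make different mistakes, and the vote averages many of them out.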

Neural Network

  • A machine learning technique that processes samples of data to produce a test result (a prediction)
  • The test result is compared with the actual result and the model is adjusted accordingly, before being applied to another sample of data
  • The process is repeated until there are no more improvements to the model’s predictive quality
  • The technique is based on the model of a human neuron
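The training loop described in these notes can be sketched in Python with a single artificial neuron (a logistic unit): predict a result for each record in a sample, compare it with the actual result, adjust the weights to reduce the error, and stop once the overall loss no longer improves by more than a small tolerance. This is an assumed, simplified stand-in, not the algorithm inside the Alteryx Neural Network tool; a real network combines many such units in layers, and the learning rate, batch size and tolerance below are arbitrary illustrative choices.

```python
import math
import random


def sigmoid(z):
    """Squash a weighted sum into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """The neuron's output (its "test result") for one record."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def log_loss(weights, bias, rows, labels):
    """Average cross-entropy between predictions and the actual 0/1 labels."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(predict(weights, bias, x), 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def train_neuron(rows, labels, lr=0.5, batch_size=4, tol=1e-4, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    data = list(zip(rows, labels))
    prev_loss = log_loss(weights, bias, rows, labels)
    for epoch in range(1, max_epochs + 1):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]  # one sample of the data
            grad_w = [0.0] * len(weights)
            grad_b = 0.0
            for x, y in batch:
                err = predict(weights, bias, x) - y  # test result minus actual result
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            # Adjust the model in the direction that reduces the error on this sample.
            weights = [w - lr * g / len(batch) for w, g in zip(weights, grad_w)]
            bias -= lr * grad_b / len(batch)
        loss = log_loss(weights, bias, rows, labels)
        if prev_loss - loss < tol:  # no meaningful improvement left: stop
            break
        prev_loss = loss
    return weights, bias, epoch
```

`train_neuron` returns the learned weights, the bias, and the number of passes it took before improvements dried up; `predict(weights, bias, x)` then gives a probability for a new record.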

Transcript

In the previous lesson, we applied a boosted model tool to our grant application dataset. In this lesson, we'll introduce two other modeling tools in our quest to find the most appropriate predictive model for our dataset. To that end, this lesson has three key steps. First, we'll introduce a forest model.

Next, we'll introduce a neural network model.

Finally, we'll calculate the confusion matrix values for both of these models.

The first model we'll introduce in this lesson is the forest model. The forest model, also known as a random forest, is another type of decision tree model. In this method, hundreds of decision trees are generated from random samples of the dataset, and their results are then combined to produce the final model. We'll connect this tool to the estimation set, signified by the E output node of the create samples tool.
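The estimation set is one of the random partitions produced by the create samples tool. As a rough sketch of what that partitioning does, the Python function below assigns each record at random to an estimation (E), validation (V) or leftover holdout (H) list. The function name, the 70/30 defaults and the per-record random draw are assumptions for illustration; the percentages used in this workflow aren't stated in this lesson.

```python
import random


def create_samples(rows, estimation_pct=70, validation_pct=30, seed=1):
    """Hypothetical split: each row lands in E, V or H according to a random draw."""
    if estimation_pct < 0 or validation_pct < 0 or estimation_pct + validation_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    estimation, validation, holdout = [], [], []
    for row in rows:
        r = rng.random()  # uniform in [0, 1)
        if r < estimation_pct / 100:
            estimation.append(row)
        elif r < (estimation_pct + validation_pct) / 100:
            validation.append(row)
        else:
            holdout.append(row)
    return estimation, validation, holdout
```

Models are fitted on the estimation list only, so the validation list stays unseen and can later give an honest measure of each model's accuracy.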

Next, we'll name the model "forest_Model" and set the target variable to grant status.

As in the previous lesson, we'll choose six predictor variables for our analysis: grant category code, contract value band, with PHD 1, number of years in university at time of grant 1, number of successful grant 1, and number of unsuccessful grant 1.

We're now ready to move on to the next step of our lesson, and add a neural network model to the workflow. The neural network is a machine learning technique where the algorithm considers a sample of data in terms of inputs and results.

It then considers another sample of data and applies a function in an attempt to achieve a test result. The test result is compared with the actual result, and the model is adjusted accordingly before it's applied to another sample of data. This process continues until no further improvement in the function's predictive quality can be achieved. The technique is based on the model of a human neuron, hence the name.

We'll navigate to the predictive tab on the tools palette and drag a neural network tool onto the canvas. Just as before, we'll connect this model to the estimation set, represented by the E output node of the create samples tool.

We'll name this model "neural_Network" and again set the target variable to grant status.

Again, we'll choose the same six predictor fields: grant category code, contract value band, with PHD 1, number of years in university at time of grant 1, number of successful grant 1, and number of unsuccessful grant 1.

At this point, we're ready to move on to step three and calculate the confusion matrix values for these models. To do this, we'll need to connect a score tool, a formula tool, and a summarize tool, just as we did with the other models. We'll copy these tools from a previous model, paste them, and start by connecting the score tool to both the forest model and the validation set of the create samples tool.

We'll then navigate to the formula tool, and change the model name to "forest_Model".

We'll now paste another set of these tools, again connecting the score tool to the validation set of the create samples tool, but this time also connecting it to the neural network tool.

We'll again navigate to the formula tool, but this time change the model name to "neural_Network".

We'll now run the workflow to ensure that the confusion matrix is calculated.

As in previous lessons, this can take some time to process, so I'll skip ahead in the video. If we now look at the output nodes of both summarize tools, we'll see that each contains a single row of confusion matrix values.
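Each summarize tool ends up with one row of counts. The Python sketch below mimics that score, formula and summarize chain for a single model: it takes the actual outcome and the predicted probability of a successful grant for each validation record, applies a cutoff, and tallies true and false positives and negatives into one row tagged with the model name. The 0.5 cutoff, the 1 = granted encoding, the function name and the field names are assumptions; the formula set up in earlier lessons may differ.

```python
def confusion_row(model_name, actual, prob_granted, threshold=0.5):
    """Collapse scored validation records into one summary row (hypothetical fields)."""
    if len(actual) != len(prob_granted):
        raise ValueError("need one probability per actual outcome")
    row = {"Model": model_name, "TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(actual, prob_granted):
        predicted = 1 if p >= threshold else 0  # formula step: probability -> class
        if predicted == 1 and y == 1:
            row["TP"] += 1  # summarize step: count each outcome
        elif predicted == 1 and y == 0:
            row["FP"] += 1
        elif predicted == 0 and y == 0:
            row["TN"] += 1
        else:
            row["FN"] += 1
    return row


# Five made-up validation records, purely to show the shape of the output.
print(confusion_row("forest_Model", [1, 0, 1, 0, 1], [0.9, 0.2, 0.4, 0.6, 0.7]))
# {'Model': 'forest_Model', 'TP': 2, 'FP': 1, 'TN': 1, 'FN': 1}
```

Tagging each row with the model name is what lets the rows from different models be stacked side by side for comparison.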

Before we end the lesson, let's put each of these two models in its own container. We'll start with the forest model: we'll select all four of its tools, right-click, select add to new container, and name the container forest model.

We'll now do the same with the neural network model: we'll select all four tools, right-click, select add to new container, and name the container neural network.

In this lesson, we brought two new predictive models onto the canvas, and calculated the confusion matrix values.

In the next lesson, we'll add one more model to the workflow before we compare the results.