11. Naïve Bayes

 
Overview

In this lesson you will learn how to deploy the Naïve Bayes probabilistic classifier.

Lesson Notes

Naïve Bayes

  • Naïve Bayes is a machine learning technique that classifies records by estimating the probability of each class from the predictor values
  • The model assumes that the predictor variables are independent of each other, given the class
  • The tool is most useful when the training set is small
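
As a sketch of what these notes describe, the independence assumption lets the class probability factor into one term per predictor. The symbols are not from the lesson: $C$ stands for the target class and $x_1, \dots, x_n$ for the predictor values.

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The predicted class is the value of $C$ that makes the right-hand side largest.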

Transcript

In our look at predictive modeling tools available in Alteryx, we've so far incorporated a decision tree model, boosted model, forest model, and neural network.

We'll now incorporate a fifth variety to round off our analysis, the Naive Bayes tool.

Naive Bayes is a machine learning technique that classifies data according to a probability distribution. It's an example of a probabilistic classifier. As the name suggests, a probabilistic classifier estimates the probability that a record belongs to each class and assigns the record accordingly. This technique assumes that all predictor variables are independent of each other once the class is known. Hence the name, naive.
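
To make the idea concrete, here is a minimal sketch of a categorical Naive Bayes classifier in plain Python. It is not how the Alteryx tool is implemented. The function names, the toy rows, and the use of Laplace smoothing (`alpha`) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-feature value frequencies (hypothetical helper)."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # value_counts[(feature index, class)][value] = number of training rows
    value_counts = defaultdict(Counter)
    # Distinct values seen per feature, needed for the smoothing denominator
    feature_values = [set() for _ in range(n_features)]
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            feature_values[j].add(value)
    return class_counts, value_counts, feature_values, alpha


def predict_proba(model, row):
    """Return the normalized probability of each class for one record."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for j, value in enumerate(row):
            # Smoothed estimate of P(value | class); the naive assumption
            # is that these per-feature terms can simply be multiplied.
            numerator = value_counts[(j, label)][value] + alpha
            denominator = count + alpha * len(feature_values[j])
            score *= numerator / denominator
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Toy data, invented for this sketch: (category code, has PhD) -> status
rows = [("A", "yes"), ("A", "no"), ("B", "yes"),
        ("B", "no"), ("A", "yes"), ("B", "no")]
labels = [1, 1, 0, 0, 1, 0]

model = train_naive_bayes(rows, labels)
proba = predict_proba(model, ("A", "yes"))
```

Each class score is the prior multiplied by one smoothed likelihood per predictor, which is why the model still produces estimates when only a handful of training rows are available.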

In reality, we know that this is often not the case. Even so, the Naive Bayes tool can be useful in cases where the size of the training set is small.

We'll navigate to the Predictive tab on the tool palette and bring a Naive Bayes tool onto the canvas. Again, we'll connect it to the estimation set, as signified by the E node on the Create Samples tool.

We'll name this model Naive_Bayes, and target Grant_Status.

As we've done previously, we'll choose six predictor variable fields for our analysis: Grant.Category.Code, Contract.Value.Band, With.PHD.1, No.of.Years.in.Uni.at.Time.of.Grant, Number.of.Successful.Grant, and Number.of.Unsuccessful.Grant.

We'll now need to connect a Score tool, Formula tool, and Summarize tool to calculate the confusion matrix, just as we did with the other models. We'll copy these tools, paste them, and connect the Score tool to both the Naive Bayes tool and the validation set from the Create Samples tool. Next, we'll navigate to the Formula tool and change the model name to naive_bayes.

We'll now run the workflow to ensure that the confusion matrix is calculated. As with the previous calculations, this can take a bit of time to process, so I'll skip ahead in this video. At this point, we'll navigate to the output node of the summarize tool, and see that we have a single line containing our confusion matrix values. Before we end the lesson, let's put this model in its own container. We'll select these four tools, right-click, and select Add To New Container.
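
For readers who want to see the arithmetic behind a confusion matrix, here is a rough Python sketch of the counting that the Score, Formula, and Summarize tools perform together. The function name, the 0.5 threshold, and the sample values are assumptions for illustration, not taken from the workflow.

```python
from collections import Counter


def confusion_matrix(actual, predicted_probability, threshold=0.5):
    """Count the four outcome types for a binary target (hypothetical helper)."""
    counts = Counter()
    for y, p in zip(actual, predicted_probability):
        # Turn the scored probability into a predicted class
        y_hat = 1 if p >= threshold else 0
        counts[(y, y_hat)] += 1
    return {
        "true_positive": counts[(1, 1)],
        "false_negative": counts[(1, 0)],
        "false_positive": counts[(0, 1)],
        "true_negative": counts[(0, 0)],
    }


# Invented validation records: actual status and scored probability
actual = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]

matrix = confusion_matrix(actual, scores)
accuracy = (matrix["true_positive"] + matrix["true_negative"]) / len(actual)
```

The four counts form a single row, much like the single line shown at the Summarize tool's output.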

We'll name this container Naive Bayes Model.

Now that we've gone through all these modeling options, we'll compare them in the next lesson to determine the preferred models to deploy on future grants data sets.