1. Introduction to Predictive Modeling
This lesson lays the foundation for the predictive modeling course, setting out the business need and making some fundamental changes to the dataset to facilitate onward analysis.
- We’ll look at five different types of predictive models in this course:
- Decision Trees
- Boosted Models
- Forest Models
- Neural Networks
- Naïve Bayes
- Null values may make results difficult to interpret, so they should be removed or given a value before running predictive analysis
- Data Investigation tools can help determine notable trends, features, or anomalies in your dataset
In the following series of lessons, we're going to take a deeper look at the predictive modeling tools available in Alteryx. Previously, we considered linear and logistic regression analysis, but Alteryx provides a range of alternative techniques, including Decision Trees, Boosted Models, Forest Models, Neural Networks, and Naïve Bayes.
Over the course of the following lessons, we'll investigate each of these techniques in turn. We'll then consider how to compare and deploy the preferred predictive models. For these lessons, we're going to use a dataset of university grant applications. Imagine we work for the grants department at a prominent university and review applications for research grants each year. More than half of the grant applications fail. This is not only a waste of our time, but it also represents a significant amount of wasted effort on the part of the academics seeking funding. Our goal in this course is to improve grant application quality, so that applications unlikely to succeed are flagged in advance. This would not only reduce the burden on the grants department but also help the university guide applicants so that less time is wasted on submissions unlikely to succeed. We'll begin by reviewing the grant application dataset. We can see that it contains information relating to almost 9,000 grant applications over a four-year period.
We have information regarding whether a grant was awarded or not, together with background data about the applicant, the size and type of grant requested, et cetera. We're going to use the Alteryx predictive tools to consider this information and predict the likelihood that an application is awarded a grant. With most data projects, you'll want to start by preparing your data.
Fortunately, our data has already been cleansed, so I'll quickly run through what these tools have accomplished.
First, the Auto Field tool was used to automatically select the data type for each field in our dataset. We then used the Select tool to fix any instances where the Auto Field tool applied the wrong data type. We then had to deal with null values. If we go to the Browse tool and consider the Grant.Category.Code field, we can see that the field has almost 1,000 null values.
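To make the Auto Field step concrete, here is a rough Python sketch of what it does: inspect each column's values, pick the narrowest type that fits, and (as the Browse tool would show) count the nulls. The sample records and the type-checking helpers below are illustrative assumptions, not the actual grant data or Alteryx's internal logic.

```python
# A rough analogue of the Auto Field tool: infer the narrowest type
# ("Int", "Double", or "String") that fits a column's non-null values.
def infer_type(values):
    """Return the narrowest type label that fits all non-null values."""
    non_null = [v for v in values if v not in (None, "")]
    if all(_is_int(v) for v in non_null):
        return "Int"
    if all(_is_float(v) for v in non_null):
        return "Double"
    return "String"

def _is_int(v):
    try:
        int(v)
        return True
    except ValueError:
        return False

def _is_float(v):
    try:
        float(v)
        return True
    except ValueError:
        return False

# Hypothetical sample records; only Grant.Category.Code appears in the lesson.
rows = [
    {"Grant.Status": "1", "RFCD.Percentage": "50.0", "Grant.Category.Code": "10A"},
    {"Grant.Status": "0", "RFCD.Percentage": "100.0", "Grant.Category.Code": None},
]

for field in rows[0]:
    values = [r[field] for r in rows]
    nulls = sum(v in (None, "") for v in values)
    print(field, infer_type(values), "nulls:", nulls)
```

Running this prints one line per field with its inferred type and null count, which is roughly the view the Browse tool gives after Auto Field has run.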
Indeed, many of the string fields in this dataset contain null values. Leaving these fields blank might have made the results more difficult to interpret. To account for this, a number of conditional formulas were used to fill these blank fields. I'm not going to run through each of these formulas right now, but it may be useful to go through them in your own time. One final note: in many real-world situations, it may be useful to deploy tools from the Data Investigation tab to analyze the dataset for notable features or anomalies. In this course, we'll only use the Field Summary tool, but it may be worth looking into some of the other tools in your own time. In the next lesson, we'll run through an explanation of the Decision Tree tool, the first model we'll deploy in this course.
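The conditional formulas mentioned above follow the Alteryx pattern `IF IsNull([Field]) THEN ... ELSE [Field] ENDIF`. A minimal Python sketch of that substitution, assuming a hypothetical `"Unknown"` placeholder (the course's actual replacement values may differ):

```python
# Mimic the Alteryx formula: IF IsNull([Field]) THEN placeholder ELSE [Field] ENDIF.
# The "Unknown" placeholder is an assumption for illustration.
def fill_null(value, placeholder="Unknown"):
    """Substitute a placeholder when a string field is null or empty."""
    return placeholder if value in (None, "") else value

applications = [
    {"Grant.Category.Code": "10A"},
    {"Grant.Category.Code": None},
]

for app in applications:
    app["Grant.Category.Code"] = fill_null(app["Grant.Category.Code"])

print([a["Grant.Category.Code"] for a in applications])  # → ['10A', 'Unknown']
```

Giving nulls an explicit value like this keeps them as a distinct category the predictive tools can use, rather than rows that are silently dropped or misread.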