14. Deploying the Predictive Models

 
Overview

In this lesson, you will take previously saved predictive models and apply them to a new dataset in order to make better business decisions.

Lesson Notes

Using the Models to Improve the Grant Process

  • All three models are used to predict outcomes in the new dataset
  • These predictions are combined in a single dataset
  • To speed up the grant process, we’ll immediately reject any grant that has less than a 20% chance of success in all three models
  • For more information on machine learning, please follow this link, this link, and this link
  • For information on Multivariate Adaptive Regression Splines, please follow this link

Transcript

In this course, we created a range of different models using a sample of our grant application data. We then applied those models to a validation set in order to determine each model's accuracy. In the previous lesson, we exported our preferred models to an Alteryx database file so that we could use them on future datasets.

In this lesson, we'll run our preferred models on a new set of grant application information to help us streamline the grant application review process. We'll compare the results and automatically reject any application that has less than a 20 percent chance of approval in all three models.

As a result, the application team will save a fair amount of time as they won't even need to review these applications. We'll accomplish this goal in three key steps. First, we'll run the grant application dataset through each of the preferred models. Next, we'll combine the outputs from the three models, and view the results side by side.

Finally, we'll filter out all grant applications that are unlikely to be approved based on the outputs of our models. Our first step is to run our grant application data through each model. To accomplish this, we'll need to create three separate streams that each apply a single model to the data. We'll start with our new grant application dataset. Note that this data has already been cleansed of null values, just as with the original set. In a real-life situation, you will most likely need to perform this task yourself before you can apply your models, so it's something to keep in mind. In a parallel workstream, we need to connect the Alteryx database containing our top three models. You can find this file in the list of lesson files below the video under the name fitted models. We'll bring down an input data tool, navigate to the database file, and connect it. We now need to separate out and run the three models, starting with the highest-performing model. To do this, we'll connect a sample tool and specify the first record.

As we previously sorted the models in the order of accuracy, we know that the first record is the top performing model.

We'll then bring down a score tool and connect it to both the model objects and the grant application dataset.

We'll name the score fields grant_status_rate.

At this point, we'll run the workflow so that we can see the results.

We'll scroll to the right in the results window and see that two new fields have been created: Grant_Status_Rate_0 and Grant_Status_Rate_1. These fields hold the predicted probabilities of whether each record will achieve grant approval, with Grant_Status_Rate_0 representing the negative outcome and Grant_Status_Rate_1 the positive outcome. That is to say, record one has a 52.6 percent chance of being approved and a 47.4 percent chance of being denied.
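The score tool handles this step inside Alteryx, but if you prefer to see the logic as code, here is a minimal Python sketch of the same idea. It assumes, purely for illustration, that the fitted models were saved as scikit-learn classifiers in a pickle file and that the new applications sit in a CSV file; the file names and the scikit-learn assumption are not part of the lesson files.

```python
import pickle
import pandas as pd

# Load the ranked list of fitted models; index 0 would be the top-performing model.
with open("fitted_models.pkl", "rb") as f:
    models = pickle.load(f)
champion = models[0]

# Load the new, already-cleansed grant application data.
applications = pd.read_csv("new_grant_applications.csv")
features = applications.drop(columns=["Grant_Application_ID"])

# predict_proba returns one column per class: column 0 is the probability of denial
# (Grant_Status_Rate_0) and column 1 the probability of approval (Grant_Status_Rate_1).
# The two values always sum to 1 for each record.
probs = champion.predict_proba(features)
applications["Grant_Status_Rate_0"] = probs[:, 0]
applications["Grant_Status_Rate_1"] = probs[:, 1]
```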

As we move forward, we're only concerned with the positive results, so we'll connect a select tool and deselect all fields except the grant application ID and Grant_Status_Rate_1.

We'll also rename the Grant_Status_Rate_1 field to Champion Model. In order to keep our canvas tidy, we'll put these three tools in a container and name it Champion Model.

We'll now run the dataset through the other two models using a similar setup starting with the second best model.

We'll bring a new sample tool onto the canvas and specify to skip the first record.

This will leave a list of the final two models with the second best model in the top row.

We'll copy the sample, score, and select tools from model one and paste them.

We'll connect them to the sample tool and bring the grant application dataset into the score tool.

We'll now go to the select tool and rename the Grant_Status_Rate_1 field to Alternative A.

Again, we'll put these tools in a container this time naming it Alternative A.

Let's move on to the third model. We'll bring down another sample tool and connect it to the fitted models database.

This time, we'll specify to skip the first two records.

We'll again paste the sample, score, and select tools connecting them to the sample tool we just created.

We'll again bring the grant application dataset into the score tool, navigate to the select tool, and rename the Grant_Status_Rate_1 field to Alternative B.

Again, we'll put these tools into a container and name it Alternative B.

We're now ready to move on to step two and bring the three outputs together.

To that end, we'll bring down a join multiple tool and connect each of the three model workstreams.

In this case, we must specify to join each of the datasets by the grant application ID field, as that's the only common field. However, we only need to carry forward the grant application ID field once, so we can deselect this field from the second and third inputs below.
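For readers following along in code rather than in Alteryx, here is a rough pandas equivalent of this join, using small made-up DataFrames in place of the three scored outputs; the column names mirror the field names used in the lesson, everything else is illustrative.

```python
import pandas as pd
from functools import reduce

# Illustrative stand-ins for the three scored outputs; in practice each frame would
# come from scoring the new applications with one of the saved models.
champion = pd.DataFrame({"Grant_Application_ID": [1, 2], "Champion Model": [0.53, 0.12]})
alt_a = pd.DataFrame({"Grant_Application_ID": [1, 2], "Alternative A": [0.48, 0.15]})
alt_b = pd.DataFrame({"Grant_Application_ID": [1, 2], "Alternative B": [0.55, 0.09]})

# Join the three outputs on the one field they all share, so the ID appears only once
# and the three probabilities sit side by side.
combined = reduce(
    lambda left, right: left.merge(right, on="Grant_Application_ID", how="inner"),
    [champion, alt_a, alt_b],
)
print(combined)
```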

At this point, we'll run the workflow so we can see how the three models compare.

We're now presented with a table containing each of our new grant application IDs.

Beside this, we have a predicted value which represents the probability that each application will achieve grant approval.

Note that on certain occasions, the three models give a similar probability, but on other occasions, there's a marked difference.

At this point, we're ready to move on to step three and filter out all grant applications that are unlikely to be approved.

How should we go about doing this? And what should our approval threshold be? The university currently receives a significant number of grant applications, and we know from previous lessons that just under half of these applications achieve approval. With over half of grants failing, a lot of time is being wasted preparing detailed applications and scrutinizing them before finally rejecting them. A good way to use our modeling results is to establish a quick-no policy.

In other words, we'll ask applicants to make a high-level application, which contains each of the key fields required by our models. We'll then apply a criterion allowing us to give applications with a low probability of success a quick no, thereby saving time and resources for both the applicant and the university.

We only want to give a quick no to applications that have a very low chance of success in all three models. We'll set this threshold at 20 percent.

To that end, we'll connect a filter tool to remove these records from our dataset.

We'll select the custom filter radio button and enter our formula in the expression box. This expression tells the filter tool to separate out applications that have less than a 20 percent chance of approval in all three models.
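The expression itself isn't reproduced in the lesson notes, but the logic is a simple three-way AND on the renamed probability fields. Here is a minimal pandas sketch of the same check, again using a small made-up combined table; the 0.2 threshold and the column names follow the lesson, while the data values are illustrative.

```python
import pandas as pd

# Illustrative stand-in for the combined output of the three models.
combined = pd.DataFrame({
    "Grant_Application_ID": [1, 2, 3],
    "Champion Model": [0.53, 0.12, 0.19],
    "Alternative A":  [0.48, 0.15, 0.22],
    "Alternative B":  [0.55, 0.09, 0.18],
})

# An application earns a quick no only when ALL three models put its chance of
# approval below 20 percent; everything else moves forward for a full review.
threshold = 0.2
below = (combined[["Champion Model", "Alternative A", "Alternative B"]] < threshold).all(axis=1)
quick_no = combined[below]
for_review = combined[~below]
print(len(quick_no), "applications get a quick no")  # only application ID 2 qualifies here
```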

We'll now connect a browse tool and run the workflow again.

We'll click on the true node and see that 878 applications are predicted to have less than a 20 percent chance of approval. If we click on the original dataset, we can see that there were originally 2,176 records.

Since 878 of the 2,176 applications receive a quick no, we've therefore cut down the review workload by some 40 percent.

Let's quickly recap how we got here. First, we ran the grant application dataset through each of the preferred models using a score tool. Each model returned the probability of application approval based on data in the new dataset.

We then combined the outputs from the three models and viewed the results.

Finally, we filtered out all grant applications that were unlikely to be approved based on the outputs of our models.

This brings our series of lessons on predictive modeling in Alteryx to a conclusion. As you have probably gathered by now, this topic incorporates some fairly advanced mathematics and statistics. Students looking to learn more about predictive techniques and the models discussed here are advised to read further on the subject by following the links in the lesson notes.