1. Introduction to Predictive Modelling
This lesson lays the foundation for the predictive modelling course, setting out the business need and making some fundamental changes to the dataset to facilitate onward analysis.
Understanding Classification Models (00:04)
Classification is a form of supervised learning, meaning it trains on existing data with known outcomes to build a model that makes predictions. Classification aims to identify which of several possible categories a new observation belongs to. As a result, the output of a classification model is categorical, or discrete. This is different from linear regression, which provides a continuous output. Logistic regression, despite its name, estimates the probability of a binary outcome, and that probability is usually converted to one of two classes, so it is generally treated as a form of classification.
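To make the distinction between continuous and discrete output concrete, here is a minimal Python sketch of how a logistic model's score can be turned into a class label. The function names, the 0.5 threshold, and the class labels are assumptions chosen for illustration; they are not taken from the course or from Alteryx.

```python
import math


def predict_probability(score: float) -> float:
    """Logistic (sigmoid) function: maps a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def classify(score: float, threshold: float = 0.5) -> str:
    """Turn the continuous probability into a discrete, binary label.

    The threshold and the label names are illustrative assumptions.
    """
    if predict_probability(score) >= threshold:
        return "successful"
    return "unsuccessful"


# A regression-style output is continuous; the classification output is one of two labels.
for score in (-2.0, 0.0, 2.0):
    print(score, round(predict_probability(score), 3), classify(score))
```

The probability is continuous, but the value returned by `classify` is always one of two categories, which is what makes the result a classification.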
Alteryx provides a range of classification techniques, including Decision Trees, Boosted Models, Forest Models, Neural Networks, and Naive Bayes.
Overview of the Case Study (01:24)
The dataset for this course relates to research grant applications at a university. We want to identify the factors that improve the quality of an application so that applications unlikely to succeed can be flagged in advance.
Reviewing the Data in Alteryx (02:07)
The dataset contains information on almost 9,000 grant applications over a 4-year period. For each application, the data includes background information on the applicant, details of the size and type of grant applied for, and whether the application was granted.
In our workflow, we have taken several steps to prepare the dataset before we start analyzing it. First, we use an Auto Field tool to assign a type to each field in the dataset. Next, we use a Select tool to manually adjust the type of any field where the Auto Field tool selected the wrong one. Third, we use a Formula tool to provide values for null data in various fields.
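For readers who want to see the logic of these three preparation steps outside Alteryx, the following is a rough Python sketch of the same idea: infer a type per field, override the inference where it is wrong, then fill in missing values. The sample rows, field names, and default values are all hypothetical and are not taken from the course dataset. It does not reproduce how the Alteryx tools work internally.

```python
# Hypothetical sample rows; empty strings stand in for null values.
rows = [
    {"grant_value": "50000", "sponsor_code": "021", "role": "Chief Investigator"},
    {"grant_value": "", "sponsor_code": "004", "role": ""},
]
fields = list(rows[0].keys())


def infer_type(values):
    """Step 1: choose int if every non-empty value parses as an integer, else str."""
    try:
        for value in values:
            if value != "":
                int(value)
        return int
    except ValueError:
        return str


inferred = {f: infer_type([r[f] for r in rows]) for f in fields}

# Step 2: manual override. "sponsor_code" looks numeric but is really a label,
# and converting it to int would drop its leading zeros.
overrides = {"sponsor_code": str}
types = {**inferred, **overrides}

# Step 3: replace nulls with a default per field (the defaults are assumptions).
defaults = {"grant_value": 0, "sponsor_code": "Unknown", "role": "Unknown"}


def clean(row):
    """Convert each field to its chosen type, substituting defaults for nulls."""
    return {
        f: defaults[f] if row[f] == "" else types[f](row[f])
        for f in fields
    }


cleaned = [clean(r) for r in rows]
print(cleaned)
```

The override step matters because automatic inference only sees the values, not their meaning: a code made up of digits is still a category, not a quantity.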