2. Understanding Decision Tree Models

 
Overview

Decision tree models are the foundation for a number of predictive models in Alteryx. In this lesson you will learn what a decision tree model is and how it can yield predictive results.

Lesson Notes

Decision Trees

  • Decision trees determine a target variable based on a set of predictor variables
  • The tool splits the data into smaller sets to develop unique pools

Transcript

In this lesson we'll begin utilizing some of the predictive tools available in Alteryx. So far, we've introduced the grant application data set and cleaned it to make it suitable for predictive analysis.

We're now ready to deploy our first predictive models, based on decision trees, but before we do we want to make sure that you understand the role these models can play in segmenting data. Decision trees use an algorithm which attempts to determine a target variable by classifying your data according to a number of predictor variables.

Let's consider a simple example. Imagine we had a dataset of voter preferences and we want to model the percentage who favor each candidate. We have certain demographic information in our dataset. We could use this to deploy a decision tree and split the group by gender. We could then split each of these branches by income group. Let's say our data also contains information regarding education. Armed with this, we could further split our groups according to the highest level of education achieved.

The idea is that we split the data until we have unique pools. For example, perhaps males in the low income group with only a basic education favor candidate A. Or maybe females from the high income group with a degree favor candidate B.

Of course, in the real world, it can be difficult to partition your data into unique groups quite so easily. In reality, you'll likely be dealing with proportions. For example, of the 49 people in our dataset that are classified as male, low income, with a secondary level education, 43 favor candidate A.
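To make the idea of pools and proportions concrete, here is a minimal Python sketch of the voter example. It is not how Alteryx works internally; it only groups records by gender, income, and education and reports the majority share in each pool. The records are hypothetical: only the 49 and 43 counts come from the lesson, and the assumption that the remaining 6 voters favor candidate B, along with the second pool, is made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (gender, income, education, preferred candidate).
# Only the 49 / 43 counts come from the lesson; the rest is assumed.
voters = (
    [("male", "low", "secondary", "A")] * 43
    + [("male", "low", "secondary", "B")] * 6
    + [("female", "high", "degree", "B")] * 10
)


def split_into_pools(records):
    """Group records by predictor values and count preferences per pool."""
    pools = defaultdict(Counter)
    for gender, income, education, candidate in records:
        pools[(gender, income, education)][candidate] += 1
    return pools


def pool_summary(counts):
    """Return (majority candidate, majority share, True if the pool is unique)."""
    total = sum(counts.values())
    candidate, votes = counts.most_common(1)[0]
    return candidate, votes / total, votes == total


pools = split_into_pools(voters)
for key, counts in pools.items():
    candidate, share, unique = pool_summary(counts)
    print(key, candidate, f"{share:.1%}", "unique" if unique else "mixed")
```

In this sketch the male, low income, secondary education pool is reported as mixed, with roughly 88% favoring candidate A, while the made-up second pool is unique.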

Splitting this group further might get us to a unique pool. For example, we could split by homeowners versus renters.

In Alteryx, the algorithm will attempt to partition the data until no further information is gained. Now that you have a basic understanding of decision tree models, we'll stop the lesson here. In the next lesson, we'll add a decision tree model to our workflow.
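The stopping rule mentioned above, splitting until no further information is gained, can be sketched in Python. The lesson does not say how Alteryx measures information, so this sketch assumes entropy-based information gain, which is one common choice. The function names, the `min_gain` threshold, and the homeowner versus renter outcome are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


def worth_splitting(parent, children, min_gain=1e-9):
    """Hypothetical stopping rule: split only if the gain exceeds min_gain."""
    return information_gain(parent, children) > min_gain


# The mixed pool from the lesson: 43 of 49 favor A (the other 6 are assumed to favor B).
parent = ["A"] * 43 + ["B"] * 6

# Assumed outcome: homeowners all favor A, renters all favor B.
owners, renters = ["A"] * 43, ["B"] * 6
print(worth_splitting(parent, [owners, renters]))  # the split yields unique pools

# A split that leaves the same proportions in each half gains nothing.
mixed = ["A", "A", "B"] * 2
left, right = ["A", "A", "B"], ["A", "A", "B"]
print(worth_splitting(mixed, [left, right]))
```

Under these assumptions, the first split is worth making because it separates the pool cleanly, while the second is not because both halves look exactly like the parent.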