8. Decision Trees


Decision trees are an intuitive classification model. This lesson explains how they can be used for prediction and explains the principles behind how they can be created.



  1. Lesson Goal (00:13)

    The goal of this lesson is to learn about decision tree models.

  2. Using a Decision Tree (00:20)

    A decision tree provides a series of questions that lead to a classification. To classify a data point, we start at the top, then take a path through various branches of the tree based on the answers to questions about the data. When we reach the end of the tree the data point will be given a classification.

    Each question in a tree is called an internal node. The top internal node is called the root node. Each line extending from a node is called a branch, and each final outcome is a leaf node, which represents a classification.

  3. Creating a Decision Tree (01:51)

    Decision trees are created starting from the root node. We evaluate every possible feature in the data set to find a split that will have the lowest possible misclassification cost. Subsequent splits are created with the same process.

    We then need to consider when to stop adding branches. Adding too many branches would make overfitting more likely, so we don’t want to create a tree that’s too large. We can stop growing a tree when the data set at a node gets small enough. Alternatively, we can limit the number of branches between the root and any leaf node. As another option, we can grow a very large tree, then eliminate, or prune, the branches that least improve the tree’s accuracy.

  4. Advantages and Disadvantages (02:59)

    Decision trees are intuitive and easy to understand. However, they are unstable: a slight change in the training data can change the root node, which can completely alter the rest of the tree. For this reason, an individual decision tree is often considered a weak classifier.


In the previous lesson, we started looking at the most common classification models you are likely to encounter, focusing on the Naive Bayes model.

In this lesson, we'll learn about decision trees, which are an intuitive model for predicting the classification of new data.

Decision trees are a very visual classification model, so let's take a look at an example. This tree relates to a dataset of passengers on the Titanic.

Our aim is to use information about a passenger to predict whether or not they survived the ship sinking. The decision tree provides a simple sequence of questions that allows us to predict the fate of a passenger.

Let's learn about the terminology of a decision tree. Each question is called an internal node and represents a test of some field in the dataset.

The top internal node is called the root node.

Each line extending from a node is a branch, representing one possible answer to the test.

Each final outcome is a leaf node, which represents a classification: here, whether or not a passenger survived. To predict passenger survival, we simply start at the root of the tree and work through it until we reach a leaf node. For example, the first question asks if the passenger is male.

If the answer is no, the tree predicts the passenger survives.

If the answer is yes, we move to the next question and continue until we reach a prediction.

Note that sibsp refers to a field in the dataset that measures the number of spouses or siblings a passenger had on board the ship. As we can see, it's easy to use a decision tree to predict the classification of a new observation.
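To make this concrete, here is a minimal sketch of how a tree like this could be written as a chain of questions in Python. Only the first question (is the passenger male?) and the sibsp field come from the lesson. The function name, the age question, and both thresholds are assumptions made up for illustration.

```python
def predict_survival(is_male, age, sibsp):
    """Walk a small Titanic-style tree from the root node to a leaf node."""
    # Root node: the first question described in the lesson.
    if not is_male:
        return "survived"          # leaf node
    # Internal node: hypothetical age test (threshold is an assumption).
    if age > 9.5:
        return "died"              # leaf node
    # Internal node: hypothetical sibsp test (threshold is an assumption).
    if sibsp > 2.5:
        return "died"              # leaf node
    return "survived"              # leaf node


print(predict_survival(is_male=False, age=30, sibsp=0))
print(predict_survival(is_male=True, age=40, sibsp=0))
print(predict_survival(is_male=True, age=5, sibsp=1))
```

Each if statement plays the role of an internal node, and each return is a leaf node.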

Let's take a step back and consider how to create trees like this. Decision trees are created from the top down. To determine the ideal test for a root node, we check every possible field in the dataset and various different split points in these fields.

The ideal split here would place only surviving passengers on one side and only passengers who died on the other side. In practice, we're unlikely to find this type of perfect split except possibly at much lower levels of the tree with a much smaller dataset.

As a result, any split will involve some loss of accuracy which we call a cost.

We choose the split that has the minimum cost.

There are various functions that can be used to calculate this cost, but we won't go into the details in this course. When creating subsequent splits, we simply repeat the process of checking every possible feature and split point and choose the least costly.

Remember that after every internal node, only part of the dataset continues down each branch, so the optimal field or split point for the next internal node will usually be different.
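The search for a split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cost used here is a simple count of misclassified rows (one of several possible cost functions), the function names are hypothetical, and the toy rows are made up rather than taken from the real Titanic data.

```python
from collections import Counter


def misclassified(labels):
    """Count rows that disagree with the majority label of their group."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def best_split(rows, labels):
    """Try every feature and split point; return (cost, feature, threshold)."""
    best = None
    for feature in rows[0]:
        values = sorted({row[feature] for row in rows})
        # Candidate split points: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            cost = misclassified(left) + misclassified(right)
            if best is None or cost < best[0]:
                best = (cost, feature, threshold)
    return best


# Toy data, made up for illustration only.
rows = [
    {"male": 1, "age": 40},
    {"male": 1, "age": 8},
    {"male": 0, "age": 30},
    {"male": 0, "age": 50},
    {"male": 1, "age": 25},
    {"male": 0, "age": 6},
]
labels = ["died", "survived", "survived", "survived", "died", "survived"]

print(best_split(rows, labels))  # (1, 'male', 0.5)
```

On this toy data, splitting on male misclassifies one row, while every split on age misclassifies two, so male is chosen for the root node. To build the rest of the tree, the same function would be called again on the rows that fall on each side of the split.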

The next issue to consider is when to stop adding further branches.

If we consider the Titanic tree again, we can see that it predicts all female passengers survive.

Of course, this was not the case in reality.

You might therefore wonder why the tree doesn't have further splits for female passengers like it does for male passengers.

The issue is that a decision tree needs to strike a balance between accuracy and overfitting.

Adding more branches should make the tree's predictions more accurate on the data it was trained on. However, adding more branches also increases the risk of overfitting.

You may remember that overfitting occurs when a model makes predictions that are too specific to the data it was trained on.

An overfitted model therefore doesn't predict new data well.

There are several possible methods for deciding when to stop growing a tree. One strategy is to stop splitting a node if the number of observations that reach it falls to a low enough level.

Another option is to limit the number of branches between the root and the leaves of the tree. A final option is to build a large tree and then prune it by eliminating the branches that least improve the tree's accuracy.
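The first two stopping rules can be sketched as a simple check that runs before each new split. The function name, parameter names, and default values below are assumptions chosen for illustration, and the check for a node that is already pure is an extra rule added here because such a node has nothing left to separate. Pruning is not shown.

```python
def should_stop(labels, depth, max_depth=3, min_samples=5):
    """Return True when a node should become a leaf instead of splitting again.

    labels: class labels of the observations that reached this node
    depth:  number of branches between the root node and this node
    """
    if len(set(labels)) <= 1:
        return True   # node is already pure (extra rule, see lead-in)
    if len(labels) < min_samples:
        return True   # too few observations at this node
    if depth >= max_depth:
        return True   # too many branches between the root and this node
    return False


print(should_stop(["died", "survived"] * 5, depth=1))  # False: keep splitting
print(should_stop(["died", "survived"], depth=1))      # True: too few observations
print(should_stop(["died", "survived"] * 5, depth=3))  # True: depth limit reached
```

Libraries usually expose these limits as settings. In scikit-learn, for example, the corresponding DecisionTreeClassifier parameters are max_depth and min_samples_split, with ccp_alpha controlling pruning.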

Whatever method is used, the aim is to end up with an intuitive tree that can be used to make good predictions.

We've now learned how to create decision trees.

The main advantage of decision trees is they're easy and intuitive to understand.

Anyone can understand how a decision tree works without needing advanced knowledge of math or statistics.

The main disadvantage of decision trees is that they are unstable.

A small change in the training data can change the root node of a decision tree, and if the root node changes, a new observation will probably take a completely different path through the tree.
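A small experiment can illustrate this instability. The sketch below chooses a root node for a made-up dataset with two yes/no fields, then changes the label of a single row and chooses again. The data, the field names, the function names, and the misclassification-count cost are all assumptions for illustration.

```python
from collections import Counter


def split_cost(rows, labels, feature):
    """Misclassified rows when each side of a yes/no split predicts its majority label."""
    cost = 0
    for answer in (0, 1):
        side = [y for row, y in zip(rows, labels) if row[feature] == answer]
        if side:
            cost += len(side) - Counter(side).most_common(1)[0][1]
    return cost


def root_feature(rows, labels):
    """Pick the field whose split has the lowest cost (ties go to the first field)."""
    return min(rows[0], key=lambda f: split_cost(rows, labels, f))


# Toy data, made up for illustration only.
rows = [
    {"male": 1, "child": 0},
    {"male": 1, "child": 0},
    {"male": 1, "child": 0},
    {"male": 1, "child": 1},
    {"male": 0, "child": 1},
    {"male": 0, "child": 1},
    {"male": 0, "child": 0},
    {"male": 0, "child": 0},
]
labels = ["died", "died", "died", "died",
          "survived", "survived", "survived", "died"]

# Change the label of one row out of eight.
changed = labels.copy()
changed[3] = "survived"

print(root_feature(rows, labels))   # male
print(root_feature(rows, changed))  # child
```

With the original labels, splitting on male misclassifies one row and splitting on child misclassifies two. After one label changes, the costs swap, so the root node asks a different question and every later split is built on different subsets of the data.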

For this reason, it's not a good idea to rely on a single decision tree as a classifier. To address this, there are several methods that classify observations by combining the predictions of multiple trees.

These methods are known as ensemble methods.

We'll look at one of these methods called boosting in the next lesson.