10. Cohort Analysis

Subtitles Enabled

Sign up for a free trial to access more free content.

Free trial


Cohort analysis breaks customers up into groups that share a common characteristic and compares these groups to each other. In this lesson, we’ll analyze cohorts of customers who signed up in particular months, and compare churn rates for these cohorts.


Cohort Analysis

  • Cohort analysis divides a dataset into groups by some common characteristic
  • These groups, or cohorts, can then be analyzed and compared
  • This identifies how the characteristic affects some outcome of interest
  • Here we divide the customers into cohorts according to the month they signed up in
  • We then compare the churn rates of these cohorts over time

Results of this Analysis

  • We analyze the churn rates for customers who signed up in 2016
  • When analyzing churn rates, we are better off looking at percentages of customers that have churned instead of looking at revenue amounts
  • Churn rates are generally quite low in all cohorts
  • The January and July cohorts seem to have the highest churn rates, while the March cohort has the lowest churn rate



In this lesson we'll conduct a cohort analysis for Interslice.

A cohort analysis breaks the dataset into various groups, or cohorts, which share some common characteristics and analyzes how the different cohorts behave over time. Our goal is to use a matrix to analyze customer sign-up cohorts and determine when the customers in these cohorts churn.

We'll start by creating a matrix and placing it across the bottom half of the canvas.

We'll then navigate to the customers table, add sign-up date to the rose well, navigate to the transactions table, add tenure month to the columns well and MRR to the values well.

This produces a matrix showing revenue figures. We can use this matrix to track the amount of revenue obtained from any sign-up cohort over time. To make the numbers easier to read we'll navigate to the formatting section of the visualizations pane, select grid, and increase the text size to 12.

Let's take a moment to explain how this matrix works. Remember, our transaction data only incorporates 2016 and 2017. This explains why there are no customers with low tenure figures for 2013 and 2014.

We should also remember that a customer's first transaction is usually on the first day of a month after they sign up. This explains how some customers can sign up in 2015 and have a tenure of one in our dataset.

These customers would have signed up in December 2015 and made their first payment in January 2016. In effect, the cohort for a particular month will actually be made up of people who signed up in the previous month.

Let's expand this matrix down to the monthly level by clicking the two branched arrows twice.

Scrolling down through the matrix, the diagonal line makes it clear which combinations of signup date and tenure fit into our dataset.

In order to make this matrix a bit easier to understand, we'll go to the signup date hierarchy in the fields area and select x to remove quarter and day from the hierarchy.

We're only interested in analyzing monthly signup cohorts so this removes unwanted fields from the date hierarchy. As we can see this is a fairly large matrix. For the purposes of this lesson we'll only look at monthly cohorts of customers who signed up during 2016 and analyze how they churn over 2016 and 2017.

To achieve this we'll add two slicers to our report.

We'll select a slicer from the visualizations pane, place it at the top left corner of the canvas, and add the sign-up date field from the customers table.

We'll then add the second slicer to the top right corner of the canvas and add the tenure month field from the transactions table.

We'll change this slicer from a slider to a list.

We can now focus our matrix to look at customers that signed up during 2016.

We'll set the start date to January 1, 2016, and set the end date to December 31, 2016.

The number of rows in our matrix is much more manageable.

As we've discussed before, we can analyze MRR and customer retention from both a revenue perspective and the customer perspective.

Analyzing customers can be slightly easier to interpret in this matrix, so we'll change this matrix from showing monthly revenue to showing customer numbers instead.

We'll remove MRR from the values well and replace it with customer ID from the transaction table.

We'll then change the aggregation to count distinct.

Now the matrix is easier to understand.

Each row represents a cohort of customers who signed up in a particular month in 2016.

As we travel along the row we can see the number of customers who remain subscribed. The final number in each row represents a quirk in the dataset, so we'll disregard these entries for the purpose of this analysis. At a glance it looks like the churn rate in each cohort is reasonably low.

We want to make it easier to compare the different cohorts.

Currently each month starts out with a different number of customers, so it's not easy to compare the churn and retention rates.

To fix this we'll change the figures from the number of customers to the percentage of customers.

We'll select the arrow by count of customer ID, then show value as and percent of row total.

Now the matrix shows the percentage of the customers in each cohort that are still subscribed at the particular tenure value.

We can now follow any of the rows to understand what percentage of customers churned and when. For example, let's look at the column for the tenure month of 12 representing customers who remain subscribed for a full year.

We can see that in the January cohort 92.97% of customers were still subscribed after 12 months, indicating 7% of this cohort had churned.

By contrast the March cohort had a much lower churn rate with 99.15% of its customers still active after 12 months.

If we follow these rows we can compare churn rates of different cohorts. It seems that the January and July cohorts have the highest churn rates while the March cohort has the lowest.

Notably the churn rate appears to be steady and gradual across the dataset. There's no point at which a large percentage of customers and any cohort suddenly churn. If, for example, we saw a large number of customers churning after 12 months the company could focus retention efforts on customers approaching this point. However, we can see that this is not the case from this particular dataset. Let's stop the lesson here. In the next lesson we'll change our focus to revenue generated from customers looking at the average revenue per user.