12. Market Basket Rules

 
Overview

Over the last three lessons, we analyzed and visualized item co-occurrence. Cut Price Supermarkets would now like us to run a similar analysis for receipts that contain more than two items. As a first step, we’ll apply the Market Basket Rules tool to the receipt dataset.

Lesson Notes

Generate Rows Tool

  • The Generate Rows tool inserts new rows into the dataset
  • We use the Generate Rows tool to ensure that each duplicate item on a receipt is captured in the dataset

MB Rules Tool

  • The MB Rules tool creates a set of association rules or itemsets based on the input dataset
  • This tool can use either the Apriori or the Equivalence Class method to create the rules or itemsets; for more information, please follow these links

Transcript

In the previous lesson we created a co-occurrence lift heat map of the receipt dataset for Cut Price supermarkets. However, Cut Price now wants us to run a similar analysis for receipts that contain more than two elements.

We can do this with the Market Basket Rules and Market Basket Inspect tools.

Our goal in this lesson is to create a set of rules that we can use as a base for a visualization of shopper behavior.

We'll achieve this goal through three key steps.

First, we'll format the workflow to prepare our data for the MB Rules tool.

Next, we'll introduce the MB Rules tool and set our association rules.

As a final step, we'll analyze the output of our MB Rules.

We'll begin our lesson with the Alteryx workflow from the previous lesson.

We'll remove every tool that comes after our four filter tools.

And run the workflow.

Let's take a look at the data for female customers shopping on a weekday.

If we look closely, we can see that there are instances where we have two or more items from a single category on a receipt. This is because our data is not defined at the SKU level. We're looking for the association between category items so this information is important. We must expand the dataset by listing these duplicate rows.

We can do this by navigating to the preparation tab and inserting a generate rows tool between the select tool and the formula tool earlier in our workflow.

In the configuration window, we'll select to calculate a new field called Basket Item.

We want to generate a new row each time the Items entry is greater than one.

We'll initialize the expression at one, enter the condition Basket Item less than or equal to Items, and set the loop expression to Basket Item plus one.
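The three expressions entered in the Generate Rows tool behave like a simple counting loop. The Python sketch below is only an illustration of that logic, not Alteryx code: the function name, the Category field, and the sample receipt are assumptions, while Items and Basket Item are the fields named in the lesson.

```python
def generate_rows(records, count_field="Items", new_field="Basket Item"):
    """Repeat each record once per unit in count_field, numbering the copies."""
    expanded = []
    for record in records:
        basket_item = 1                             # initialization: start at one
        while basket_item <= record[count_field]:   # condition: Basket Item <= Items
            expanded.append({**record, new_field: basket_item})
            basket_item += 1                        # loop: Basket Item + 1
    return expanded


# Hypothetical sample receipt: two milk items and one chocolate item.
receipts = [
    {"Receipt ID": 1001, "Category": "FreshMilk", "Items": 2},
    {"Receipt ID": 1001, "Category": "SweetsChocolates", "Items": 1},
]
rows = generate_rows(receipts)
for row in rows:
    print(row)
```

A record with an Items value of two produces two output rows, numbered one and two, so every purchased item ends up on its own row.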

We'll then run the workflow.

We're now ready to move on to step two and introduce the MB Rules tool. We'll bring an MB Rules tool onto the canvas and attach it to our first work stream.

The MB Rules tool uses algorithms to create a set of association rules for sets of frequent items.

These rules can then be used to drive the association analysis that is done by the MB Inspect tool.

We now need to specify the data input structure of our MB Rules tool.

We have one item per record, so we'll stick with that selection.

We must specify the transaction key, in our case, receipt ID, and the item identifier, which is category level three.
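To picture what the one-item-per-record structure means, the sketch below groups such records into one basket per transaction key. It is a rough Python illustration rather than what the tool does internally; the function name, the exact field spellings, and the sample records are assumptions.

```python
from collections import defaultdict


def build_baskets(records, transaction_key="Receipt ID", item_field="Category Level 3"):
    """Group one-item-per-record rows into one list of items per transaction."""
    baskets = defaultdict(list)
    for record in records:
        baskets[record[transaction_key]].append(record[item_field])
    return dict(baskets)


# Hypothetical records: one row per item, keyed by receipt.
records = [
    {"Receipt ID": 1001, "Category Level 3": "FreshMilk"},
    {"Receipt ID": 1001, "Category Level 3": "SweetsChocolates"},
    {"Receipt ID": 1002, "Category Level 3": "FreshMilk"},
]
baskets = build_baskets(records)
print(baskets)
```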

Next, we'll move on to the method, which is the underlying algorithm used to create the set of association rules.

The options here are Apriori and Equivalence Class.

In most cases, we'll use Apriori, as this offers a choice of association rules.

That is to say, it will create our list of item sets.

The Equivalence Class method returns results such as the frequency of different item sets, closed sets, et cetera.

To learn more about these options, please follow the links in the lesson notes.
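For some intuition about the Apriori option, here is a minimal Python sketch of the general level-wise Apriori idea: find the frequent single items first, then build larger candidate item sets only from smaller sets that already met the minimum support. This is not the MB Rules tool's implementation. The function name, the Bread category, and the four sample receipts are invented for illustration, and the sketch assumes each receipt is treated as a set of distinct categories.

```python
from itertools import combinations


def apriori_itemsets(baskets, min_support):
    """Return {itemset: support} for every itemset that meets min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        # Share of baskets that contain every item in the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical receipts, each a set of categories.
sample = [
    {"FreshMilk", "Bread", "SweetsChocolates"},
    {"FreshMilk", "Bread"},
    {"FreshMilk", "SweetsChocolates"},
    {"Bread"},
]
frequent = apriori_itemsets(sample, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this sample, the pair Bread and SweetsChocolates appears on only one of four receipts, so it is dropped, and the three-item candidate is pruned without being counted because one of its pairs is not frequent.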

At the bottom of the configuration window, we're offered various filter options.

Note that you'll probably have to go through some trial and error with these settings when analyzing a new data set.

You want to achieve a sufficiently high level of support but you're also looking for a sufficiently large rule set to deploy across the data.

If you choose a level of support that's too high, you may not get any results.

We'll put the minimum support setting at 0.002.

We'll also specify the minimum level of confidence at 50%.
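To see how these two thresholds act as filters, recall the standard definitions: the support of an item set is the share of receipts that contain it, and the confidence of a rule is the support of the whole item set divided by the support of its left-hand side. The Python sketch below applies both thresholds to a single item set. The function names, the Bread category, and the sample receipts are assumptions made for illustration; only the 0.002 and 50% settings come from the lesson.

```python
from itertools import combinations


def support(itemset, baskets):
    """Share of baskets that contain every item in the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def rules_for(itemset, baskets, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples derived from one itemset."""
    itemset = frozenset(itemset)
    s = support(itemset, baskets)
    if s == 0 or s < min_support:
        return []  # the itemset is too rare to produce any rule
    rules = []
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            confidence = s / support(lhs, baskets)
            if confidence >= min_confidence:
                rules.append((lhs, itemset - lhs, s, confidence))
    return rules


# Hypothetical receipts, each a set of categories.
baskets = [
    frozenset({"FreshMilk", "Bread", "SweetsChocolates"}),
    frozenset({"FreshMilk", "Bread"}),
    frozenset({"FreshMilk", "SweetsChocolates"}),
    frozenset({"Bread"}),
]
# Thresholds from the lesson: minimum support 0.002, minimum confidence 50%.
rules = rules_for({"FreshMilk", "Bread"}, baskets, min_support=0.002, min_confidence=0.5)
for lhs, rhs, s, c in rules:
    print(sorted(lhs), "->", sorted(rhs), "support", s, "confidence", round(c, 3))
```

Raising either threshold shrinks the rule set; with a minimum confidence of 0.7, for example, neither rule in this small sample would survive.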

We'll then add a browse tool to the report node and run the workflow.

We're now ready to move on to the final step and consider the output of this tool.

If we open up the report in a new window, we can see that the transaction summary data is presented in 10 panels.

The work stream we're analyzing consists of some 19 thousand rows, with the most frequently purchased items coming from the SweetsChocolates and FreshMilk categories.

People clearly visit Cut Price supermarkets for snacks and staples.

The next panel details how many receipts contained one item, two items, three items, and so forth.

Further down the page, we have a summary of the rules that have been created.

We can see that our filter has yielded 2,523 rules, with the majority being three-item and four-item sets.

These settings look appropriate.

Off camera, we'll copy this tool, attach it to the other three work streams, and run the workflow.

Let's stop the lesson here. In the next lesson, we'll use the Market Basket Inspect tool to run an association analysis on our receipt data.