13. Market Basket Inspection

 

Overview

Now that we’ve developed rules, we can run an association analysis on the Cut Price Supermarkets’ receipt data. In this lesson, we’ll learn how to run this analysis with the Market Basket Inspect tool.

Lesson Notes

MB Inspect Tool

  • The MB Inspect tool takes in the rules generated by the MB Rules tool and analyzes them
  • For each rule, it outputs the Left Hand Side (LHS) items, Right Hand Side (RHS) items, support, confidence, and lift (see the sketch below this list)
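
For readers who want to see the three metrics concretely, here is a minimal pandas sketch (not the lesson's Alteryx workflow; the receipts and item names are hypothetical) of how support, confidence, and lift relate for a single rule LHS → RHS:

```python
import pandas as pd

# Hypothetical receipts: one row per receipt, True where the item was bought.
baskets = pd.DataFrame({
    "fresh milk": [True, True, False, True],
    "mushrooms":  [True, False, False, True],
})

lhs = baskets["fresh milk"]            # LHS: the items used to predict
rhs = baskets["mushrooms"]             # RHS: the item being predicted

support    = (lhs & rhs).mean()        # share of receipts containing LHS and RHS
confidence = support / lhs.mean()      # P(RHS | LHS)
lift       = confidence / rhs.mean()   # association strength; 1 = independent

print(support, confidence, lift)
```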

Preparing Data for Export

  • In order to visualize this data in Tableau, we need to export the dataset to a CSV file
  • We need to merge the data streams into one dataset using a Union tool
  • Category 1 and Category 2 level information is brought back into the dataset before export

Transcript

In the previous lesson, we used the MB Rules tool to create a large but manageable set of rules for our dataset. Our goal in this lesson is to use those rules to run an association analysis and prepare our data for export to Tableau. We'll achieve this goal through two key steps. Our first step is to conduct an association analysis by connecting an MB Inspect tool to all work streams. Next, we'll prepare for export to Tableau by merging the work streams and reorganizing the data. We'll begin by bringing an MB Inspect tool onto the canvas and connecting it to our first work stream.

In the configuration window, we can see that we're again presented with a number of choices that allow us to filter down the results.

In this case, we're mainly concerned with confidence and lift, since these two metrics give us the best sense of item association. We'll set the minimum confidence to 0.3. This means we only want to show rules where the right-hand side item appears on at least 30% of the receipts that contain the left-hand side items. We'll also set the minimum lift to 1.5. A lift of one means that items are completely independent of each other, so 1.5 will ensure that we only see results where the items in a rule have at least a small amount of association.

Finally, we'll choose to sort the rules by lift, as we're mainly concerned with the item association within rules. Next, we'll add a formula tool to label our data as we've done before.

We'll create a field for gender and a field for day.
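
As a point of reference, here is a rough pandas sketch of these configuration choices (this is not what Alteryx runs internally; the rules table, thresholds, and label values are illustrative):

```python
import pandas as pd

# Hypothetical rules table standing in for the MB Inspect output.
rules = pd.DataFrame({
    "LHS":        ["{fresh milk}", "{onions}"],
    "RHS":        ["{mushrooms}", "{fresh milk}"],
    "confidence": [0.53, 0.25],
    "lift":       [13.0, 1.1],
})

# Keep rules with confidence >= 0.3 and lift >= 1.5, sort by lift, then add
# the segment labels that the formula tool supplies in the lesson.
filtered = (
    rules[(rules["confidence"] >= 0.3) & (rules["lift"] >= 1.5)]
    .sort_values("lift", ascending=False)
    .assign(Gender="Female", Day="Weekday")
)
print(filtered)
```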

Off camera, we'll copy the MB Inspect and formula tools, attach them to the other work streams, and run the workflow.

If we look at the output for the first work stream, we see that we're now presented with two principal columns: LHS and RHS.

Note that many of the LHS entries contain numerous items. These are the items we use to predict what is in the RHS column.

This prediction is shown in terms of the confidence figure displayed together with the support and lift metrics.

For example, we can see that female shoppers on weekdays who purchase fresh milk, mixes, pour-over sauces, onions, and shallots are 53% likely to purchase mushrooms. This rule has a lift of 13, indicating a high level of association.
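To put that lift figure in rough context: lift is confidence divided by the overall rate of the RHS item, so a lift of about 13 alongside 53% confidence implies mushrooms appear on only around 0.53 / 13 ≈ 4% of receipts in this segment overall. In other words, shoppers with this basket are roughly thirteen times more likely than an average shopper in the segment to buy mushrooms.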

The support figure tells us that just over 2% of all receipts contain this item set. We're now ready to move on to the next step, and prepare this data for export to Tableau. To accomplish this, we'll merge the work streams and reorganize the data set so we can easily develop our visualizations. We'll begin by using a union tool to join the four work streams.
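The Union tool stacks the streams by matching field names; a hypothetical pandas equivalent (with only two of the four segments shown, and made-up values) would be:

```python
import pandas as pd

# Two tiny stand-ins for the per-segment rule tables (the lesson has four).
female_weekday = pd.DataFrame({"LHS": ["{fresh milk}"], "RHS": ["{mushrooms}"],
                               "Gender": ["Female"], "Day": ["Weekday"]})
female_weekend = pd.DataFrame({"LHS": ["{onions}"], "RHS": ["{fresh milk}"],
                               "Gender": ["Female"], "Day": ["Weekend"]})

# A union stacks the streams by matching column names; pd.concat does the same.
merged = pd.concat([female_weekday, female_weekend], ignore_index=True)
print(merged)
```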

The LHS data currently contains up to five items in a single field.

In order to make more use of this data, we want to separate these items into their own columns. We can do this by using the text to columns tool.

We'll connect the tool, choose to split into five columns, and set the delimiter as a comma.

Next, we'll attach a data cleansing tool and set it to remove punctuation in the RHS and LHS fields to get rid of the curly brackets.
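As a rough pandas analogue of these two steps (the column names and items are hypothetical, and the bracket-stripping is shown before the split purely to keep the example short):

```python
import pandas as pd

df = pd.DataFrame({"LHS": ["{fresh milk,mixes,onions}"], "RHS": ["{mushrooms}"]})

# Remove the curly brackets, as the data cleansing step does for punctuation.
df["LHS"] = df["LHS"].str.strip("{}")
df["RHS"] = df["RHS"].str.strip("{}")

# Split the comma-delimited LHS into up to five columns, LHS1 through LHS5.
split = df["LHS"].str.split(",", n=4, expand=True)
split.columns = [f"LHS{i}" for i in range(1, split.shape[1] + 1)]
df = pd.concat([df, split], axis=1)
print(df)
```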

We'll then add a record ID tool to help us keep track of our data, followed by a transpose tool to reorganize the data set. We'll choose record ID, LHS, RHS, support, confidence, lift, gender and day as key fields, LHS one through five as the data fields, and run the workflow.

We can see that each LHS entry has been copied up to five times, with blank values where there are fewer than five LHS items.
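A hypothetical pandas equivalent of this transpose step uses melt, with the key fields held fixed and the split LHS columns pivoted into rows (the sample row below is made up):

```python
import pandas as pd

df = pd.DataFrame({
    "RecordID": [1], "LHS": ["fresh milk,mixes"], "RHS": ["mushrooms"],
    "support": [0.02], "confidence": [0.53], "lift": [13.0],
    "Gender": ["Female"], "Day": ["Weekday"],
    "LHS1": ["fresh milk"], "LHS2": ["mixes"], "LHS3": [None],
})

# Key fields stay fixed; the LHS1..LHS3 columns become Name/Value pairs,
# producing one row per left-hand-side item.
long = df.melt(
    id_vars=["RecordID", "LHS", "RHS", "support", "confidence", "lift",
             "Gender", "Day"],
    value_vars=["LHS1", "LHS2", "LHS3"],
    var_name="Name", value_name="Value",
)
print(long)
```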

Before we export this data, let's add the category levels back to the data set. We'll bring down another summarize tool, connect it to the select tool at the beginning of the workflow, and make the connection wireless. In the configuration window, we'll group by category one, category two, and category three, and run the workflow.
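A summarize that only groups, with no aggregations, effectively returns the distinct category combinations; a hypothetical pandas version of the same idea:

```python
import pandas as pd

# Hypothetical slice of the original receipt data and its category hierarchy.
receipts = pd.DataFrame({
    "Category 1": ["Dairy", "Dairy", "Produce"],
    "Category 2": ["Milk", "Milk", "Vegetables"],
    "Category 3": ["fresh milk", "fresh milk", "mushrooms"],
})

# Grouping by the three category fields with no aggregation simply returns
# the distinct category combinations.
categories = receipts[["Category 1", "Category 2", "Category 3"]].drop_duplicates()
print(categories)
```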

This will take some time to process, so I'll cut the wait time out of this video.

Next, we'll bring down a join tool, connect the main work stream and the summarize tool, and join on the RHS and category three fields.

We'll then rename LHS as LHS Full, value as LHS, and category one, two, and three as RHS underscore category one, two, and three respectively.

We'll now bring down another join tool to take care of the LHS values, connecting it to the J output node and, again, the summarize tool. This time, we'll join on LHS and category three, and rename category one, two, and three as LHS underscore category one, two, and three.
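A hypothetical pandas version of these two joins and renames, assuming the earlier renames have already been applied so the transposed value column is now LHS (all field names and sample rows are illustrative):

```python
import pandas as pd

rules = pd.DataFrame({"LHS Full": ["fresh milk,mixes"], "RHS": ["mushrooms"],
                      "LHS": ["fresh milk"]})
categories = pd.DataFrame({"Category 1": ["Dairy", "Produce"],
                           "Category 2": ["Milk", "Vegetables"],
                           "Category 3": ["fresh milk", "mushrooms"]})

# First join: bring in the category levels for the RHS item.
out = rules.merge(categories, left_on="RHS", right_on="Category 3").rename(
    columns={"Category 1": "RHS_Category 1", "Category 2": "RHS_Category 2",
             "Category 3": "RHS_Category 3"})

# Second join: repeat for the single LHS item on each row.
out = out.merge(categories, left_on="LHS", right_on="Category 3").rename(
    columns={"Category 1": "LHS_Category 1", "Category 2": "LHS_Category 2",
             "Category 3": "LHS_Category 3"})
print(out)
```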

We'll then connect a sort tool, sort by record ID ascending, connect an output data tool, and save the resulting dataset as a CSV file.
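Sketched in the same hypothetical pandas terms, the final sort and export would look like this (the data and output file name are made up):

```python
import pandas as pd

final = pd.DataFrame({"RecordID": [2, 1],
                      "LHS": ["onions", "fresh milk"],
                      "RHS": ["fresh milk", "mushrooms"]})

# Sort back into record order and write a CSV that Tableau can read directly.
final = final.sort_values("RecordID")
final.to_csv("market_basket_rules.csv", index=False)
```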

Off camera, I'll run the workflow to create the output. We'll stop the lesson here. In the next lesson, we'll use Tableau to visualize our association analysis.