8. Introduction to Market Basket Analysis
In the second half of this course, we will focus on Market Basket Analysis. In this lesson, we will learn about the concepts and terminology associated with this topic.
Itemsets and Rules
- An Itemset is any combination of items that may appear together in a transaction, regardless of order
- A Rule is a specifically ordered itemset
- For example, an itemset of milk and cookies yields two rules, and the level of association may differ depending on whether milk or cookies is treated as the base item
- Support is the fraction of transactions that contain an item or itemset
- Support = (transactions containing both Item 1 and Item 2)/all transactions
- Confidence is the likelihood that the second item in a rule will appear in a transaction that contains the first item of the rule
- In a rule of milk and cookies, confidence is the number of transactions where milk and cookies appear together as a percentage of all transactions that have milk
- Confidence = (transactions containing both Item 1 and Item 2)/(transactions containing Item 1)
- Lift is the likelihood of any specific rule, or ordered itemset, occurring, versus what you would expect if the items were independent
- The math here is a bit more complicated, so please follow this link for further information
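For reference, the three metrics above can be written compactly as follows. This is a sketch in standard association-rule notation, which the lesson itself does not use: X is the first (left-hand side) item or itemset, Y is the second (right-hand side), N is the total number of transactions, and n(.) is an assumed shorthand for "number of transactions containing".

```latex
\[
\mathrm{support}(X \cup Y) = \frac{n(X \text{ and } Y)}{N}
\]
\[
\mathrm{confidence}(X \Rightarrow Y) = \frac{n(X \text{ and } Y)}{n(X)}
  = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}
\]
\[
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{confidence}(X \Rightarrow Y)}{\mathrm{support}(Y)}
  = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)\,\mathrm{support}(Y)}
\]
```

A lift of 1 is what you would expect if X and Y were independent; a lift above 1 means the items appear together more often than independence would predict, and a lift below 1 means less often.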
In the previous seven lessons we looked at A/B testing and how it can be used to inform business decisions.
In the remaining lessons, we'll look at Market Basket Analysis.
Market Basket Analysis has a wide variety of uses, but has been most extensively adopted within the retail space, best illustrated by the example of a supermarket.
Supermarket managers typically refer to each receipt as a basket and will use data mining techniques to look for affinity or association between the different products in each basket. As a result, this technique is generally known as Market Basket Analysis.
The underlying principle is relatively straightforward.
The products contained in each receipt are analyzed and broken down into product groupings or item sets.
These item sets are then divided into rules based on the direction of the association.
For example, there are two different rules for an item set of wine and chocolate: the association of wine to chocolate and the association of chocolate to wine.
Since a store has so many products, the likelihood of a wine-chocolate combination will differ depending on which item is considered the base product.
The relative merit of each rule is judged according to three metrics.
The first, support, looks at the number of transactions where a specific item set exists and expresses this as a percentage of the total number of transactions.
The second metric is confidence, or the probability that a receipt containing one product also contains a second product.
This is calculated by looking at all the receipts that contain the first, or left-hand side item, and then counting what proportion of those receipts contain the second, or right-hand side item.
The third metric is lift.
This is the likelihood of any specific rule to occur in a dataset versus what one would expect if the items were strictly independent.
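To make the three metrics concrete, here is a minimal Python sketch applied to the wine and chocolate example. The eight receipts are invented purely for illustration, and the function names (`support`, `confidence`, `lift`) are hypothetical helpers, not part of any tool used in this course.

```python
# Hypothetical receipts, invented for illustration only.
receipts = [
    {"wine", "chocolate"},
    {"wine", "chocolate", "cheese"},
    {"chocolate", "bread"},
    {"chocolate"},
    {"bread", "milk"},
    {"milk"},
    {"bread", "cheese"},
    {"milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

def confidence(lhs, rhs, baskets):
    """Of the baskets containing `lhs`, the fraction that also contain `rhs`."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)

def lift(lhs, rhs, baskets):
    """Confidence of lhs -> rhs relative to how common `rhs` is overall."""
    return confidence(lhs, rhs, baskets) / support(rhs, baskets)

print(support({"wine", "chocolate"}, receipts))       # 0.25
print(confidence({"wine"}, {"chocolate"}, receipts))  # 1.0
print(confidence({"chocolate"}, {"wine"}, receipts))  # 0.5
print(lift({"wine"}, {"chocolate"}, receipts))        # 2.0
print(lift({"chocolate"}, {"wine"}, receipts))        # 2.0
```

In this toy data, every receipt with wine also has chocolate, but only half of the receipts with chocolate have wine, so the two rules have different confidence. Lift comes out the same in both directions because it depends only on the support of the pair and of each item.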
Theoretically, market basket rules can contain numerous items; however, as the number of items increases, the number of permutations increases exponentially.
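A rough way to see this growth is to count the candidate rules for a given number of products. The sketch below assumes a rule is any item set of two or more products split into a non-empty left-hand side and a non-empty right-hand side; `count_rules` is a hypothetical helper written for this illustration.

```python
from math import comb

def count_rules(n_items):
    """Count candidate rules that can be formed from `n_items` distinct products.

    Assumption: a rule is an item set of size k >= 2 split into a non-empty
    left-hand side and a non-empty right-hand side, giving 2**k - 2 splits.
    """
    total = 0
    for k in range(2, n_items + 1):
        total += comb(n_items, k) * (2 ** k - 2)
    return total

for n in (2, 3, 5, 10):
    print(n, count_rules(n))
# 2 2
# 3 12
# 5 180
# 10 57002
```

Even ten products already give tens of thousands of candidate rules, which is why practical analyses limit rule length or apply a minimum support.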
In most real world scenarios, the level of support for rules containing more than four or five items tends to fall to a very low level.
That being said, partitioning the data according to factors like gender or day of the week is a practical approach to introducing qualifications without dramatically increasing the processing time.
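As an illustration of partitioning, the sketch below computes the support of one item set separately for each day of the week. The receipts, the day labels, and the helper name `support_by_partition` are all assumptions made for this example; they are not taken from the course dataset.

```python
from collections import defaultdict

# Hypothetical receipts: (day of week, set of products).
receipts = [
    ("Sat", {"wine", "chocolate"}),
    ("Sat", {"wine", "cheese"}),
    ("Sat", {"wine", "chocolate", "bread"}),
    ("Mon", {"bread", "milk"}),
    ("Mon", {"wine", "bread"}),
    ("Mon", {"milk", "chocolate"}),
]

def support_by_partition(receipts, itemset):
    """Support of `itemset` computed within each partition key (here, the day)."""
    itemset = set(itemset)
    totals = defaultdict(int)  # receipts seen per partition
    hits = defaultdict(int)    # receipts containing the whole item set
    for key, basket in receipts:
        totals[key] += 1
        if itemset <= basket:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}

by_day = support_by_partition(receipts, {"wine", "chocolate"})
for day, value in by_day.items():
    print(day, round(value, 2))
```

Each partition is analyzed as its own, smaller set of receipts, so the qualification (here, the day) does not have to be added to the rules as an extra item.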
Over the next few lessons we'll analyze three months of sales data from Cut Price Supermarket.
They would like us to perform a Market Basket Analysis on this dataset so they can better target specific customer segments.
We'll start our analysis in the next lesson by applying the Market Basket Affinity tool and determining item co-occurrence.