1. Introduction to Market Basket Analysis
In the first half of this course, we will focus on Market Basket Analysis. In this lesson, we will learn about the concepts and terminology associated with this topic.
Lesson Goal (00:48)
The goal of this lesson is to introduce the concept of market basket analysis. Market basket analysis is an example of an unsupervised learning technique. Unsupervised learning techniques do not require training, but cannot be used to make predictions to the same degree as supervised learning techniques. They are generally used when we want to find relationships and patterns in a dataset.
Understanding Market Basket Analysis (00:56)
Market basket analysis is a technique used to look for an affinity or association between different products in each basket. It’s commonly used in retail environments, such as supermarkets. In this context, a basket refers to a set of products purchased together.
The products contained in each receipt are analyzed and broken into groups, or item sets. Each item set is divided into rules based on the direction of the association. A rule indicates that if one product in the item set is bought, another product in the item set is likely to be bought as well. Because a rule has a direction, its likelihood can vary depending on which item is considered the base product.
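To make the idea of direction concrete, here is a minimal sketch in Python. The receipts, the function names `rules_from_pair` and `rule_likelihood`, and the numbers are all hypothetical and are not taken from the course data. The sketch splits a two-item set into its two directed rules and measures, for each one, the share of receipts containing the base product that also contain the other product.

```python
from itertools import permutations

# Hypothetical receipts, invented purely for illustration.
receipts = [
    {"wine", "chocolate"},
    {"wine", "chocolate"},
    {"wine"},
    {"wine"},
    {"chocolate", "bread"},
]


def rules_from_pair(item_set):
    """Return every directed (base, other) rule for a two-item set."""
    return list(permutations(sorted(item_set), 2))


def rule_likelihood(base, other, receipts):
    """Share of receipts containing `base` that also contain `other`."""
    with_base = [r for r in receipts if base in r]
    return sum(other in r for r in with_base) / len(with_base)


rules = rules_from_pair({"wine", "chocolate"})
likelihoods = {rule: rule_likelihood(*rule, receipts) for rule in rules}

for (base, other), value in likelihoods.items():
    print(f"{base} -> {other}: {value:.2f}")
```

In this toy data, wine appears on four receipts and chocolate on three, so the same two-item set yields a likelihood of one half when wine is the base product and two thirds when chocolate is.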
Rules are evaluated based on three metrics. Support takes the number of transactions where an item set exists and expresses this as a percentage of the total number of transactions. Confidence is the probability that a receipt containing the first product also contains the second product. It is calculated as the proportion of receipts containing the first product that also contain the second product. Finally, lift is the likelihood of a particular rule occurring compared to our expectation if the items were completely independent.
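The three metrics can be sketched in a few lines of Python. This is an illustrative sketch rather than the tool used later in the course: the baskets are invented, the function names are hypothetical, and lift is computed in its common form as confidence divided by the support of the second product.

```python
# Hypothetical baskets, invented purely for illustration.
baskets = [
    {"wine", "chocolate"},
    {"wine", "chocolate", "bread"},
    {"wine", "bread"},
    {"chocolate"},
    {"bread", "milk"},
]


def support(item_set, baskets):
    """Share of all baskets that contain every item in `item_set`."""
    item_set = set(item_set)
    return sum(item_set <= basket for basket in baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Share of baskets containing `lhs` that also contain `rhs`."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)


def lift(lhs, rhs, baskets):
    """Confidence of lhs -> rhs relative to how often `rhs` occurs overall.

    A value of 1 is what independence would produce (an assumption of the
    common definition, not something stated in the lesson).
    """
    return confidence(lhs, rhs, baskets) / support(rhs, baskets)


rule_support = support({"wine", "chocolate"}, baskets)
rule_confidence = confidence({"wine"}, {"chocolate"}, baskets)
rule_lift = lift({"wine"}, {"chocolate"}, baskets)

print(f"support={rule_support:.2f}")
print(f"confidence={rule_confidence:.2f}")
print(f"lift={rule_lift:.2f}")
```

With these five baskets, the rule from wine to chocolate has a support of 0.4, a confidence of about 0.67, and a lift of about 1.11. A lift above 1 suggests the two products appear together slightly more often than independence would predict.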
Theoretically, rules can contain any number of items, but the complexity increases greatly when we add multiple items to a rule. As a result, our analysis will only contain rules involving a small number of products.
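A rough count shows how quickly the number of candidate rules grows. The sketch below assumes a hypothetical store with 100 products, and it assumes that a rule places every item of an item set on either the left-hand or the right-hand side, with neither side empty. Both assumptions are for illustration only and do not come from the course.

```python
from math import comb

N_PRODUCTS = 100  # hypothetical catalogue size, not from the case study


def candidate_item_sets(n_products, size):
    """Number of distinct item sets of a given size."""
    return comb(n_products, size)


def rules_per_item_set(size):
    """Number of directed rules one item set can produce.

    Each item goes to the left or the right side (2 ** size ways),
    minus the two splits that leave one side empty.
    """
    return 2 ** size - 2


counts = {
    size: candidate_item_sets(N_PRODUCTS, size) * rules_per_item_set(size)
    for size in range(2, 6)
}

for size, count in counts.items():
    print(f"{size} items per rule: {count:,} candidate rules")
```

Under these assumptions, two-item rules number in the thousands while five-item rules number in the billions, which is one reason an analysis is usually restricted to rules with a small number of products.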
Course Case Study (03:16)
In this course, we consider the example of Cut Price Supermarkets. Our objective is to analyze three months of sales data and use market basket analysis to better target specific customer segments.
In this course, we'll look at Market Basket Analysis and clustering techniques.
These tools use what's called unsupervised learning.
Unlike supervised learning techniques, such as regression and classification, unsupervised models don't require any training.
However, they cannot be used to make predictions to the same degree as supervised learning techniques.
Unsupervised learning is well suited for finding relationships and patterns within our data.
In the first few lessons of this course, we'll focus on Market Basket Analysis.
In the remaining lessons, we'll look at a clustering technique called K-means clustering.
In this lesson, we'll introduce the concept of Market Basket Analysis as well as our case study.
Market Basket Analysis has a wide variety of uses, but has been most extensively adopted within the retail space. It is best illustrated by the example of a supermarket.
Supermarket managers typically refer to each receipt as a basket, and we'll use data mining techniques to look for affinity or association between the different products in each basket.
As a result, this technique is generally known as Market Basket Analysis.
The underlying principle is relatively straightforward.
The products contained in each receipt are analyzed and broken down into product groupings or item sets.
These item sets are then divided into rules based on the direction of the association.
For example, there are two different rules for an item set of wine and chocolate: the association of wine to chocolate and the association of chocolate to wine.
Since the store has so many products, the likelihood of a wine-chocolate combination will differ depending on which item is considered the base product.
The relative merit of each rule is judged according to three metrics.
The first, support, looks at the number of transactions where a specific item set exists and expresses this as a percentage of the total number of transactions.
The second metric is confidence, or the probability that a receipt containing one product also contains a second product.
This is calculated by looking at all the receipts that contain the first or left-hand side item, and then counting what proportion of those receipts contain the second or right-hand side item.
The third metric is lift.
This is the likelihood of a specific rule occurring in a dataset compared with what one would expect if the items were strictly independent.
Theoretically, Market Basket rules can contain numerous items. However, as the number of items increases, the number of permutations increases exponentially.
In most real-world scenarios, the level of support for rules containing more than four or five items tends to fall to a very low level.
That being said, partitioning the data according to factors like gender or day of the week is a practical approach to introducing qualifications without dramatically increasing the processing time.
Over the next few lessons, we'll analyze three months of sales data from Cut Price Supermarket.
They would like us to perform a Market Basket Analysis on this dataset so they can better target specific customer segments.
We'll start our analysis in the next lesson by applying the Market Basket Affinity tool and determining item co-occurrence.