Sign in or start a free trial to avail of this feature.
10. Preparing Data for Export
Before we can visualize item co-occurrence, we need to consolidate our workstreams and format our data for use in Tableau. We’ll cover how to efficiently perform this task, and ensure that we have all the necessary data to create an effective visualization.
Preparing Data for Export
- In order to visualize this data in Tableau, we need to export the dataset to a CSV file
- We need to merge the data streams into one dataset using a Union tool
- Category 1 and Category 2 level information is brought back into the dataset before export
We're helping cut-price supermarkets analyze a sample of their receipt data and determine any insights. In the previous lesson, we deployed the MB Affinity Tool to determine the co-occurrence of each level three category item.
Our goal in this lesson, is to prepare our data set for export to Tableau.
We need to bring our data into a single data set. Before we can do that, we need to convert each work stream into a tabular format.
We'll start, by connecting a transpose tool to the first work stream.
We'll select row names as the key field, and all other entries as the data fields.
We want to make sure that data from this work stream is identifiable in the merged data set, so we'll connect the formula tool to create gender and day labels.
We'll call the first field, gender, and enter the text female in quotes.
We'll then create a second formula canvas, call it day, and enter the text, Monday through Thursday, in quotes.
We now need to repeat these steps for the other work streams and run the work flow.
I'll do that off camera.
We're now ready to bring these work streams together. We'll bring down a union tool, and connect all the work streams.
We'll also bring down a select tool, so we can give the fields more helpful names.
We'll rename row names as item one, name as item two, value as occurrence, and run the work flow again.
If we click on the output, we can see that each of our co-occurring pairs are now listed into columns. However, data regarding higher level categorization is missing.
These higher level categories will be helpful for our visualization. We can incorporate this information into our data set with a summarize and two joins. We'll bring a summarize tool onto the canvas, and connect this to our day of week formula at the beginning of the work flow to bring forward the original category definitions.
We'll group by category one, category two, and category three, and run the work flow again.
We'll then bring down a join tool, connect the summarize and select tools, and join on item one and category three.
We'll also take this opportunity to rename categories one and two to reflect the fact that they're tied to item one.
We'll rename category one to category one, item one, and category two to category two, item one.
We must now take care of item two.
We'll bring the second join tool onto the canvas, and connect it to our summarize tool, as well as the J output node from the previous join.
This time, we'll join on item two and category three, rename category two as category two, item two, and category one to category one, item two.
Remember, we're only concerned with instances where items were purchased together, so we'll want to filter out the single receipt data. We'll connect a filter tool, and create a custom filter where item one, is equal to item two.
We'll then connect a data output tool to the false node, specify a destination folder, name the file market basket output, ensure the file type is a .csv, and run the work flow to create our merged data set.
We'll stop the lesson here.
In the next lesson, we'll input our data to Tableau, and create a heat map visualizing item co-occurrence.