4. Determining the Treatment Group

 
Overview

Learn how to use the AB Treatments tool to decide which sample group should be selected as the treatment group for AB testing.

Lesson Notes

Determining the Treatment Group

  • The AB Treatments tool determines the best group of data to manipulate as part of our AB test
  • This tool compares each group's average for the user-defined variables to the average across the entire dataset to determine which group best represents the qualities of the dataset as a whole
  • In order to use this tool, each item in the dataset must be assigned a specific group
  • The object ID or identifier used for this tool must be of the string data type
  • Users must select 2-5 numeric variables to compare
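The comparison described in these notes can be sketched in a few lines of Python. This is only an illustration of the idea of comparing group averages to the overall average; it is not the calculation the AB Treatments tool performs internally, and the data, the `group_distance` helper, and the relative-gap score are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical records: (store, sample_group, sales, receipts)
rows = [
    ("S1", 1, 100.0, 10.0),
    ("S2", 1, 120.0, 12.0),
    ("S3", 2, 150.0, 15.0),
    ("S4", 2, 170.0, 17.0),
    ("S5", 3, 300.0, 40.0),
    ("S6", 3, 340.0, 44.0),
]

def group_distance(rows, group):
    """Sum of relative gaps between a group's averages and the overall averages."""
    total = 0.0
    for col in (2, 3):  # positions of the numeric variables to compare
        overall = mean(r[col] for r in rows)
        in_group = mean(r[col] for r in rows if r[1] == group)
        total += abs(in_group - overall) / overall
    return total

groups = sorted({r[1] for r in rows})
scores = {g: group_distance(rows, g) for g in groups}

# The group with the smallest score sits closest to the dataset as a whole.
best_group = min(scores, key=scores.get)
print(best_group, scores)
```

A lower score here simply means the group's averages sit closer to the overall averages, which is the sense in which a group "best represents" the dataset.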

Transcript

In the previous lesson, we divided our stores into sample groups with the help of the high-level dataset. Our goal in this lesson is to decide which of the sample groups we should select as our treatment group. We'll achieve this goal through three key steps. First, we'll join our workstreams to combine our high-level grouping information with our store transaction data. Next, we'll run the AB Treatments tool to determine the ideal test group. Finally, we'll filter the workflow for our selected group in preparation for our analysis.

We'll start by joining our workstreams. We'll bring another Join tool onto the canvas, connecting it to the output of the Tile tool and the J node of the Join tool in the historic sales data container.

We'll join on store and run the workflow.
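For readers who think in code, the join on store behaves like an inner join: only stores present in both streams come through. The sketch below is a rough Python analogy, not how Alteryx implements the Join tool; the field names, sample records, and the `inner_join` helper are hypothetical.

```python
# Hypothetical sample data; field names are illustrative only.
store_groups = [
    {"store": "S1", "tile_num": 1, "format": "Urban"},
    {"store": "S2", "tile_num": 2, "format": "Rural"},
]
daily_sales = [
    {"store": "S1", "date": "2024-01-01", "sales": 100.0},
    {"store": "S1", "date": "2024-01-02", "sales": 110.0},
    {"store": "S2", "date": "2024-01-01", "sales": 90.0},
    {"store": "S9", "date": "2024-01-01", "sales": 50.0},  # no group assigned
]

def inner_join(left, right, key):
    """Return merged records for key values present on both sides."""
    lookup = {}
    for rec in left:
        lookup.setdefault(rec[key], []).append(rec)
    joined = []
    for rec in right:
        for match in lookup.get(rec[key], []):
            joined.append({**match, **rec})
    return joined

joined = inner_join(store_groups, daily_sales, "store")
for rec in joined:
    print(rec)
```

Store S9 has no grouping record, so it drops out of the joined output, much as the outlier stores do in the lesson.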

We now have a dataset that combines daily sales information and high-level information for every store except the outliers. Note that this information is still at the daily level. For the purposes of our analysis, we would like to aggregate this data to the weekly level. We'll bring a Summarize tool onto the canvas and connect it to the J output node of the latest Join tool. We'll then specify to group by store, tile number, format, and week date.

We'll also sum the data for open, in-store promotion, receipts, and sales.

This brings forward the important store identifier information, such as store, tile number, and format, as well as the transaction information, such as week and sales.

We'll rename tile number to sample_group.
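The weekly aggregation and rename can be pictured as a group-by with sums. The following is a minimal Python sketch under assumed field names (`tile_num`, `week_date`, `sum_open`, `sum_sales`) and made-up records; only two of the four summed fields are shown to keep it short.

```python
from collections import defaultdict

# Hypothetical daily records that already carry a week_date field.
daily = [
    {"store": "S1", "tile_num": 1, "format": "Urban",
     "week_date": "2024-01-01", "open": 1, "sales": 100.0},
    {"store": "S1", "tile_num": 1, "format": "Urban",
     "week_date": "2024-01-01", "open": 1, "sales": 110.0},
    {"store": "S1", "tile_num": 1, "format": "Urban",
     "week_date": "2024-01-08", "open": 1, "sales": 95.0},
]

def summarize_weekly(rows):
    """Group by store, tile number, format, and week; sum the numeric fields."""
    totals = defaultdict(lambda: {"sum_open": 0, "sum_sales": 0.0})
    for r in rows:
        key = (r["store"], r["tile_num"], r["format"], r["week_date"])
        totals[key]["sum_open"] += r["open"]
        totals[key]["sum_sales"] += r["sales"]
    # Emit one row per group, renaming tile_num to sample_group on the way out.
    return [
        {"store": s, "sample_group": g, "format": f, "week_date": w, **sums}
        for (s, g, f, w), sums in totals.items()
    ]

weekly = summarize_weekly(daily)
for row in weekly:
    print(row)
```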

We'll then run the workflow. We're now ready for the next step: running our AB analysis.

We'll navigate to the AB Testing tab on the tools palette, bring an AB Treatments tool onto the canvas, and connect it to the output from the Summarize tool. We'll then navigate to the configuration window for the AB Treatments tool.

The configuration tab asks for the minimum number of treatment units for a group. We've specified to split our data of 104 stores into five groups, so we can be confident that there will be approximately 20 stores in each group. However, it is possible that some groups have a smaller number of stores, since we decided to group by store format. The option here allows us to filter out any such groups. Alteryx suggests a minimum of ten units in a group as a good threshold for statistical significance. We'll accept that and leave the setting unchanged.

Let's move on to the data input tab. This tab allows us to specify which fields we wish to consider when looking for the group that fits best. Here, we need to identify the object ID, or the items we're grouping, the field that contains our groups, and the appropriate variables that the tool should use to find the best treatment group. Notice that the object ID field must be a string. This is another good reminder to ensure that your fields are correctly formatted. We'll specify store as the object ID, sample group as group, sum open as variable one, sum in-store promotion as variable two, sum receipts as variable three, and sum sales as variable four.
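To make the minimum-units setting concrete, here is a small Python sketch of the idea: count distinct stores per group and keep only groups that reach the threshold. It is an analogy under assumed data, not the tool's own logic; `groups_meeting_minimum` is a hypothetical helper, and casting the store ID to a string simply mirrors the requirement that the object ID be a string.

```python
from collections import defaultdict

MIN_UNITS = 10  # the threshold mentioned in the lesson

def groups_meeting_minimum(rows, min_units=MIN_UNITS):
    """Return the groups that contain at least min_units distinct stores."""
    stores_by_group = defaultdict(set)
    for store, group in rows:
        # Store IDs are treated as strings, as the object ID must be a string.
        stores_by_group[group].add(str(store))
    return {g for g, stores in stores_by_group.items() if len(stores) >= min_units}

# Hypothetical data: group 1 has 12 stores, group 2 has only 4.
rows = [(f"A{i}", 1) for i in range(12)] + [(f"B{i}", 2) for i in range(4)]
rows += rows  # repeated weekly rows must not inflate the store counts

eligible = groups_meeting_minimum(rows)
print(eligible)
```

Counting distinct stores rather than rows matters here because the weekly data holds many rows per store.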

There's an option to include a fifth variable. However, we'll stick with four variables for now. Next, we'll add Browse tools to the AB Treatments tool and run the workflow.

Alteryx has calculated various metrics with respect to the variables for each group. These are then indexed, and each group is compared. Let's navigate to the browse window for the report output.

Here, we can see the summary statistics. There are five groups which contain between 19 and 23 stores each.

Group two is deemed to be the best fit, followed by group one. We'll therefore make group two our treatment group. We're now ready to move on to the final step and filter the workflow for our selected group in preparation for our analysis. We'll do this by connecting a filter tool to the summarize tool, and specifying a basic filter for sample group equal to two.
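The basic filter on sample group can be thought of as splitting the rows into a true stream and a false stream. The Python below is a hedged sketch with made-up rows and a hypothetical `basic_filter` helper, shown only to illustrate the two outputs.

```python
# Hypothetical weekly rows; field names are illustrative.
weekly = [
    {"store": "S1", "sample_group": 1, "sum_sales": 700.0},
    {"store": "S2", "sample_group": 2, "sum_sales": 650.0},
    {"store": "S3", "sample_group": 2, "sum_sales": 720.0},
]

def basic_filter(rows, field, value):
    """Split rows into (true, false) lists, like a filter's two outputs."""
    true_rows, false_rows = [], []
    for row in rows:
        (true_rows if row[field] == value else false_rows).append(row)
    return true_rows, false_rows

treatment_rows, other_rows = basic_filter(weekly, "sample_group", 2)
print([r["store"] for r in treatment_rows])
```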

We'll then bring down another summarize tool, and connect it to the true output node from the filter tool. We'll use this to bring forward only the store and sample group information. This data is necessary for determining the control stores in the next lesson. We'll group by store, and sample group, and run the workflow.
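Grouping by store and sample group without any aggregate fields amounts to keeping each distinct store and group pair once. A minimal Python sketch of that idea, using invented rows and a hypothetical `distinct_pairs` helper:

```python
# Hypothetical rows from the filter's true output, several weeks per store.
treatment_rows = [
    {"store": "S2", "sample_group": 2, "week_date": "2024-01-01"},
    {"store": "S2", "sample_group": 2, "week_date": "2024-01-08"},
    {"store": "S3", "sample_group": 2, "week_date": "2024-01-01"},
]

def distinct_pairs(rows, first, second):
    """Grouping by two fields with no aggregates yields each distinct pair once."""
    return sorted({(row[first], row[second]) for row in rows})

treatment_stores = distinct_pairs(treatment_rows, "store", "sample_group")
print(treatment_stores)
```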

We now have a complete list of receipt data for the 20 stores that we suggest be used for the trial.

Let's stop here and quickly run through what we did in this lesson. First, we joined our workstreams to combine our high-level grouping information with our store transaction data. Next, we ran the AB Treatments tool to determine the ideal test group. Finally, we filtered the workflow for our selected group in preparation for our analysis. In the next lesson, we'll prepare for our analysis of the trial results.