Sign in or start a free trial to avail of this feature.
8. Filters and Summarize
In this tool lesson, you will learn how to use the filter tool to segregate your records. You will also be introduced to the summarize tool for data aggregation.
- The Filter tool splits data into two separate sets based on user criteria, True and False
- The True set consists of all data that match user criteria
- The False set consists of all data that does not match user criteria
- The Summarize tool aggregates data using a variety of actions, such as Group By, Average, Sum, Count, etc.
- The fields that are viewable as a result of the Summarize action are the only fields that will be brought forward in the workflow
In the previous lessons, we've taken our data set through initial preparation steps including organising data with a select tool and attaching unique identifiers with a record ID tool.
In this lesson, we'll use the filter tool to segregate our data and then consolidate the results using the summarise tool.
Continuing with the sales dataset we've been using, let's begin with a simple example of filtering.
For the analysis I want to conduct I only want to view larger sales of 50 or greater by product.
How can we do this in Alteryx? We'll navigate to the preparation tab and connect the filter tool to our workflow.
In the configuration window, we can see options to create basic or custom filters.
The custom options allow users to create filters using specific formulas or code, such as if/then conditional functions.
No need to worry about this now; we'll cover formulas and custom filters in later lessons.
In this case, we're using a simple greater than or equal to function so we'll stick with a basic filter.
We'll first click the field drop down and choose the sales option and then click the next dropdown to select our limiting function.
Here, we can see a variety of simple functions such as equal to or greater than.
Notice the is null and is not null functions at the bottom of the list.
These can be particularly useful when cleaning data.
For example, we could use the is null filter to pull out records that are missing information such as customer numbers or postal code.
Since we're looking for sales greater than or equal to 50, we'll select the greater than or equal to function and type in 50 in the final box on the right.
We'll now run the tool.
The results window now shows all records with sales entries of 50 or more.
On the left side of the window we can see two arrows, one with T and one with an F.
Saying for true and false.
The T is selected, meaning that the window is currently showing all data that meeet our filter criteria.
If we click the F, we can see that the window now shows all sales entries less than 50.
The T and F arrows are mimicked in nodes on the workflow allowing us to further analyse either set of data.
Now that we have separated out our data, we would like to know the average and total count of sales for each product line.
We can use the summarise tool to aggregate this information.
We'll navigate to the transform tab and connect the summarize icon to the true output node from the filter.
This will limit our analysis to all sales of 50 or more.
In the configuration window, we can see that are fields are listed in a box on the top with a blank box below labeled actions.
We'll need to select the fields that we want to summarize and click the add button to choose the analysis.
Since we want to filter our results by product, we'll select the product name field, click the add button and select group by.
Now we'll select the sales field and for action choose numeric and then average.
This will give us the average sales price for each product.
Finally, we want to see the total count of sales for each product line.
The record ID field is helpful here as each sales record is uniquely identified.
We'll select the record ID field and for action, choose count.
We'll now press CTRL+R to run the workflow.
In the results window we can see the 4 project lines listed along with the average sales price and total kind of sales for each product line.
There are 2 important things to know about this result.
First, only the visible fields will be brought forward for any tools added to the workflow.
Second, this result only covers entries where the sales price was greater than or equal to 50.
To see a similar analysis for the rest of the data in the data set, we'll need to connect a new summarize icon to the F node on the filter tool.
As we can see, these two tools work very well together.
The filter tool allows users to easily split their data into two distinct sets based on either basic or more complex criteria.
This allows users to focus on relevant data rather than an entire dataset.
Analysts can then use the summarize tool to manipulate raw data and conduct a deeper analysis.
In the next lesson, we'll take a look at the different types of messages that appear in the results window.