10. Lists and Bins
Lists and bins provide convenient ways of combining your data into groups that are easy to analyze. We’ll create groups and use them in charts in this lesson.
- Grouping allows you to divide a field into categories
- Groups can be useful when you have a continuous field taking on a wide range of values, which you want to reduce to a small number of discrete categories
- A list is a series of groups created manually
- Each group is formed by adding values to it one by one
- For this reason, lists are best used for a field with a smaller number of values
- Bins divide a field into different categories automatically, based on the structure of the field
- You can set the size of each bin, or set the number of bins in the dataset
- This is more useful for a field with a large number of values
In this lesson we'll look at two methods of creating groups in your data: lists and bins.
These provide a simple way of dividing your data into categories.
As an example, we've seen that the date hierarchy contains year, quarter, month, and day.
This is useful, but not entirely comprehensive.
For instance, we may want to analyze our data by week.
This is a task that we could accomplish using DAX formulas, but bins provide a simpler, code-free way of achieving the same result.
We'll start by creating a group with the date field.
We'll navigate to the field list, right-click on the date field, then select new group.
The right side of the group window contains the details of the field we've chosen, as well as its min and max values.
On the left we can set the details of our group.
Note that there are two different types of group: lists and bins.
Let's start by looking at lists.
Lists let you manually create groups based on all values contained in the specified field.
As you can see, we have a list of all the dates in our data set on the left side of the window.
We can Ctrl+click to select a number of observations, and then group them by clicking group.
This can be useful if your field has a relatively small number of values and a small number of groups.
For example, given state-level data, we could have used lists to create the subregion and region fields had they not been part of our data set.
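The idea behind a list can be sketched in a few lines of Python: each group is built by hand, one value at a time. This is only an illustration of the concept, not how the tool works internally. The state names, region labels, and the catch-all "Other" label are assumptions made for this sketch, not values from the lesson's data set.

```python
# Hypothetical manual groups: each region is built by listing its states by hand.
region_groups = {
    "West": ["California", "Oregon", "Washington"],
    "Northeast": ["New York", "Massachusetts", "Vermont"],
}

# Invert the groups into a value -> group lookup.
state_to_region = {
    state: region
    for region, states in region_groups.items()
    for state in states
}


def region_for(state: str) -> str:
    """Return the group a state belongs to, or "Other" if it was never grouped."""
    return state_to_region.get(state, "Other")


states = ["Oregon", "Vermont", "Texas"]
regions = [region_for(s) for s in states]
print(regions)
```

Every value has to be placed by hand, which is why this approach stops being practical once a field has hundreds of distinct values.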
However, with hundreds of dates in our data set, lists are not very practical.
Instead, we'll change the group type to bins.
Bins automatically divide the data into categories according to our high-level specifications. The bin type field lays out two different options for creating bins. The first option is to set the bin size.
In this case, each bin will represent a specific length of time, so the number of sales transactions in each bin can vary.
The alternative is to set the number of bins.
This will divide the range of the field into the specified number of equal-width bins, so the length of time each bin covers is calculated from the data rather than chosen directly. We want each bin to represent one week of sales transactions, so we'll set the type to size, set the length to seven days, and click OK.
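As a rough sketch of what a seven-day bin size means, the Python below assigns each date to a fixed-length bin and counts the transactions in each one. The transaction dates are made up, and anchoring the first bin at the earliest date is an assumption of this sketch; the tool may choose its bin boundaries differently.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical transaction dates; the lesson's actual data is not shown.
transaction_dates = [
    date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 7),
    date(2024, 1, 8), date(2024, 1, 20),
]

BIN_SIZE_DAYS = 7


def week_bin(d: date, origin: date, size_days: int = BIN_SIZE_DAYS) -> date:
    """Return the start date of the fixed-size bin that contains d."""
    index = (d - origin).days // size_days
    return origin + timedelta(days=index * size_days)


# Assumption: bins are anchored at the earliest date in the data.
origin = min(transaction_dates)
counts = Counter(week_bin(d, origin) for d in transaction_dates)

for start, n in sorted(counts.items()):
    print(start, n)
```

Each bin covers the same seven-day span, but the number of transactions per bin differs, which matches the behavior described for the bin size option.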
The bins appear as a new field in our fields pane called date bins.
Note the interlocking square symbol, which denotes that the field represents a group. We'll double-click the title of this field and rename it weeks.
Let's create a line chart of revenue by weeks and see that each point on the line represents one week's revenue.
Let's look at another example. We'd like to get an idea of the size of our clients. Do most companies have many users? Or a small number of users? We'll create a clustered column chart to get a better idea of the answer to this question.
We'll then add users to the axis well and company name to the values well.
The resulting chart shows the number of users on the X-axis and the number of companies with those users on the Y-axis.
However, because there's a wide range in the number of users, this chart is difficult to interpret.
We can solve this problem by creating bins that each span a range of 100 users. We'll navigate to the field list, right-click on users, and select new group. We'll then name the group Users 100.
We'll ensure that the group type is bins and the bin type is size of bins, set the bin size to 100, and click OK.
We'll then select our column chart, remove users from the axis well, and replace it with Users 100. Now the chart has fewer columns, and is much easier to understand. For example, the first column tells us that there are 111 companies that have between zero and 100 users.
It appears that there is no particular trend up to a company size of about 800 users. However, the number of companies with more than 800 users seems to fall precipitously. As we can see, bins are helpful in analyzing a field such as users that would have too many values to look at otherwise. Bins and lists can be useful in any situation where you want to divide a continuous variable into a small number of discrete groups.
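The binning and counting behind the Users 100 chart can be sketched in Python as follows. The company names and user counts are invented for illustration, and the treatment of boundary values (a company with exactly 100 users falls into the 100 bin here) is an assumption of this sketch rather than documented behavior of the tool.

```python
from collections import Counter

# Hypothetical company -> number of users; not the lesson's data.
company_users = {
    "Acme": 12, "Globex": 99, "Initech": 100,
    "Umbrella": 250, "Hooli": 299, "Stark": 840,
}

BIN_SIZE = 100


def users_bin(users: int, size: int = BIN_SIZE) -> int:
    """Return the lower edge of the fixed-size bin that contains users."""
    return (users // size) * size


# Count how many companies fall into each bin (the column heights in the chart).
companies_per_bin = Counter(users_bin(u) for u in company_users.values())

for lower in sorted(companies_per_bin):
    print(f"{lower}-{lower + BIN_SIZE - 1}: {companies_per_bin[lower]}")
```

Six distinct user counts collapse into four bins, which is the same effect that makes the binned column chart easier to read.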
In the next lesson we'll continue our review of the various visualization types, and look at waterfall charts.