Sign in or start a free trial to avail of this feature.
3. Data Aggregation
In this lesson we will prepare a workflow suitable for time series analysis. We’ll start by aggregating our data into the correct time intervals.
- The first step preparing for a time series forecast is determining the appropriate period length given the dataset and forecast parameters
- Our hourly marketing data needs to be aggregated to the weekly level so that our forecast looking forward one year has a manageable 52 periods
- The Date Filter tool allows users to limit data going forward in a workflow to a specific date range
In the previous lessons, we introduced the concept of time series forecasting at a high level. We'll now investigate the time series tools offered by Alteryx as we work through an example using our online marketing dataset. Through this analysis, we'll perform a Univariate Forecast. This means that our forecast will be based on a single variable. In this case, the sales figures. In this lesson, our goal is to prep the dataset for our time series analysis. We'll accomplish this by following three key steps. First, we'll scrub the dataset of any null values. Next, we'll aggregate our data to a weekly format.
Finally, we'll specify starting and ending dates as a base for our forecast. Time series models require that all records are populated. As such, our first step is to add a Browse Tool and determine if the dataset contains any null values. We'll run the Work Flow, and see that the dataset does indeed contain nulls. We know that when it comes to sales, null values are actually zero. So, we can replace the nulls with zeroes using a Data Cleansing tool.
We'll connect the Data Cleansing tool and select the 'Replace Nulls with Zeroes' option. Next, we need to consider the level of aggregation for our time series analysis. Let's say, we would like to create our budget for the year ahead. Our data is currently in an hourly format, which would provide almost 9000 periods. Weekly data would be more appropriate. So, we'll need to aggregate the data accordingly. First, we'll connect the Select tool and convert the date field from the Date Time format to simply the Date format. This will make our aggregation a bit simpler as it ignores the time element. We'll also convert our Sales data to Fixed Decimal size 12.2, as these numbers should only contain two decimal places. Next, we'll add a Formula tool to define each week as starting on a Monday. As such, we'll name the new column 'Week.' There are various ways to do this; however, we'll use a Switch function to assign a number to each weekday and then use a Date Time add formula to return the date corresponding to the Monday of that week. We'll then change the data type to Date and connect a Summarize tool to the workflow.
We'll group by the 'Week' column.
Sum by the 'Sales' column.
And rename that output, 'Weekly_Sales.' As a final bit of preparation, we'll navigate to the Preparation tab and connect a Date filter tool to the workflow.
Time series analysis is all about time, so having control of the beginning and end of our analysis period gives us more flexibility.
We'll specify a start date as Monday, January 6, 2014 and an end date of Monday, December 26, 2016.
Now that we've aggregated our data into weekly sales figures, we'll stop the lesson here. Let's quickly recap what we've done in this lesson. First, we removed all null values from the dataset.
Next, we aggregated our data from an hourly format to a more manageable weekly format. Finally, we specified starting and ending dates as a base for our forecast. In the next lesson, we'll continue our preparation by preparing a time series plot.