10. Time Series Model Customization
In this lesson, we will learn the difference between a standard and a customized Alteryx time series model. Possible use cases for customization are discussed, together with the potential downside, including overfitting.
- Seasonal differencing attempts to remove seasonal trends from data to create a stationary data series
- This is accomplished by taking the current value and subtracting the corresponding value from the previous season
- If the dataset continues to exhibit a trend, further levels of differencing need to be applied
- Overfitting occurs when a model is customized to a level that fits historic data too well
- This can result in a model that explains historic data, but does a poor job of forecasting future performance
- Users can test for overfitting by running the model through a validation set of data
So far, we've used the Alteryx default settings for our forecasts. However, now that you have a better understanding of the various time series tools, we'll discuss the subject of customization. At its basic level, the concept of time series forecasting is relatively easy to grasp. That being said, it can quickly get complicated. For this reason, randomly tweaking the time series customization settings is inadvisable unless you really know your statistics. However, there are some very basic changes you can make to your forecasts that may have a transformative effect. In this lesson, we'll examine some of these changes and discuss their possible merits over the out-of-the-box models. For our preliminary time series lessons, we used the online sales dataset and aggregated our data to the weekly level.
We set the start date to January 6th, 2014, and the end date to December 26th, 2016. This gave us a forecast which maintained some of the obvious seasonality in our data. However, it's worth noting that, as the dataset concludes on December 30th, the week beginning Monday 26th does not include seven days of data. In this case, it would have been preferable to choose the end date as Monday, December 19th, as it would have given us a complete seven-day week. However, as we can see from the forecast at the bottom of the slide, selecting an end date of December 19th produces a forecast that is simply the mean value. This difference is due to how the Arima algorithm responds to the two different sets of data. When this happens, you can try to choose different end dates for your time series calculation. However, a preferable course of action is to consider some of the customize options in the Arima tool.
If we switch to Alteryx, you can see that we've created a workflow with three pink models and three blue models. The workflow is quite large, so it may be easier to navigate with the overview pane. The pink models are driven by the dataset concluding December 26th, whereas the blue models are driven by the dataset concluding December 19th. If we view the December 26th out-of-the-box Arima model, we can see that it produces a forecast with a degree of seasonality. However, if we click on the December 19th Arima model, we can see that the forecast is simply the historic average.
You may recall from the introductory lessons that a key part of time series analysis is to calculate all the trends in your data. Once you've accounted for each of these trends, your data series is considered to be stationary. The Alteryx Arima model uses normal differencing to de-trend the data. The first level of differencing takes the current value and subtracts the prior value. This is continued for each point, resulting in a first-level differenced series.
If this new series also exhibits a trend, then a second level of differencing needs to be applied. Seasonal differencing is conducted in a similar manner, except this time, it's the difference between the current value and the value in the previous corresponding season that is calculated. Again, if the resulting dataset continues to exhibit a trend, a second level of seasonal differencing needs to be applied.
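The differencing arithmetic described above can be sketched in a few lines of plain Python. This is only an illustration of the subtraction itself, not how the Alteryx Arima tool is implemented. The `difference` helper, the made-up `sales` series, and the season length of 4 are all assumptions chosen to keep the example small; for weekly data with a yearly pattern, the season length would more plausibly be 52.

```python
def difference(series, lag=1):
    """Subtract the value `lag` periods earlier from each value.

    lag=1 gives normal (first-level) differencing;
    lag=season length gives seasonal differencing.
    """
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical data: a steady upward trend (+2 per period) plus a
# pattern that repeats every 4 periods.
season = [10, 0, -5, 3]
sales = [100 + 2 * t + season[t % 4] for t in range(12)]

# Normal differencing: current value minus the prior value.
first_diff = difference(sales, lag=1)

# Seasonal differencing: current value minus the value one season earlier.
seasonal_diff = difference(sales, lag=4)

# A second level of seasonal differencing, applied to the already
# seasonally differenced series.
second_seasonal_diff = difference(seasonal_diff, lag=4)

print("first difference:     ", first_diff)
print("seasonal difference:  ", seasonal_diff)
print("second seasonal diff.:", second_seasonal_diff)
```

In this toy series, one level of seasonal differencing removes the repeating pattern and leaves a constant, whereas normal differencing still shows the seasonal swings. Real sales data will rarely be this tidy.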
Let's return to our Alteryx workflow and consider the pink models. That is, the models concluding December 26th. For demonstration purposes, we've customized the second pink model so that the parameters include one level of seasonal differencing. In the third pink model, we specify two levels of seasonal differencing. We've similarly created two additional blue models. That is, the series concluding December 19th. As you might expect, the second blue model has been customized with one level of seasonal differencing, while the third blue model has been customized with two levels of seasonal differencing. This can be seen in the configuration window for the Arima tool in question. If we consider the output from the two additional pink models first, we can see that two separate forecasts have been generated.
The interesting difference comes with the blue models. Whereas the first blue model, the out-of-the-box, not customized version, resulted in a forecast with a flat mean, the two customized models display more typical seasonal forecasts. Which model is best? When it comes to time series forecasts, the best-fitting model is simply the model that explains the greatest proportion of the trend, leaving the smallest unexplained noise. However, this way of thinking gives rise to the risk of overfitting. We could tweak our models in such a way that they practically mirror the historic data. This does not mean that they will have any success forecasting future data. This is why we went to the trouble of using the samples tool in our lessons. In doing so, we were able to send a portion of our data through the Arima and ETS tools, deriving various models. We then tested our model against the validation set, i.e., unknown data.
How the model performs against this new data is the real test in determining which model is best.
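The idea of judging models on a validation set can be sketched as follows. This is a minimal illustration under stated assumptions: the two forecasters here (a flat mean and a "repeat the last season" rule) are simple stand-ins, not the Arima or ETS models from the workflow, and the data, the split point, and the choice of RMSE as the error measure are all hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_forecast(train, horizon):
    """Stand-in model 1: forecast the historic average for every period."""
    average = sum(train) / len(train)
    return [average] * horizon


def seasonal_naive_forecast(train, horizon, season_length):
    """Stand-in model 2: repeat the last observed season."""
    last_season = train[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


# Hypothetical series with a pattern repeating every 4 periods.
season = [10, 0, -5, 3]
series = [100 + season[t % 4] for t in range(16)]

# Hold back the final 4 points; the models never see them while fitting.
train, validation = series[:12], series[12:]

candidates = {
    "flat mean": mean_forecast(train, len(validation)),
    "seasonal naive": seasonal_naive_forecast(train, len(validation), 4),
}

# Score every candidate on the held-back data only.
scores = {name: rmse(validation, forecast) for name, forecast in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: RMSE on validation set = {score:.2f}")
print("lowest validation error:", best)
```

The point of the design is that the comparison uses only data the models were not fitted on, so a model that merely mirrors the historic data gains no advantage from doing so.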
This concludes our look at the time series tools in Alteryx. Throughout these lessons, we've given you a good grounding in how to perform time series forecasting, and in when it might be appropriate to customize the time series tools.