2. Forecast Methodologies in Alteryx
In this lesson, we'll look at the different forecasting methodologies available in Alteryx.
ARIMA
- Stands for Autoregressive Integrated Moving Average
- A forecasting technique that uses a form of regression analysis, comparing time series data at lagged intervals with the moving average of that data to predict future movements
- ARIMA models are sensitive to seasonality
ETS
- Stands for Exponential Smoothing (the letters are commonly expanded as Error, Trend, Seasonality)
- A technique for making forecasts based on a weighted average of past values, with more recent values given higher weights
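To make the "weighted average with higher weights on recent values" idea concrete, here is a minimal sketch of simple exponential smoothing in plain Python. It is only the simplest member of the exponential smoothing family, not the model Alteryx fits. The function names, the smoothing factor `alpha = 0.5`, and the sales figures are all made up for illustration.

```python
def ses_weights(alpha, n):
    """Weights placed on the n most recent observations, newest first."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


def ses_forecast(values, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    Starts the smoothed level at the first observation (an assumption;
    other initialisations exist), then blends in each new observation.
    """
    level = values[0]
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# Hypothetical sales figures, oldest first.
sales = [100, 102, 101, 105, 110]

weights = ses_weights(alpha=0.5, n=4)
forecast = ses_forecast(sales, alpha=0.5)

print(weights)   # each weight is half the one before it
print(forecast)
```

With `alpha = 0.5` each older observation counts half as much as the one after it. A larger `alpha` makes the forecast react faster to recent values, and a smaller one smooths more heavily.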
In the previous lesson we briefly discussed the concept of time series forecasting. In this lesson we'll look at the forecasting methodologies available in Alteryx. Alteryx offers two prepackaged forecasting methodologies: ARIMA and ETS.
ARIMA stands for Autoregressive Integrated Moving Average. This technique uses a form of regression analysis that compares time series data at lagged intervals with a moving average of that data to predict future movements. The second methodology, ETS, stands for Exponential Smoothing. Exponential Smoothing is a technique for making forecasts from a weighted average of past values, with more recent values given higher weights.
The difference between ARIMA and ETS models is an involved topic, but the key factor concerns how they weight historic data and handle seasonality. A time series with seasonality is non-stationary, while a time series whose statistical properties do not depend on the time at which it is observed is stationary. ARIMA models can be either stationary or non-stationary, but all ETS models are non-stationary.
So which model should you use? The short answer is that it's not easy to say. In cases where recent events have a big influence on what happens next, as with financial data, the ETS model may be superior. On the other hand, if your time series is not seasonal, the ARIMA model may be preferred. In truth, making assumptions regarding the preferred model can lead to over-fitting and is discouraged. It's far better to approach your data with an open mind, try both models using a sample set, and then refine your choice using a validation set. Thankfully, Alteryx makes this process straightforward.
Let's look at an example to illustrate this point. If we again consider our online sales data, we have three years of hourly data, presented in a bar chart of each data point. If we conduct a forecast of this hourly data, we need to be cognizant of the level of granularity. For example, it might be appropriate to use hourly data to forecast forward 48 periods, or 48 hours, as shown here in the bottom chart. However, it's not good practice to use hourly data to forecast, say, three months forward. This would represent more than 2,000 periods and is an unrealistic projection.
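The granularity point above comes down to simple arithmetic, sketched here in Python. The helper name `periods_in_horizon` is hypothetical, and "three months" is assumed to be 90 days.

```python
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7


def periods_in_horizon(horizon_days, granularity):
    """Number of forecast periods needed to cover horizon_days."""
    periods_per_day = {
        "hourly": HOURS_PER_DAY,
        "daily": 1,
        "weekly": 1 / DAYS_PER_WEEK,
    }
    return round(horizon_days * periods_per_day[granularity])


# A two-day forecast at hourly granularity.
two_days_hourly = periods_in_horizon(2, "hourly")

# A three-month forecast (assumed to be 90 days) at two granularities.
three_months_hourly = periods_in_horizon(90, "hourly")
three_months_weekly = periods_in_horizon(90, "weekly")

print(two_days_hourly, three_months_hourly, three_months_weekly)
```

Under the 90-day assumption, the same three-month horizon needs 2,160 hourly periods but only about 13 weekly ones, or 12 if a month is treated as four weeks.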
Instead, we could aggregate our time series to a larger time period, thereby reducing the number of periods being forecast. If we go to the weekly level, the three-month forecast contains only about 12 weeks, a reasonable ask. As a rule of thumb, you should be looking for historic data at least three to four times longer than your desired forecast period. In other words, nine months to a year of historical data for a three-month forecast. Naturally, this will depend on how stable your data is, as well as the effect of longer-term cycles. For example, sales within a year may be influenced by seasonal factors like the Christmas holidays. However, there are also longer-term cycles to take into consideration.
As weekly sales across the previous nine to 12 months will most likely be affected by seasonality, an ETS analysis might make more sense here. If we're looking at sales figures that don't fluctuate due to the time of year, an ARIMA analysis may be preferable. A degree of judgment and common sense is called for.
This concludes our introduction to time series analysis. Over the following lessons we'll consider how to perform ARIMA and ETS time series analyses. We'll compare both models to determine the most appropriate for a specific dataset and desired business case.
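As a rough preview of comparing candidate models against a validation set, the sketch below holds back the most recent observations, forecasts them with each candidate, and picks the one with the lowest mean absolute error. It does not use Alteryx, ARIMA, or ETS: the two forecasters are deliberately simple stand-ins, and every function name and number here is made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(history, steps):
    """Stand-in model A: repeat the last observed value."""
    return [history[-1]] * steps


def seasonal_naive_forecast(history, steps, season_length=4):
    """Stand-in model B: repeat the values from the most recent season."""
    return [history[-season_length + (i % season_length)] for i in range(steps)]


def pick_model(series, holdout, candidates):
    """Fit on everything except the last `holdout` points, score on those points."""
    train, validation = series[:-holdout], series[-holdout:]
    scores = {
        name: mean_absolute_error(validation, forecaster(train, holdout))
        for name, forecaster in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical weekly sales with a repeating four-week pattern and slow growth.
weekly_sales = [10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44, 16, 26, 36, 46]

best, scores = pick_model(
    weekly_sales,
    holdout=4,
    candidates={"naive": naive_forecast, "seasonal_naive": seasonal_naive_forecast},
)

print(best, scores)
```

On this made-up series the seasonal stand-in wins because the data repeats every four weeks. The same hold-out logic applies whichever two models are being compared.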