If you’re not a statistician, Alteryx’s time series palette is incredibly useful for producing fast forecasts without too much time spent on Stack Exchange. This blog examines four tools in the palette: ETS, ARIMA, TS Compare and TS Forecast. Part 1 will cover the theory behind each tool while Part 2 offers a demonstration in Alteryx.
The ARIMA and ETS Tools
The ARIMA and ETS tools both estimate a forecast model to predict target values beyond the end of the data set. ARIMA stands for autoregressive integrated moving average, and it’s one of the most widely used approaches for building time series models. You don’t need to understand ARIMA to use this tool in Alteryx, but just in case you’re curious, this video offers a great intro to autoregression.
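For the curious, the "autoregressive" part of ARIMA can be sketched in a few lines of Python. This is only an illustration and not what the Alteryx tool runs: it fits the simplest autoregressive model, AR(1), where each value is predicted from the one before it, and it ignores the "integrated" and "moving average" parts entirely. The function names `fit_ar1` and `forecast_ar1` and the sample numbers are made up for this example.

```python
def fit_ar1(series):
    """Estimate c and phi in y[t] = c + phi * y[t-1] by ordinary least squares."""
    x = series[:-1]   # previous values
    y = series[1:]    # the values that followed them
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(last_value, c, phi, steps):
    """Roll the fitted equation forward, feeding each forecast back in."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts


# Hypothetical series that follows y[t] = 2 + 0.5 * y[t-1] exactly.
history = [10, 7, 5.5, 4.75, 4.375]
c, phi = fit_ar1(history)
print(c, phi)
print(forecast_ar1(history[-1], c, phi, steps=3))
```

Real data is never this tidy, which is why a full ARIMA model adds differencing and moving average terms on top of the autoregression.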
The ETS model uses exponential smoothing to estimate a forecast model by taking a weighted mean of past values, in which the weights shrink exponentially the further back you go, so recent values count for more than earlier ones. You also don’t need to understand ETS to use it in Alteryx, but this answer on Quora gives a fantastic description.
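To make the weighting idea concrete, here is a minimal sketch of simple exponential smoothing in Python. It covers only the most basic form, with no trend or seasonal component, and it is not a reproduction of the Alteryx tool. The smoothing parameter `alpha`, the function names and the sample values are assumptions chosen for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after the last observation.

    In this simple form, the final level is also the one-step-ahead forecast.
    """
    level = series[0]  # start from the first observation
    for value in series[1:]:
        # Blend the new observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level


def smoothing_weights(alpha, n):
    """Weights given to the n most recent observations, newest first."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


print(simple_exponential_smoothing([10, 20, 30], alpha=0.5))  # 22.5
print(smoothing_weights(0.5, 3))  # [0.5, 0.25, 0.125]
```

A higher `alpha` makes the forecast react faster to recent changes, while a lower one smooths out more of the noise.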
TS Compare Tool
The TS Compare tool takes two inputs: one or more unioned time series model objects, and a section of the data set that was withheld when the models were built. This ‘hold out’ data is used to compare the forecast error of each model object. The TS Compare tool is handy for knowing which model has the lowest error. Once this is known, you can proceed with your actual TS forecast.
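The idea behind a hold-out comparison can be sketched as follows. This is a rough illustration of the principle rather than the TS Compare tool itself: RMSE and MAE are used here as two common error measures, and the model names, forecasts and hold-out values are invented for the example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalises large misses more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: the average size of a miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pick_best_model(actual, forecasts):
    """Score every model against the hold-out data and return the lowest-RMSE one."""
    scores = {name: rmse(actual, preds) for name, preds in forecasts.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical hold-out values and the forecasts two models made for that period.
holdout = [100, 110, 120]
forecasts = {
    "ETS": [102, 108, 121],
    "ARIMA": [90, 115, 130],
}

best, scores = pick_best_model(holdout, forecasts)
print(best, scores)
```

Which error measure to trust depends on what a bad forecast costs you: RMSE is the stricter choice if occasional large misses matter more than many small ones.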
The TS Forecast Tool
The TS Forecast tool gives forecasts using a model created with either the ARIMA or ETS tools. It outputs the predicted values for a specified number of periods, along with the upper and lower bounds of the confidence intervals. It’s important to note that the time period for your actual forecast should be the same length as the one used in the model comparison step. If you want to predict further into the future than was compared by the TS Compare tool, it’s worth re-running each model with the updated configuration and comparing them once more.
What does good forecasting data look like?
For time series forecasting, or any predictive modelling, it’s ideal to have complete data. If there are nulls in your target field, this is going to lead to a much less effective model. Another requirement is that your target values are spaced equally along the time axis. For example, a data set whose time points are one month apart, then two months apart, then back to one month apart will not provide a good time series model, even if none of the target values are null.
Ideally you would have something like monthly sales, recorded for exactly every month, over the span of several years. Having multiple years is especially useful for predicting seasonality, but this also depends on the nature of the data.
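Both requirements are easy to check before you build anything. The sketch below assumes monthly data held as a list of `(date, value)` pairs sorted by date; the function names and the sample rows are made up for illustration, and inside Alteryx you would do the equivalent with its own data preparation tools.

```python
from datetime import date


def month_index(d):
    """Turn a date into a running month count so gaps are easy to measure."""
    return d.year * 12 + d.month


def check_monthly_series(rows):
    """Return a list of problems found in (date, value) rows sorted by date."""
    problems = []
    # Requirement 1: no nulls in the target field.
    for d, value in rows:
        if value is None:
            problems.append(f"null target at {d.isoformat()}")
    # Requirement 2: consecutive rows are exactly one month apart.
    for (prev, _), (curr, _) in zip(rows, rows[1:]):
        gap = month_index(curr) - month_index(prev)
        if gap != 1:
            problems.append(
                f"gap of {gap} months between {prev.isoformat()} and {curr.isoformat()}"
            )
    return problems


# Hypothetical monthly sales with one null and one missing month (March).
sales = [
    (date(2023, 1, 1), 120),
    (date(2023, 2, 1), None),
    (date(2023, 4, 1), 135),
]
for problem in check_monthly_series(sales):
    print(problem)
```

An empty result means the series is complete and evenly spaced, at least at the monthly level.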
Part 2 of this blog will cover how to configure these tools in Alteryx to produce a useful time series forecast.