What is Autocorrelation?
Simply put, autocorrelation is a pattern that repeats itself over a time period: the degree of similarity between a given time series and a lagged version of itself over a specified time interval. A common example is temperature: in most northern-hemisphere countries you would expect that every 12 months temperatures drop in the winter and rise in the summer. If the pattern doesn't repeat one year, this could be explained by an external variable acting on it, like CO2 emissions causing a greenhouse effect that warms temperatures. However, this may be misleading, as climate change isn't just a warming of temperatures in all seasons; it may instead amplify the extremities of each season, giving hotter summers and colder winters. In that case the pattern would still repeat, just with greater magnitudes for the values. So it is important to recognise that your data may contain a lagged pattern, while the reason it deviates may not be obvious.
In Alteryx one can use the TS Plot tools, which are themselves macros built in R. In the interactive output, one can see the graph of the autocorrelation function (ACF) over several lag periods of weeks, months, days, or whatever is specified. An autocorrelation with an absolute value of 1 (i.e. -1 or +1) indicates a perfect correlation: a pattern.
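Alteryx does this through its R macros, but the underlying calculation is simple. Here is a minimal sketch in plain numpy, using a sine wave as a toy stand-in for the monthly temperature example (the series and lag count are my own invention):

```python
import numpy as np

def acf(series, nlags):
    """Sample autocorrelation of a series with its lagged self, for lags 1..nlags."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return [float(np.sum(x[k:] * x[:-k]) / denom) for k in range(1, nlags + 1)]

# Toy "temperature" series: a sine wave that repeats every 12 steps (months)
t = np.arange(120)
temps = np.sin(2 * np.pi * t / 12)

r = acf(temps, 12)
# r[11] (lag 12) is close to +1: the pattern repeats every 12 months;
# r[5] (lag 6) is close to -1: winter is the mirror image of summer.
```

The lag-12 value approaching +1 is exactly the "perfect correlation" described above.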
In Alteryx, the dotted line in the autocorrelation graphs represents the level of significance: if a value goes above or below this line, the result is statistically significant, i.e. the lagged series is a valid predictor of itself and the pattern can be expected to repeat. One can also determine how long a particular pattern will persist. For example, if the lags were significant over a long stretch, say 24 lags, then one could say that if the graph exhibits a trend, it will continue that way for some time.
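The dotted line isn't magic: under the null hypothesis that the series is pure white noise, each sample autocorrelation is approximately normal with standard deviation 1/√n, so the usual 95% band (which I am assuming is what Alteryx draws) sits at ±1.96/√n. A quick sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
noise = rng.standard_normal(n)  # white noise: no real pattern at any lag

x = noise - noise.mean()
denom = np.sum(x * x)
r = [float(np.sum(x[k:] * x[:-k]) / denom) for k in range(1, 25)]

band = 1.96 / np.sqrt(n)  # the "dotted line", roughly +-0.098 for n = 400
significant = [lag for lag, rk in enumerate(r, start=1) if abs(rk) > band]
# For pure noise we expect only ~5% of lags (about 1 in 24) to poke outside
# the band by chance, so many significant lags in a row indicate real structure.
```

This is why a single lag barely crossing the line is weak evidence, while a long run of significant lags, like the 24-lag stretch mentioned above, is persuasive.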
Alteryx shows a partial autocorrelation function (PACF) graph too. To understand this, we need to define two things: the direct effect and the indirect effect of autocorrelation. The indirect effect is the price from 5 months ago affecting today's price through each intervening month's price in turn. The direct effect is the price 5 months ago affecting today's price directly, without interacting with the other terms; an example is a festival that only occurs every 5 months. The PACF considers only the direct effect, whereas the ACF includes both the effect of t-1 periods ago on today and the effect of t-2 periods ago on t-1 periods ago, and so on.
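To see the difference, consider an AR(1) series, where each value depends directly only on the one before it (my own simulated example, not Alteryx output). The ACF at lag 2 is large because of the indirect chain of effects, but the PACF at lag 2 is near zero because there is no direct effect. The PACF is sketched here in one standard way: as the last coefficient from a least-squares regression of the series on its first k lags.

```python
import numpy as np

def acf(x, nlags):
    x = x - x.mean()
    denom = np.sum(x * x)
    return [float(np.sum(x[k:] * x[:-k]) / denom) for k in range(1, nlags + 1)]

def pacf(x, nlags):
    """Partial autocorrelation at lag k = the coefficient on lag k when
    regressing x_t on lags 1..k together (the direct effect only)."""
    n = len(x)
    out = []
    for k in range(1, nlags + 1):
        # Column j holds the lag-(j+1) values aligned with x[k:]
        X = np.column_stack([x[k - j - 1 : n - j - 1] for j in range(k)])
        coef, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
        out.append(float(coef[k - 1]))
    return out

# AR(1): today depends directly only on yesterday
rng = np.random.default_rng(1)
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

a = acf(x, 3)
p = pacf(x, 3)
# a[1] (lag 2) is large via the indirect chain t-2 -> t-1 -> t,
# while p[1] (lag 2) is near zero: there is no direct lag-2 effect.
```

This cut-off behaviour is exactly why the PACF plot is read alongside the ACF plot: it isolates which lags matter on their own.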
Below is an example of the indirect effect, where x is an undetermined coefficient, p is the price, and the subscript t denotes the time period:

p_t = x_1·p_(t-1) + x_2·p_(t-2) + x_3·p_(t-3) + x_4·p_(t-4) + x_5·p_(t-5)

Here each term acts on every term after it: the price 5 months ago influences the price 4 months ago, which influences the price 3 months ago, and so on down the chain to today.
In an example, if only the first, fourth, eighth and thirteenth lags were significant in the PACF, then a good model could be:

p_t = x_1·p_(t-1) + x_4·p_(t-4) + x_8·p_(t-8) + x_13·p_(t-13)

Here only the significant lags directly affect the price today, and they have no bearing on each other.
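As a sketch of how such a sparse model behaves, we can simulate a price series that depends directly on only two lags and recover the coefficients by regressing on just those lags. The lags 1 and 5 (echoing the 5-monthly festival example) and the coefficients 0.5 and 0.3 are invented purely for illustration:

```python
import numpy as np

# Simulated prices: p_t = 0.5*p_{t-1} + 0.3*p_{t-5} + noise
# (lags and coefficients are made up for this example)
rng = np.random.default_rng(2)
n = 3000
p = np.zeros(n)
for t in range(5, n):
    p[t] = 0.5 * p[t - 1] + 0.3 * p[t - 5] + rng.standard_normal()

# Fit using only the lags the PACF flagged as significant
X = np.column_stack([p[4 : n - 1], p[0 : n - 5]])  # columns: p_{t-1}, p_{t-5}
y = p[5:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef recovers roughly [0.5, 0.3]; the intermediate lags 2-4 are not needed
```

Dropping the insignificant lags gives a smaller model that is easier to estimate and interpret, which is the point of reading the PACF before choosing the lags.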
Is autocorrelation a good thing or a bad thing? Truth be told, it varies depending on what you are trying to measure. The ARIMA model can correct for autocorrelation, and ARIMA models are based explicitly on the past explaining the present. But if the errors themselves are correlated, a model that mispredicts the weather in one state or local government area is likely to be wrong in a neighbouring one too. ETS modelling can help with such correlated error terms, as ETS explicitly models error, trend, and seasonality.
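For a flavour of the ETS side, the simplest member of that family is simple exponential smoothing, where the forecast is a weighted average whose weights decay over past observations. This is a bare sketch of the idea, not Alteryx's ETS tool:

```python
def ses_forecast(series, alpha):
    """Simple exponential smoothing: the smoothed level is
    alpha * latest observation + (1 - alpha) * previous level;
    the final level is the one-step-ahead forecast."""
    level = series[0]
    for obs in series[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

# A higher alpha reacts faster to recent values; a lower alpha smooths harder.
```

Because the level is re-estimated at every step, errors are absorbed into the smoothed state rather than being carried forward unchanged, which is one intuition for why ETS copes with correlated errors.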
AIC and BIC
Whilst I am writing about the results in the time series tool tab, I should explain AIC and BIC: the Akaike information criterion and the Bayesian information criterion. These scores indicate whether a model should be chosen or not: the lower the score, the better the model. So if you aren't sure which model to use, because both give similar results or the difference is not obvious, AIC and BIC may help you decide. AIC is used to select a model that most adequately describes an unknown, multi-dimensional state; BIC is used to try to find the true model amongst a set of candidates.
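As a sketch of how the "lower is better" comparison works, here are the common least-squares forms AIC = n·ln(RSS/n) + 2k and BIC = n·ln(RSS/n) + k·ln(n), applied to two candidate AR models on an invented AR(2) series (both criteria penalise extra parameters k, with BIC penalising harder):

```python
import numpy as np

def ar_rss(x, k):
    """Residual sum of squares from regressing x_t on lags 1..k."""
    n = len(x)
    X = np.column_stack([x[k - j - 1 : n - j - 1] for j in range(k)])
    coef, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
    resid = x[k:] - X @ coef
    return float(np.sum(resid ** 2))

# Invented data: a true AR(2) process
rng = np.random.default_rng(3)
n = 2000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + rng.standard_normal()

scores = {}
for k in (1, 2):
    rss = ar_rss(x, k)
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    scores[k] = (aic, bic)

# Both criteria come out lower for the AR(2) model, the true one here,
# because its fit improvement far outweighs the extra-parameter penalty.
```

When the two criteria disagree, the tendencies described above apply: AIC leans towards the model that predicts best, BIC towards the most parsimonious candidate for the true model.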