Over the last couple of articles, we have been working on a manufacturing case study to forecast tractor sales for a company called PowerHorse. You can find the previous articles at the links Part 1 and Part 2. In this part, we will start with ARIMA modeling for forecasting. ARIMA is an abbreviation for Auto-Regressive Integrated Moving Average. However, before we learn more about ARIMA, let's create a link between…
ARIMA and Sugar Cane Juice
May and June are the peak summer months in India. Indian summers are extremely hot and draining. Summers are followed by monsoon rains. It’s no wonder that during summers everyone in India has the habit of looking up towards the sky in the hope of seeing clouds as an indicator of the arrival of the monsoons. While waiting for the monsoons, Indians have a few drinks that keep them hydrated. Sugar cane juice, or ganne-ka-ras, is by far my favorite drink to beat the heat. The process of making sugar cane juice is fascinating and has similarities with ARIMA modeling.
Sugar cane juice is prepared by crushing a long piece of sugar cane through a juicer with two large cylindrical rollers, as shown in the adjacent picture. However, it is difficult to extract all the juice from a tough sugar cane in one go, so the process is repeated multiple times. In the first go, a fresh sugar cane is passed through the juicer, and then the residual that still contains juice is passed through the juicer again and again until no more juice is left. This is precisely how ARIMA models work. Consider your time series data as the sugar cane, and ARIMA models as sugar cane juicers. The idea with ARIMA models is that the final residual should look like white noise; otherwise, there is still juice (information) left in the data to extract.
We will come back to white noise (juice-less residual) in later sections of this article. However, before that, let’s explore ARIMA modeling in more detail.
ARIMA is a combination of 3 parts, i.e. AR (AutoRegressive), I (Integrated), and MA (Moving Average). A convenient notation for an ARIMA model is ARIMA(p,d,q). Here p, d, and q are the orders of the AR, I, and MA parts respectively. Each of these three parts is an effort to make the final residuals display a white noise pattern (or no pattern at all). In each step of ARIMA modeling, the time series data is passed through these 3 parts like a sugar cane through a sugar cane juicer to produce juice-less residual. The sequence of the three passes for ARIMA analysis is as follows:
1st Pass of ARIMA to Extract Juice / Information
Integrated (I) – subtract time series with its lagged series to extract trends from the data
In this pass of the ARIMA juicer, we extract trend(s) from the original time series data. Differencing is one of the most commonly used mechanisms for extracting trends. Here, the lagged series is subtracted from the original series, e.g. October’s sales value is subtracted from November’s value, to produce a trend-less residual series. The formulae for different orders of differencing are as follows:
No differencing (d=0): Y′_t = Y_t
1st differencing (d=1): Y′_t = Y_t − Y_{t-1}
2nd differencing (d=2): Y″_t = (Y_t − Y_{t-1}) − (Y_{t-1} − Y_{t-2}) = Y_t − 2·Y_{t-1} + Y_{t-2}
For example, the adjacent plot displays a time series with a linearly upward trend. Just below it is the 1st order differenced plot for the same data. As you can notice, after 1st order differencing the trend part of the series is extracted, and the differenced data (residual) does not display any trend.
The residual data of most time series usually become trend-less after first order differencing, which is represented as ARIMA(0,1,0). Notice that the AR (p) and MA (q) values in this notation are 0 and the integrated (I) part has order one. If the residual series still displays a trend, it is differenced again; this is called 2nd order differencing. The resulting trend-less series is called stationary in mean, i.e. the mean or average value of the series does not change over time. We will return to stationarity and discuss it in detail when we create an ARIMA model for our tractor sales data in the next article.
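The differencing pass described above can be sketched in a few lines of Python. This is a minimal illustration on a synthetic linearly trending series (the series, its slope of 0.5, and all variable names are made up for demonstration):

```python
import numpy as np

# Synthetic series with a linear upward trend plus noise
rng = np.random.default_rng(42)
t = np.arange(100)
series = 10 + 0.5 * t + rng.normal(0, 1, size=100)

# 1st order differencing (d=1): Y'_t = Y_t - Y_{t-1}
diff1 = np.diff(series, n=1)

# 2nd order differencing (d=2) -- rarely needed in practice
diff2 = np.diff(series, n=2)

# The trend is removed: the differenced series hovers around the
# slope (roughly 0.5) instead of growing over time.
print(len(series), len(diff1), len(diff2))
print(round(diff1.mean(), 2))
```

Note that each round of differencing shortens the series by one observation, just as each pass of the juicer leaves a smaller residual.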
2nd Pass of ARIMA to Extract Juice / Information
AutoRegressive (AR) – extract the influence of the previous periods’ values on the current period
After the time series data is made stationary through the integrated (I) pass, the AR part of the ARIMA juicer gets activated. As the name auto-regression suggests, here we try to extract the influence of the values of previous periods on the current period, e.g. the influence of September’s and October’s sales values on November’s sales. This is done by developing a regression model with the time-lagged period values as independent or predictor variables. The general form of the AR(p) regression equation is:

Y_t = c + φ1·Y_{t-1} + φ2·Y_{t-2} + … + φp·Y_{t-p} + ε_t

Here c is a constant, the φ terms are the regression coefficients on the lagged values, and ε_t is the error term. You may want to read the following articles on regression modeling: Article 1 and Article 2.
An AR model of order 1, i.e. p=1 or ARIMA(1,0,0), is represented by the following regression equation:

Y_t = c + φ1·Y_{t-1} + ε_t
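Since AR(1) is just a regression of the series on its own first lag, it can be estimated with ordinary least squares. Here is a NumPy-only sketch on a simulated AR(1) process (the true values c=2.0 and φ1=0.7, and the variable names, are arbitrary choices for the demonstration):

```python
import numpy as np

# Simulate an AR(1) process: Y_t = c + phi * Y_{t-1} + e_t
rng = np.random.default_rng(0)
c_true, phi_true, n = 2.0, 0.7, 2000
y = np.zeros(n)
for i in range(1, n):
    y[i] = c_true + phi_true * y[i - 1] + rng.normal()

# Regress Y_t on Y_{t-1}: design matrix = [intercept, lagged series]
X = np.column_stack([np.ones(n - 1), y[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
c_hat, phi_hat = coef

print(round(c_hat, 2), round(phi_hat, 2))  # estimates close to 2.0 and 0.7
```

In practice a library routine would fit all of p, d, and q jointly, but the least-squares view above is exactly the "regression on lagged values" the AR pass describes.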
3rd Pass of ARIMA to Extract Juice / Information
Moving Average (MA) – extract the influence of the previous period’s error terms on the current period’s error
Finally, the last component of the ARIMA juicer, i.e. MA, involves finding relationships between the previous periods’ error terms and the current period’s error term. Keep in mind that this moving average (MA) has nothing to do with the moving average we learned about in the previous article on time series decomposition. The Moving Average (MA) part of ARIMA is developed as a multiple linear regression with the lagged error values as independent or predictor variables:

Y_t = c + ε_t + θ1·ε_{t-1} + θ2·ε_{t-2} + … + θq·ε_{t-q}
An MA model of order 1, i.e. q=1 or ARIMA(0,0,1), is represented by the following regression equation:

Y_t = c + ε_t + θ1·ε_{t-1}
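A quick way to see the MA(1) signature is to simulate Y_t = ε_t + θ1·ε_{t-1} and inspect its autocorrelations: an MA(1) series is correlated with its first lag but essentially uncorrelated beyond it (theory gives ρ(1) = θ1 / (1 + θ1²)). A NumPy sketch, with θ1 = 0.6 as an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta = 5000, 0.6
e = rng.normal(size=n + 1)  # the error series epsilon_t

# MA(1): Y_t = e_t + theta * e_{t-1}
y = e[1:] + theta * e[:-1]

def acf(x, lag):
    """Sample autocorrelation of x at the given lag (lag >= 1)."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Theory: rho(1) = theta / (1 + theta**2) ~ 0.44, rho(k >= 2) ~ 0
print(round(acf(y, 1), 2), round(acf(y, 2), 2))
```

This sharp cut-off after lag q is exactly the pattern an ACF plot reveals when an MA term is present.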
White Noise & ARIMA
Oh, how I miss the good old days when television was not on 24×7. For the good part of the day the TV used to look like the one shown in the picture – no signal, just plain white noise. As a kid, it was a good pastime for my friends and me to keep looking at the TV with no signal to find patterns. White noise is a funny thing: if you look at it for long, you will start seeing some false patterns. This is because the human brain is wired to find patterns, and at times confuses noise with signal. The biggest proof of this is how people lose money every day on the stock market. This is precisely why we need a mathematical or logical process to distinguish between white noise and a signal (juice / information). For example, consider the following simulated white noise:
If you stare at the above graph for a reasonably long time, you may start seeing some false patterns. A good way to distinguish between signal and noise is the ACF (AutoCorrelation Function). It is computed by finding the correlation between a series and its lagged values. In the following ACF plot, you can see that at lag = 0 the ACF has perfect correlation, i.e. ρ=1. This makes sense because any series always has perfect correlation with itself. However, as expected, our white noise doesn’t have a significant correlation with its historic values (lag≥1). The dotted horizontal lines in the plot show the threshold for the insignificant region, i.e. for a significant correlation the vertical bars should fall outside the horizontal dotted lines.
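The ACF check described above can be reproduced numerically: for white noise, the lag-0 autocorrelation is exactly 1, and every higher lag should fall inside the ±1.96/√n significance band (the dotted lines in the plot). A NumPy sketch of the same idea, with the sample size and number of lags chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
noise = rng.normal(size=n)  # simulated white noise

def acf(x, lag):
    """Sample autocorrelation at a given lag (lag 0 -> 1.0)."""
    x = x - x.mean()
    if lag == 0:
        return 1.0
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

threshold = 1.96 / np.sqrt(n)  # the dotted significance lines
lags = range(1, 21)
inside = [abs(acf(noise, k)) < threshold for k in lags]

print(acf(noise, 0))  # 1.0 by definition
print(sum(inside), "of", len(lags), "lags inside the band")
```

By construction of the 5% significance band, roughly one lag in twenty may poke outside it purely by chance, so a single marginal bar in an ACF plot is not evidence of remaining juice.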
There is another measure, the Partial AutoCorrelation Function (PACF), that plays a crucial role in ARIMA modeling. We will discuss it in the next article, when we return to our manufacturing case study example.
In this article, you have spent your time learning concepts you will use in the next article while playing your role as a data science consultant to PowerHorse, forecasting their tractor sales.
In the meantime, let me quickly check out of my window to see if there are any clouds out there………. Nope! I think there is still time before we will get our first monsoon showers in Bombay for this year – need to keep my glass of sugar cane juice handy to fight this summer.