So far we have covered the following topics in this case study example on time series forecasting and ARIMA models:
Part 4: ARIMA model case study example
These topics focused on forecasts built from information embedded in the data series itself. Such forecasts are good-to-know information and are quite useful for planning. However, they don't give organizations the information they need to change the course of the future. Organizations usually want to know whether they could do things better to improve outcomes. They also want to know whether their current efforts are generating the desired outcomes, and they want suggestions for course corrections. These are fundamental questions for any organization, and answering them requires thorough analysis and creativity.
In keeping with this theme, in this article we will continue with our case study example and try to answer whether the PowerHorse tractors marketing effort is generating additional sales revenue. For this, we will use regression with ARIMA errors (ARIMAX), also called exogenous-variable ARIMA. Before that, let's learn about a useful concept for model selection, the Akaike Information Criterion (AIC), through a short story about two music legends:
Michael Jackson, the king of pop, is regarded as one of the most famous and truly international artists to date. His earnings from the sales of his albums surpassed those of his peers by a huge margin. Despite this, when he died in 2009 he had close to $400 million in debts. Clearly, he could not manage his finances. Elvis Presley, another legendary musician, met the same fate. Just before their deaths, both artists were working relentlessly on stage performances to fight their financial troubles. Healthy finance is not just about earning well; it is also about managing costs. It's a fine balancing act between these two parameters, as depicted by the following simple equation:
Debt = Cost – Income
For healthy finances, the idea is to keep debt to a minimum. Model development in data science, like any other endeavor in life, is also about finding the right balance. We will explore this idea in the next section while learning about…
Akaike Information Criterion (AIC)
Akaike Information Criterion (AIC) is a mechanism for selecting the best-fit model. AIC has similarities with the debt formula we saw in the previous section: it balances a model's goodness-of-fit against the number of parameters used in the model, much like the balance between income and cost. As a modeler, you want the maximum goodness-of-fit (income) with the minimum number of parameters (cost). The formula for AIC is:
$$\mathrm{AIC} = 2k - 2\ln(L)$$
For a given model, L in the above formula is the maximized value of the likelihood function, representing goodness-of-fit, and k is the number of estimated parameters. Like your debts, you want to keep the AIC value to a minimum to choose the best possible model. The Bayesian Information Criterion (BIC) is a closely related criterion, used for the same purpose of best-fit model selection, that penalizes extra parameters more heavily in all but very small samples. For the most reliable model selection, look at AIC, BIC, and AICc (AIC with a small-sample correction) together, and prefer the model for which all of these values are lowest.
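To see the income/cost trade-off in numbers, here is a minimal sketch in R that computes the criteria by hand from a fitted model's log-likelihood, using AIC = 2k - 2 ln(L), AICc = AIC + 2k(k+1)/(n - k - 1) and BIC = k ln(n) - 2 ln(L), where n is the number of observations used. It relies only on base R's stats::arima and a simulated AR(1) series; the data, the model orders, and the helper name info_criteria are assumptions for illustration. Note that R counts the innovation variance as one of the k parameters.

```r
# Sketch: AIC, AICc and BIC computed by hand for two ARIMA fits (base R only).
# The series is simulated, not the PowerHorse data.
set.seed(42)
y <- arima.sim(model = list(ar = 0.6), n = 120)

fit_small <- arima(y, order = c(1, 0, 0))  # AR(1) + mean: fewer parameters ("cost")
fit_large <- arima(y, order = c(2, 0, 0))  # AR(2) + mean: one extra parameter

# Hypothetical helper returning k and the three criteria for a fitted model
info_criteria <- function(fit) {
  ll <- as.numeric(logLik(fit))     # maximised log-likelihood, ln(L)
  k  <- attr(logLik(fit), "df")     # estimated parameters, incl. innovation variance
  n  <- fit$nobs                    # observations actually used in the fit
  aic <- 2 * k - 2 * ll
  c(k    = k,
    AIC  = aic,
    AICc = aic + 2 * k * (k + 1) / (n - k - 1),
    BIC  = k * log(n) - 2 * ll)
}

round(rbind(small = info_criteria(fit_small), large = info_criteria(fit_large)), 2)
```

The hand-computed values should agree with R's built-in AIC() and BIC(); whichever row has the lower values strikes the better balance between fit and parameter count. Because BIC's ln(n) penalty per parameter exceeds AIC's 2 once n is 8 or more, BIC tends to favor smaller models.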
Regression with ARIMA Errors – Case Study Example
For the last 4 years, PowerHorse tractors has been running an expensive marketing and farmer connect program to boost its sales. The company wants to learn the impact of this program on overall sales, and as a data science consultant you are helping with this effort. This is an interesting problem that requires thorough analysis followed by creative solutions and a scientific monitoring mechanism. To begin with, you will build models based on regression with ARIMA errors and compare them with a pure-play ARIMA model. This analysis will provide some clues about the effectiveness of the marketing program. However, it will not be conclusive in identifying the program's shortcomings and possible enhancements; that requires further analysis and creative solutions, which will become another case study example on YOU CANalytics. The following is the approach you have taken for regression with ARIMA errors. If you want a quick brush-up on regression before you jump to regression with ARIMA errors, read the following articles:
You can find the data for this analysis attached at the end of this article. To begin with, you plot the following scatter plot of same-month marketing expense against tractor sales.
This looks promising, with quite a high correlation coefficient (ρ > 0.8). However, there is a lurking danger in analyzing non-stationary time series data: two unrelated series can display high correlation simply because both contain a trend. In this case, PowerHorse is a growing company, so both its sales and its marketing expenses are on an upward curve. A better approach is to find the correlation between the stationary series obtained through differencing. The following is the correlation plot for the stationary data:
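The trend trap described above is easy to reproduce. The sketch below simulates two hypothetical monthly series that both grow over time but have independent month-to-month noise, then compares the correlation of the raw levels with the correlation after first differencing using base R's diff(). The numbers are simulated for illustration, not the PowerHorse data, and first differencing is an assumed choice of transformation.

```r
# Hypothetical, simulated data: two series that share an upward trend
# but whose month-to-month fluctuations are independent of each other.
set.seed(1)
months    <- 1:48
marketing <- 100 + 10 * months + rnorm(48, sd = 15)
sales     <- 500 + 25 * months + rnorm(48, sd = 40)

level_cor <- cor(marketing, sales)               # inflated by the shared trend
diff_cor  <- cor(diff(marketing), diff(sales))   # trend removed by first differencing

round(c(levels = level_cor, differenced = diff_cor), 2)
```

The levels correlate strongly only because both series grow with time; after differencing, what remains reflects whether the month-to-month changes move together, which here they do not by construction. For strongly seasonal monthly data, a seasonal difference such as diff(x, lag = 12) may also be needed before the series is stationary.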
OK, so that near-perfect correlation has now disappeared, though there is still some correlation in this data (ρ = 0.41). Typically, for an effective marketing program, the marketing effort of the previous few months should correlate well with sales. As displayed below, last month's marketing expense has very little correlation with sales (ρ = 0.17):
The previous quarter's marketing expense also shows practically no correlation with sales. Now, let's build a regression with ARIMA errors (ARIMAX) model using marketing expense for the current and previous months.
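Lagged correlations like these can be computed by pairing each month's change in sales with the change in marketing expense k months earlier. The sketch below assumes hypothetical monthly vectors named sales and marketing (the helper lagged_cor is also hypothetical) and simulates sales that respond to the previous month's marketing, so the lag-1 correlation should stand out; lags 1 to 3 together cover the previous quarter month by month.

```r
# Correlation between the change in sales at month t and the change in
# marketing expense at month t - k (both series first-differenced).
lagged_cor <- function(sales, marketing, k) {
  ds <- diff(sales)
  dm <- diff(marketing)
  n  <- length(ds)
  if (k == 0) return(cor(ds, dm))
  # Drop the first k sales changes and the last k marketing changes
  # so that ds[t] is paired with dm[t - k].
  cor(ds[(k + 1):n], dm[1:(n - k)])
}

# Hypothetical simulated data: sales respond to last month's marketing spend.
set.seed(7)
marketing <- cumsum(rnorm(48, mean = 10, sd = 15))
sales     <- 500 + 2 * c(marketing[1], head(marketing, -1)) + rnorm(48, sd = 10)

cors <- sapply(0:3, function(k) lagged_cor(sales, marketing, k))
names(cors) <- paste0("lag_", 0:3)
round(cors, 2)
```

Base R's ccf() reports cross-correlations over many lags at once; its normalization differs slightly from calling cor() on trimmed vectors, and its lag sign convention is worth checking before reading the plot.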
|Tips to Build Regression with ARIMA Errors Models in R|
|In R, you can use the auto.arima function with the xreg argument to build such models, e.g. auto.arima(Tractor_Sales_Data, xreg=Marketing_Expense). The auto.arima function is part of the forecast package, which we used in the previous article, ARIMA model.|
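The tip uses forecast::auto.arima, which searches over model orders automatically. As a minimal sketch of the underlying idea without extra packages, base R's stats::arima() also accepts an xreg matrix and fits a regression with ARIMA errors. The example below fixes the model orders by hand, uses simulated, hypothetical data in which sales respond to the previous month's marketing expense, and includes the current and previous months' marketing expense as regressors.

```r
# Minimal sketch of regression with ARIMA errors in base R (stats::arima),
# fitted on simulated, hypothetical data rather than the PowerHorse series.
set.seed(11)
n <- 60
marketing      <- 100 + rnorm(n, sd = 20)                   # stationary monthly spend
ar_errors      <- as.numeric(arima.sim(model = list(ar = 0.5), n = n, sd = 10))
prev_marketing <- c(NA, head(marketing, -1))                 # previous month's spend
sales          <- 400 + 0.8 * prev_marketing + ar_errors     # one-month response

# Drop the first month so both models are fitted to exactly the same observations.
y    <- sales[-1]
xreg <- cbind(current = marketing[-1], previous = prev_marketing[-1])

fit_plain <- arima(y, order = c(1, 0, 0))               # ARIMA without predictors
fit_xreg  <- arima(y, order = c(1, 0, 0), xreg = xreg)  # regression with AR(1) errors

round(coef(fit_xreg), 2)
c(plain = AIC(fit_plain), with_marketing = AIC(fit_xreg))
```

Both fits use the same 59 observations and the same (zero) order of differencing, which is what makes their AIC values comparable; information criteria are not comparable across models fitted to different observations or with different orders of differencing. For seasonal monthly data like the tractor sales, you would add the same seasonal term to both fits, for example seasonal = list(order = c(0, 1, 0), period = 12), or pass the xreg matrix to auto.arima() as in the tip.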
The following are the results for the monthly tractor sales model that uses marketing expense as a predictor in regression with ARIMA errors. Focus on the last part of the results, with the AIC values.
|Time series:|Tractor Sales with Marketing Expense|
|Best-fit model:|ARIMA(0,0,0)|
Next, let’s build a pure play forecast without marketing expense as a predictor variable.
|Time series:|Tractor Sales without Marketing Expense|
|Best-fit model:|ARIMA(1,0,0)(0,1,0)|
Notice that the AIC, AICc, and BIC values are lower for the plain ARIMA model without marketing expense as a predictor variable. This suggests that marketing expense is not actually adding value to tractor sales, and it is the first indication for the management at PowerHorse to re-evaluate the marketing and farmer connect program. I must point out that evaluating marketing budgets with a forecasting model like the one we have built is not best practice. The best practice is to embed a scientific data collection, monitoring, and evaluation mechanism in the design of a marketing program from its inception. However, such a scientific and well-thought-out mechanism is often missing in many programs. That is when one can go back in time and use regression with ARIMA errors to evaluate the effectiveness of marketing programs.
Balance is the key to life, be it the predator-prey relationship (ecosystem) or planetary motion. The economy also operates to achieve a balance between supply and demand. The perfect balance between our blood pressure and atmospheric pressure is what saves us from being crushed to a pulp or bursting into pieces. Balance is an integral part of a happy life, be it the balance between work and relaxation or between personal and social time. May we all achieve this balance and live a happy life.
|Data: Sales and Marketing|
R Code: Exogenous ARIMA model