With modern reporting tools allowing more and more companies to answer 'what did we do yesterday?', the logical progression is to then predict 'what are we going to do tomorrow?' Forecasting solutions vary in complexity, with options including elaborate third-party software, complex spreadsheets developed by internal analysts, and occasionally a shake of the good-ol' crystal eight ball. I recently had a conversation with a consultant who confided in me that their client swears by XYZ solution. "How does it work?" I asked. Does it look at bell curves and confidence intervals? A 12 month moving average? A 3 month moving average? Weighted average? Linear regression? Seasonal adjustment? "It looks at all of that," I was assured. "How does the system know which forecast method to use?" ... silence.
To be honest, almost any method could be a completely (in)appropriate way of forecasting the future! If you sell ice cream or agricultural pesticides, there's likely to be seasonality in your data. If you're predicting the stock market, moving averages and linear regression are common prediction methods. If you're tracking click rates after launching a new marketing campaign ... well that's another level of complicated, but to get the ball rolling, linear or exponential regression models may be a solid place to start.
Calculating Error in a Forecast
In a nutshell, cost functions evaluate a forecast method's goodness of fit by measuring the amount of error where:
Absolute Error = |Expected - Actual|
In the sample data set I whipped up, we have several months of Actuals and two forecast methods, 3 Month Moving Average and 5 Month Moving Average (3MA and 5MA respectively). An n-month moving average forecasts the current month by taking the average of n preceding months.
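The n-month moving average described above can be sketched in a few lines of Python. This is a minimal illustration (the function name and sample numbers are my own, not from the data set in this post): each period's forecast is simply the mean of the n actuals before it, so the first n periods have no forecast at all.

```python
def moving_average_forecast(actuals, n):
    """Forecast each period as the mean of the n preceding actuals.

    Returns a list aligned with `actuals`; the first n periods get None
    because there isn't enough history to average yet.
    """
    forecasts = []
    for i in range(len(actuals)):
        if i < n:
            forecasts.append(None)
        else:
            forecasts.append(sum(actuals[i - n:i]) / n)
    return forecasts

# With made-up actuals, the first 3MA forecast is the mean of the
# first three months: (10 + 20 + 30) / 3 = 20.0
print(moving_average_forecast([10, 20, 30, 40], 3))
```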
In a future blog post, I'll share how I built these forecast models into my SSAS Sales cube using MDX.
For the month of December, 3MA forecast 625 while 5MA predicted 596. Given that actuals were 846, 3MA was the better forecast because it had the smaller absolute error: |625 - 846| = 221 versus |596 - 846| = 250.
Google tells me that cost functions "provide a criterion by which forecast methods can be compared," but I've chosen to reinterpret the definition as, "cost functions help us decide which forecasting method to use."
To come clean, I am not a statistician. While developing a forecasting data model for a client, I was able to consult a trusted mathematician who guided me through the basics of using cost functions to evaluate and select the 'best' forecasting model. I'm writing this article with the assumption that you either have forecast models you're currently using, or are interested in adopting forecasting tools.
Calculating the Cost Functions MAD, MAPE, and MSE
Don't go running for the hills just yet! Cost functions and abbreviations can sound intimidating, but this 5th grade level math will enable us to quantify and compare the error between two forecasts. The three most common cost functions are MAD (Mean Absolute Deviation), MAPE (Mean Absolute Percent Error), and MSE (Mean Squared Error).
Calculating MAD (Mean Absolute Deviation)
MAD is the sum of absolute error (also called deviations) divided by the number of periods measured.
In December and November the absolute errors for 3MA were 221 and 92 as opposed to 5MA's 250 and 187. We concluded that 3MA on average had less error.
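That arithmetic can be checked in a couple of lines of Python, using the December and November absolute errors quoted above (the function name is my own):

```python
def mean_absolute_deviation(abs_errors):
    # MAD: sum of absolute errors divided by the number of periods
    return sum(abs_errors) / len(abs_errors)

mad_3ma = mean_absolute_deviation([221, 92])   # Dec and Nov errors for 3MA
mad_5ma = mean_absolute_deviation([250, 187])  # Dec and Nov errors for 5MA
# 3MA's MAD of 156.5 beats 5MA's 218.5
print(mad_3ma, mad_5ma)
```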
Calculating MAPE (Mean Absolute Percent Error)
MAPE differs from MAD by calculating the average absolute percent error instead of simply taking the average absolute error.
Absolute Percent Error: ABS(Expected - Actual) / Actual
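A minimal Python sketch of MAPE, averaging the per-period absolute percent errors (function name and sample figures are mine, for illustration only):

```python
def mape(actuals, forecasts):
    # Absolute Percent Error per period: |expected - actual| / actual
    # Note: this is undefined whenever an actual is 0.
    errors = [abs(f - a) / a for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)

# Two periods, each missed by 10% of actuals -> MAPE of 0.1 (i.e. 10%)
print(mape([100, 200], [110, 180]))
```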
Calculating MSE (Mean Squared Error)
Mean Squared Error is the average of the squared errors: the mean of (Expected - Actual)².
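In the same style as the previous two functions, a minimal MSE sketch (again, names and numbers are my own):

```python
def mse(actuals, forecasts):
    # Mean Squared Error: average of (expected - actual)^2.
    # Squaring makes large misses count disproportionately more.
    errors = [(f - a) ** 2 for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)

# Errors of +2 and -4 -> (4 + 16) / 2 = 10.0
print(mse([10, 20], [12, 16]))
```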
MAPE is often the preferred cost function because it weighs error evenly across the whole data set, but consider the challenge of equipping production lines across factories in different countries. Once resources have been allocated, it may be difficult to reapportion or procure additional machinery; therefore, in this instance our primary concern is not minimizing error across the entire data set so much as minimizing error at each forecast point. In other words, the ideal forecast will allow many smaller errors while avoiding a few significant errors.
In this new data set we have two forecast methods, 3MA and Growth Factor (wherein using a crystal ball we 'managed' to predict expected as actuals * 1.28). Both forecast methods have the same total deviation and MAD but 3MA, with a max percent error of 70%, has larger swings in absolute deviation while Growth Factor consistently has a 28% error for each forecast.
In terms of our factory example, using MSE we would have selected the forecast that uniformly over-allocated equipment at a +28% error across all factories instead of the 3MA model, which would have wide swings of over- and under-allocation.
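The tie-breaking behavior is easy to demonstrate with a toy example. The absolute errors below are made up (they are not the figures from this post's data set), but they mirror the same situation: both methods have the same total error and therefore the same MAD, while one of them hides a single large swing.

```python
# Hypothetical absolute errors for four factories. Both methods total 100.
errors_3ma    = [70, 10, 10, 10]   # one big miss, several small ones
errors_growth = [25, 25, 25, 25]   # a consistent miss at every factory

def mad(errors):
    return sum(errors) / len(errors)

def mse(errors):
    return sum(e * e for e in errors) / len(errors)

# MAD can't separate the two (both 25.0), but MSE punishes the big swing:
# 3MA scores 1300.0 against Growth Factor's 625.0, so MSE picks Growth Factor.
print(mad(errors_3ma), mad(errors_growth))
print(mse(errors_3ma), mse(errors_growth))
```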
Which Cost Method Should I Choose?
Someone once said, "There are lies, damned lies, and statistics" -- a quip popularly attributed to Mark Twain, who in turn credited a British statesman; its true origin remains uncertain. Go figure. As with all things business related, when faced with the question, "Which cost function is best?" the answer is, "It depends." Although MAPE and MAD both weigh error evenly, MAPE provides context for the severity of error where MAD does not, making it the de facto choice in business analysis. However, because MAPE represents percent error and 5th grade math tells us we can't divide by zero, you may have to use MAD or MSE in cases where your data set's actuals could include 0's--consider monthly inventory demand, for example. Lastly, in instances where we're more concerned with reducing error per forecast point, it may be more appropriate to use the MSE cost function.
One last thought to keep in mind before embarking on the world of forecast evaluation: not all errors are created equal! With MAD, MSE, and MAPE, we're assuming positive error and negative error are equally weighted, but consider a retail environment where running out of inventory may be more costly than carrying inventory. While warehouse fees or product obsolescence from overstocking may add up, overall that may be less costly than a lost sale or a disappointed customer.
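One simple way to act on that idea is a weighted cost function. The sketch below is my own illustration (the function name and the 2:1 weighting are assumptions, not an established metric): it charges twice as much per unit of under-forecast (a potential stockout) as per unit of over-forecast (carrying cost).

```python
def asymmetric_cost(actuals, forecasts, under_weight=2.0, over_weight=1.0):
    """Average error where under-forecasting is penalized more heavily.

    The 2:1 default weighting is a made-up example: it says a unit of
    lost sales hurts twice as much as a unit of overstock.
    """
    total = 0.0
    for a, f in zip(actuals, forecasts):
        err = f - a
        # err < 0 means we forecast below actual demand (risking a stockout)
        total += under_weight * -err if err < 0 else over_weight * err
    return total / len(actuals)

# Missing demand by 10 units costs 20.0; overshooting by 10 costs only 10.0
print(asymmetric_cost([100], [90]), asymmetric_cost([100], [110]))
```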
Visualization: Closing the Loop
Take a look at this ball of spaghetti! In a world where our reporting and analytics tool can graph historical demand and quickly generate multiple forecasting models, this mess is just a few clicks away. But for all its overabundance of forecasting data, the chart is too busy and doesn't actually tell us which forecast method to use. Which option has the lowest MAPE?
The above chart is halfway there insofar as it could be filtered to show today's period and identify the forecast with the lowest MAPE (or MSE, as appropriate). I can then simplify my chart to the bare essentials: actuals, expected, a trend line, and the prediction for the next period.
That's All Folks
With the exception of the simplified introductory datasets at the top of the blog, all the charts were generated based on MDX added to the stock cubes provided with Jet Enterprise. If you're interested in adding forecasting or forecast evaluation to your cube-based reporting environment give me a holler at Jae@OnyxReporting.com