Forecasting in the World of Big Data

Ryan Thompson
Department of Econometrics and Business Statistics
Monash University, Australia

The International Data Corporation forecasts that within the next five years, the total data stored on servers worldwide will exceed 175 billion terabytes (Reinsel et al., 2018). This enormous volume of data has served as the basis for many significant innovations and triumphs in data analytics. Forecasting, one of the oldest data-analytic disciplines, has witnessed some particularly exciting developments in the face of big data. Rather than involving an individual time series (data indexed by time) or a small group of them, many modern forecasting problems, especially in business and economics, involve hundreds or thousands (even millions) of time series. Modern machine learning methods, capable of consuming large amounts of data to separate signal from noise, provide forecasters with a powerful toolkit for tackling these problems. Here, we take a look at several contemporary applications of machine learning in forecasting and discuss how such methods help forecasters make remarkably accurate predictions.

Traditionally, forecasting has involved statistical methods such as autoregression and exponential smoothing applied to individual time series. These methods learn temporal patterns from the series they are trained to forecast. What may come as a surprise is that even the most sophisticated machine learning methods consistently lose to these classic methods when trained only on the forecast target (Makridakis et al., 2018). One reason for this underperformance is that a single series often does not contain enough information to fit an elaborate model. In many contemporary problems, however, the forecaster has an extensive set of time series available. For such problems, it is possible to improve on classic methods, sometimes substantially, by training machine learning models across multiple time series. Both Uber and Amazon fit deep recurrent neural networks on huge time series databases to predict future demand for their products. In particular, Uber forecasts rideshare demand during special events by training neural networks on historical rider and driver data from hundreds of cities (Laptev et al., 2017). Amazon uses neural networks to forecast item-level sales based on the historical demand for thousands of related products (Salinas et al., 2020). These approaches are novel in that they learn across multiple time series rather than on the forecast target only. The motivation is that information can be shared across time series to better learn complex temporal dynamics, something deep neural networks facilitate via their ability to extract features automatically.
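To make the global-model idea concrete, here is a minimal sketch of pooling lagged observations from many series to train a single model. It uses simulated data and scikit-learn's gradient boosting as a stand-in for the deep recurrent networks used at Uber and Amazon; it is not their actual code.

```python
# A minimal sketch of a "global" forecasting model: one model trained across
# many series, with lagged values as features. Simulated data; gradient
# boosting stands in for the deep recurrent networks used in practice.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Simulate a small panel of related series (random walks, purely illustrative).
n_series, length, n_lags = 50, 120, 12
series = np.cumsum(rng.normal(size=(n_series, length)), axis=1)

# Pool (lag window -> next value) training pairs across all series.
X, y = [], []
for s in series:
    for t in range(n_lags, length):
        X.append(s[t - n_lags:t])
        y.append(s[t])
X, y = np.asarray(X), np.asarray(y)

# A single model learns temporal patterns shared by the whole panel.
global_model = GradientBoostingRegressor().fit(X, y)

# One-step-ahead forecast for every series from its last 12 observations.
forecasts = global_model.predict(series[:, -n_lags:])
```

The point is the pooling: every series contributes training examples, so the model can pick up patterns that no single series reveals on its own.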

Techniques similar to those used at Uber and Amazon took the top positions in the recent M4 forecasting competition. M4, named after its organizer Spyros Makridakis, is the fourth in a series of famous competitions that began in the 1980s. Participating teams were required to forecast 100,000 time series spanning business, finance, economics, and demography. Uber data scientist Slawek Smyl produced the winning forecasts by mixing neural networks with exponential smoothing. His approach, described in detail in Smyl (2020), learns global model parameters across the full set of time series and hierarchically combines these with local parameters learned on the specific series being forecast. The second-place submission (Montero-Manso et al., 2020) combined forecasts from nine classic methods, including autoregression and exponential smoothing, using weights generated by gradient boosted trees, a technique the authors refer to as “metalearning.” The trees were grown on the full set of time series, allowing them to learn the ensemble composition appropriate for the temporal patterns present in the forecast target; a toy version of this weighted-ensemble idea is sketched below. The distinguishing characteristic of these two teams is that their forecast methodologies augment battle-tested statistical methods with machine learning. Unsurprisingly, such approaches are not computationally cheap, requiring between a week and a month of computational resources. Nevertheless, that would seem a fair price: Makridakis et al. (2020) observed that, on average, forecast accuracy improved with computation time across a broad set of M4 teams; see the figure below.
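The following toy sketch (emphatically not the FFORMA code) illustrates the weighted-ensemble idea. In Montero-Manso et al. (2020), gradient boosted trees map features of each series to the combination weights; here the weights are hand-picked purely for illustration.

```python
# A toy weighted-ensemble forecast in the spirit of metalearning. In FFORMA,
# gradient boosted trees choose the weights from series features; the weights
# below are supplied by hand for illustration only.
import numpy as np

def naive_forecast(s):
    return s[-1]                                   # last observed value

def mean_forecast(s):
    return s.mean()                                # historical mean

def drift_forecast(s):
    return s[-1] + (s[-1] - s[0]) / (len(s) - 1)   # extrapolate average change

def ensemble_forecast(s, weights):
    """Convex combination of component forecasts (weights sum to one)."""
    components = np.array([naive_forecast(s), mean_forecast(s), drift_forecast(s)])
    return weights @ components

s = np.array([10.0, 11.2, 11.9, 13.1, 13.8])
print(ensemble_forecast(s, np.array([0.5, 0.1, 0.4])))  # prints 14.0
```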


Forecast accuracy, measured by symmetric mean absolute percentage error (sMAPE), expressed as a function of computation time for forecasting methods from the M4 competition. The labels indicate finishing places. This figure is reproduced from Makridakis et al. (2020).
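For reference, sMAPE scores forecasts on a scale-free percentage scale. A minimal implementation, following the definition used in M4, might look like this:

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error (in percent), as used in M4."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 200 * np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))

print(smape([100, 102, 105], [98, 103, 109]))  # about 2.24
```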

Big data has also permeated economics. In fact, economists were advising governments on the basis of hundreds of economic time series as early as the 1930s (Fuleky, 2020). Nowadays, big economic datasets are updated in real time and are publicly accessible; see FRED-MD and FRED-QD. Forecasters use these databases to train models for predicting policy-informing variables such as gross domestic product (GDP) growth, unemployment, and inflation. An important feature of economic time series is that they can be directly predictive of one another. For example, unemployment this month can tell us something about future GDP growth. As a result, forecasters often use large sets of economic time series as predictors in machine learning methods. Medeiros et al. (2019) trained a random forest to forecast U.S. inflation using roughly 500 predictor time series and showed that the resulting model delivers state-of-the-art accuracy. Unlike classic methods, random forests can learn complex nonlinear relationships between the forecast target and the predictors, which the authors argue explains their excellent performance. In the same vein, Exterkate et al. (2016) showed that kernel ridge regression, which combines regularized regression with the famous “kernel trick” to capture nonlinearities in the data, yields good forecasts for a variety of important economic variables. Other regularization tools such as the lasso also yield high-quality forecasts (Li and Chen, 2014) and have the particularly attractive property of producing interpretable models that depend on only a fraction of the predictors.
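To give a flavor of the lasso's interpretability, here is a minimal sketch on simulated data (standing in for a real macroeconomic database such as FRED-MD) in which only five of 500 candidate predictors carry signal; the fitted model retains a small subset and zeroes out the rest.

```python
# A minimal lasso sketch: many candidate predictors, sparse true signal.
# Simulated data stands in for a database like FRED-MD; illustrative only.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)

n_obs, n_pred = 300, 500
X = rng.normal(size=(n_obs, n_pred))
beta = np.zeros(n_pred)
beta[:5] = [0.8, -0.6, 0.5, 0.4, -0.3]           # only five predictors matter
y = X @ beta + rng.normal(scale=0.5, size=n_obs)

# LassoCV picks the regularization strength by cross-validation; the L1
# penalty zeroes out most coefficients, leaving an interpretable model.
model = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(f"{selected.size} of {n_pred} predictors retained")
```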

So, what does the future of big data forecasting hold? We should have at least a partial answer soon enough. The fifth iteration of the Makridakis competitions (M5) is well underway and due to conclude shortly. Hosted on the data science platform Kaggle, it is the largest competition yet, with more than 5,000 registered teams and $100,000 in prize money. This round, the challenge is to forecast sales for Walmart at the item, department, product category, and store levels (approximately 40,000 series in total). M5 is thus an important continuation of research into big data forecasting, which we anticipate will lead to new and exciting applications of machine learning.


References:

Exterkate, P., Groenen, P. J. F., Heij, C., and van Dijk, D. (2016). Nonlinear forecasting with many predictors using kernel ridge regression. International Journal of Forecasting, 32(3):736–753.

Fuleky, P., editor (2020). Macroeconomic forecasting in the era of big data, volume 52 of Advanced Studies in Theoretical and Applied Econometrics. Springer International Publishing, Cham, Switzerland.

Laptev, N., Yosinski, J., Li, L. E., and Smyl, S. (2017). Time-series extreme event forecasting with neural networks at Uber. In ICML 2017 Time Series Workshop.

Li, J. and Chen, W. (2014). Forecasting macroeconomic time series: Lasso-based approaches and their forecast combinations with dynamic factor models. International Journal of Forecasting, 30(4):996–1015.

Makridakis, S., Spiliotis, E., and Assimakopoulos, V. (2018). Statistical and machine learning forecasting methods: Concerns and ways forward. PLOS ONE, 13(3):1–26.

Makridakis, S., Spiliotis, E., and Assimakopoulos, V. (2020). The M4 competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1):54–74.

Medeiros, M. C., Vasconcelos, G. F. R., Veiga, Á., and Zilberman, E. (2019). Forecasting inflation in a data-rich environment: The benefits of machine learning methods. Journal of Business and Economic Statistics, in press.

Montero-Manso, P., Athanasopoulos, G., Hyndman, R. J., and Talagala, T. S. (2020). FFORMA: Feature-based forecast model averaging. International Journal of Forecasting, 36(1):86–92.

Reinsel, D., Gantz, J., and Rydning, J. (2018). The digitization of the world. Technical report, International Data Corporation.

Salinas, D., Flunkert, V., Gasthaus, J., and Januschowski, T. (2020). DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3):1181–1191.

Smyl, S. (2020). A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. International Journal of Forecasting, 36(1):75–85.