
arXiv: 1512.09244
In public discussions of the quality of forecasts, attention typically focuses on the predictive performance in cases of extreme events. However, the restriction of conventional forecast evaluation methods to subsets of extreme observations has unexpected and undesired effects, and is bound to discredit skillful forecasts when the signal-to-noise ratio in the data generating process is low. Conditioning on outcomes is incompatible with the theoretical assumptions of established forecast evaluation methods, thereby confronting forecasters with what we refer to as the forecaster's dilemma. For probabilistic forecasts, proper weighted scoring rules have been proposed as decision theoretically justifiable alternatives for forecast evaluation with an emphasis on extreme events. Using theoretical arguments, simulation experiments, and a real data study on probabilistic forecasts of U.S. inflation and gross domestic product growth, we illustrate and discuss the forecaster's dilemma along with potential remedies.
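The effect described in the abstract can be reproduced in a small simulation. The following is a minimal sketch under illustrative assumptions, not the authors' code: the outcome is Y = mu + noise with standard normal mu and noise, an "ideal" forecaster issues N(mu, 1), and a hypothetical "extreme" forecaster issues N(mu + 2.5, 1) regardless of the signal. The threshold of 2, the shift of 2.5, and all function names are choices made here for illustration. The sketch compares the mean logarithmic score over all cases, the same score restricted to cases where the observation exceeded the threshold, and a censored likelihood score, which is one example of a proper weighted scoring rule that emphasizes the region above the threshold while still averaging over all cases.

```python
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_score(y, mean):
    """Logarithmic score (negative log density) of N(mean, 1) at y; lower is better."""
    return LOG_SQRT_2PI + 0.5 * (y - mean) ** 2


def normal_cdf(x):
    """Standard normal CDF; erfc keeps the lower tail numerically accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def censored_log_score(y, mean, threshold):
    """Censored likelihood score of N(mean, 1) for the region y > threshold."""
    if y > threshold:
        return log_score(y, mean)
    # Outcomes at or below the threshold are scored only through the
    # probability mass the forecast assigns to that region.
    return -math.log(normal_cdf(threshold - mean))


def simulate(n=200_000, threshold=2.0, shift=2.5, seed=1):
    """Return mean scores per forecaster (all values are illustrative assumptions)."""
    rng = random.Random(seed)
    sums = {name: {"all": 0.0, "restricted": 0.0, "censored": 0.0}
            for name in ("ideal", "extreme")}
    n_extreme = 0
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)          # signal known to both forecasters
        y = mu + rng.gauss(0.0, 1.0)      # observation
        is_extreme = y > threshold
        n_extreme += is_extreme
        for name, mean in (("ideal", mu), ("extreme", mu + shift)):
            score = log_score(y, mean)
            sums[name]["all"] += score
            if is_extreme:
                # Conditioning on the outcome: only extreme observations count.
                sums[name]["restricted"] += score
            sums[name]["censored"] += censored_log_score(y, mean, threshold)
    for name in sums:
        sums[name]["all"] /= n
        sums[name]["restricted"] /= n_extreme
        sums[name]["censored"] /= n
    return sums


if __name__ == "__main__":
    for name, scores in simulate().items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this setup the ideal forecaster should obtain the lower (better) mean score over all cases and under the censored likelihood score, while the restricted evaluation should favor the extreme forecaster even though that forecaster adds no skill. This is the pattern the abstract calls the forecaster's dilemma.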
proper weighted scoring rule, Diebold–Mariano test, predictive performance, likelihood ratio test, Neyman–Pearson lemma, probabilistic forecast, rare and extreme events, hindsight bias, Inference from stochastic processes and prediction, Applications of statistics to economics, Mathematics, Methodology (stat.ME), Statistics - Methodology, FOS: Computer and information sciences, ddc:510
| Indicator | Value |
| --- | --- |
| selected citations | 98 |
| popularity | Top 1% |
| influence | Top 10% |
| impulse | Top 1% |
