
Reliable estimates of excess mortality (EM) are essential both for actuarial applications and for informing public debate and policy. Yet published EM estimates for Germany during the COVID-19 pandemic varied by tens of thousands of deaths. This discrepancy raises the question of how sensitive EM estimates are to methodological choices. We therefore constructed a comprehensive factorial framework of models to calculate expected mortality, while systematically varying six components: sex treatment, demographic dataset, age cohort resolution, temporal resolution, forerun length, and mortality model form. The full design yields 1152 candidate models, of which 764 remained feasible after excluding unstable combinations. These models were fitted to German mortality data for 2000--2019 and extrapolated to 2020--2024. We assessed model fit using residuals and the fraction of variance unexplained (FVU), and assessed the extrapolations by analyzing the predicted EM. Our results demonstrate that EM estimates for the German population are surprisingly robust to choices of sex treatment and temporal resolution, but highly sensitive to age cohort resolution, forerun length, and model form. In particular, models without age stratification produce implausibly high EM due to Simpson’s paradox, and constant or quadratic models yield unreliable, diverging extrapolations. By contrast, actuarial models with moderate forerun lengths provide robust and thus interpretable results. Based on these findings, we strongly recommend excluding constant and quadratic baselines, avoiding unstratified models, and using updated demographic data. Among the feasible candidates, only 208 models are recommended for estimating EM of a stratified population, mainly those with actuarial form, demographic resolution, and moderate to long forerun lengths.
By highlighting the methodological sensitivity of estimates of expected mortality, this study aims to guide actuaries, statisticians, and public-health modelers in constructing meaningful, reliable baselines for EM estimates.
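The quantities named above can be illustrated with a minimal sketch. It assumes FVU is meant in its standard sense, the residual sum of squares divided by the total sum of squares (that is, 1 - R²), and that EM is observed minus expected deaths; the study's exact definitions may differ. The linear baseline, the function names, and all numbers are hypothetical and chosen only for illustration; they are not German mortality data and not one of the study's models.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fvu(observed, fitted):
    """Fraction of variance unexplained: SS_res / SS_tot (equals 1 - R^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return ss_res / ss_tot


def excess_mortality(observed, expected):
    """Excess deaths per period: observed minus expected."""
    return [o - e for o, e in zip(observed, expected)]


# Synthetic yearly death counts (in thousands) for a 10-year forerun period.
fit_years = list(range(2010, 2020))
fit_deaths = [901, 909, 922, 928, 941, 952, 958, 971, 979, 990]

slope, intercept = fit_linear(fit_years, fit_deaths)
fitted = [slope * y + intercept for y in fit_years]
print("FVU on the fit period:", fvu(fit_deaths, fitted))

# Extrapolate the baseline and compare with synthetic "observed" deaths.
future_years = [2020, 2021]
future_deaths = [1030, 1055]
expected = [slope * y + intercept for y in future_years]
print("Excess mortality:", excess_mortality(future_deaths, expected))
```

Shortening `fit_years` in this sketch mimics varying the forerun length: the fitted slope, and therefore the extrapolated baseline and the resulting EM, change with the window chosen.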
