
We present a comprehensive empirical study of the robustness of Latent Posterior Factor (LPF) models under varying degrees of data corruption in tax compliance classification tasks. Our experiments systematically evaluate model performance across five noise configurations, ranging from clean data to extreme corruption (70% feature noise, 40% contradictory evidence). Results show that LPF models with Sum-Product Network (SPN) aggregation provide interpretable uncertainty quantification through predictable degradation curves, though they achieve lower absolute accuracy than BERT baselines and alternative architectures at every noise level, with a gap of 1–8 percentage points depending on noise severity. Using 15 random seeds per configuration and multi-metric evaluation that includes Expected Calibration Error (ECE), Negative Log-Likelihood (NLL), and Brier score, we establish that probabilistic evidence aggregation provides measurable robustness advantages in noisy environments. Our analysis identifies critical noise tolerance thresholds and quantifies the contribution of individual architectural components to overall model resilience.

Keywords: model robustness, data corruption, Latent Posterior Factors (LPF), tax compliance classification, uncertainty quantification, probabilistic aggregation, sum-product networks (SPN), expected calibration error (ECE), negative log-likelihood (NLL), Brier score, interpretable AI, noise tolerance, neural-symbolic reasoning, multi-metric evaluation, machine learning resilience, evidence-based AI
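For readers unfamiliar with the three calibration metrics named in the abstract, the following is a minimal sketch of their textbook definitions in plain Python. It is not the authors' evaluation code: the function names, the equal-width binning scheme, the default of 10 bins, and the clipping constant `eps` are all assumptions made for illustration.

```python
import math


def brier_score(probs, labels):
    """Mean squared distance between predicted distributions and one-hot labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(probs)


def negative_log_likelihood(probs, labels, eps=1e-12):
    """Average negative log-probability assigned to the true class.

    eps (an assumed constant) avoids log(0) for zero-probability predictions.
    """
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins (binning scheme is an assumption)."""
    n = len(probs)
    # Per bin: [sum of confidences, number of correct predictions]
    bins = [[0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[b][0] += conf
        bins[b][1] += 1.0 if pred == y else 0.0
    # Weighted gap |accuracy - confidence| per bin reduces to |correct - conf_sum| / n
    return sum(abs(correct - conf_sum) for conf_sum, correct in bins) / n


if __name__ == "__main__":
    # Hypothetical toy predictions for a two-class task
    probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
    labels = [0, 1, 1, 1]
    print("Brier:", brier_score(probs, labels))
    print("NLL:  ", negative_log_likelihood(probs, labels))
    print("ECE:  ", expected_calibration_error(probs, labels))
```

Lower is better for all three. Brier score and NLL penalize both miscalibration and misclassification, whereas ECE isolates the gap between stated confidence and observed accuracy, which is why the three are usually reported together.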
