Rethinking Statistics and Causality: Why Mechanisms Cannot Be Inferred from Data Distributions

Name: Rethinking Statistics and Causality: Why Mechanisms Cannot Be Inferred from Data Distributions
Creator: Diau, Egil
Keywords: Machine Learning, Causal Inference, Deep Learning, Artificial Intelligence, Statistics, FOS: Mathematics, Statistics and probability, Bayesian statistics, Data science

ZENODO

Preprint . 2025

License: CC BY

Data sources: Datacite

Publication: Preprint, 17 Nov 2025
Publisher: Zenodo

Authors: Diau, Egil

doi: 10.5281/zenodo.17633314, 10.5281/zenodo.17633313

Abstract

Statistical and causal inference have become universal currencies of explanation across the sciences, particularly in domains where underlying mechanisms remain opaque. Their apparent rigor—spanning psychology, economics and biomedicine—rests on the assumption that patterns within data can reveal the processes that generate them. Yet persistent mismatches between empirical predictions and real-world behaviour expose a deeper limitation: mechanisms cannot be inferred from data distributions alone. To address this limitation, we revisit the foundations of both paradigms, showing how statistical inference reduces explanation to geometric alignment, while causal inference, evolved from Bayes’ theorem and graphical models, extends this misstep by conflating probabilistic structure with causal truth. Both expose the same epistemic gap: data encode a lower-dimensional projection of structure, not the mechanism that generates it. We argue that understanding the world follows two routes: one is data-driven, expanding models toward richer function classes to achieve high-precision prediction, as exemplified by modern deep learning; the other is mechanism-driven, proposing and testing structural hypotheses as in the physical sciences. A robust framework requires both: data-driven models for high-precision prediction, and mechanistic models for reconstructing how the world produces the data we observe.
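The central claim, that mechanisms cannot be inferred from data distributions alone, can be illustrated with a standard textbook case that is not taken from the preprint itself. The sketch below assumes two hypothetical linear Gaussian models, one in which X produces Y and one in which Y produces X, with parameters chosen so that both yield the same zero-mean joint distribution. All function and variable names are illustrative assumptions.

```python
import math


def cov_x_causes_y(a, noise_var):
    """Return (var_x, cov_xy, var_y) for X ~ N(0, 1), Y = a*X + N(0, noise_var)."""
    return (1.0, a, a * a + noise_var)


def matched_reverse_params(a, noise_var):
    """Return (var_y, b, resid_var) for Y ~ N(0, var_y), X = b*Y + N(0, resid_var),
    chosen so that the joint distribution matches the forward model."""
    var_y = a * a + noise_var
    return (var_y, a / var_y, noise_var / var_y)


def cov_y_causes_x(var_y, b, resid_var):
    """Return (var_x, cov_xy, var_y) implied by the reverse model."""
    return (b * b * var_y + resid_var, b * var_y, var_y)


def mean_y_do_x_forward(a, x):
    """E[Y | do(X = x)] when X produces Y: the intervention propagates to Y."""
    return a * x


def mean_y_do_x_reverse(x):
    """E[Y | do(X = x)] when Y produces X: setting X leaves Y at its mean of 0."""
    return 0.0


# Hypothetical parameter values, chosen only for illustration.
a, noise_var = 0.8, 0.36

cov_forward = cov_x_causes_y(a, noise_var)
cov_backward = cov_y_causes_x(*matched_reverse_params(a, noise_var))

# A zero-mean Gaussian is fully determined by its covariance, so equal
# covariances mean the two models are observationally indistinguishable.
same_distribution = all(
    math.isclose(p, q) for p, q in zip(cov_forward, cov_backward)
)

# The same intervention nevertheless has different consequences.
effect_forward = mean_y_do_x_forward(a, 2.0)
effect_reverse = mean_y_do_x_reverse(2.0)
```

Under these assumptions, `same_distribution` holds while `effect_forward` and `effect_reverse` differ, so no amount of observational data from this distribution could tell the two mechanisms apart. This is one narrow instance of the gap the abstract describes, not a reconstruction of the author's argument.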

Impact by BIP!

Selected citations: 0
Popularity: Average
Influence: Average
Impulse: Average

Related to Research communities

Knowmad Institut
