
Identifying causal microbial biomarkers in high-dimensional observational data is challenging because of confounding and spurious correlations. We propose CausalBiome, a scalable framework that combines three invariance-based metrics into a unified importance score: gradient-stability (consistency of per-feature loss gradients across random partitions), permutation-magnitude (mean drop in area under the ROC curve (AUC) upon feature shuffling), and permutation-stability (variance of that drop). CausalBiome requires only a single ensemble model and simple variance calculations, avoiding costly graph estimation or bi-level optimization. On four merged Type-2 Diabetes microbiome cohorts (n = 746, p = 1,991), after filtering to 52 prevalent taxa, CausalBiome achieved the highest Spearman correlation (ρ = 0.91) with held-out AUC and the lowest MSE (0.22) compared with permutation importance, LIME, Gini, and univariate rankings. Top candidates such as Collinsella aerofaciens, Faecalibacterium prausnitzii, and Blautia wexlerae align with known mechanistic roles in glucose metabolism and inflammation. CausalBiome thus offers a practical, interpretable tool for robust causal feature discovery in microbiome and other biomedical studies.
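The two permutation-based metrics described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the synthetic data, the least-squares linear scorer standing in for the ensemble model, and the helper names `auc` and `permutation_drops` are all assumptions made for illustration. Gradient-stability and the rule that merges the three metrics into one score are left out, because the abstract does not specify them.

```python
import numpy as np

def auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney U). Assumes continuous scores without ties."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def permutation_drops(predict, X, y, n_repeats=20, seed=0):
    """AUC drop for each (repeat, feature) when that feature is shuffled."""
    rng = np.random.default_rng(seed)
    base = auc(y, predict(X))
    drops = np.empty((n_repeats, X.shape[1]))
    for r in range(n_repeats):
        for j in range(X.shape[1]):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X[:, j])
            drops[r, j] = base - auc(y, predict(X_perm))
    return drops

# Hypothetical data: only feature 0 carries signal about the label.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=400) > 0).astype(int)

# Stand-in model (assumption): a least-squares linear scorer, not an ensemble.
w, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)

def predict(A):
    return A @ w

drops = permutation_drops(predict, X, y)
magnitude = drops.mean(axis=0)  # permutation-magnitude: mean AUC drop
variance = drops.var(axis=0)    # permutation-stability: variance of that drop

for j, (m, v) in enumerate(zip(magnitude, variance)):
    print(f"feature {j}: mean drop = {m:.3f}, variance = {v:.5f}")
```

In this toy setting the informative feature should show by far the largest mean drop. Note that an uninformative feature also has near-zero variance, so low variance alone does not indicate importance; how CausalBiome balances magnitude against stability is not stated in the abstract.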
Machine Learning, Gastrointestinal Microbiome
