
Abstract The human microbiome provides essential physiological functions and helps maintain host homeostasis through intricate ecological host‐microbiome relationships. While it is well established that the host's lifestyle, dietary preferences, demographic background, and health status can influence microbial community composition and dynamics, robust, generalizable associations between specific host‐associated factors and specific microbial taxa have remained largely elusive. Here, we propose factor regression models that allow the estimation of structured, parsimonious associations between host‐related features and amplicon‐derived microbial taxa. To account for the overdispersed nature of amplicon sequencing count data, we propose negative binomial reduced rank regression (NB‐RRR) and negative binomial co‐sparse factor regression (NB‐FAR). NB‐RRR encodes the dependency between the microbial abundances (outcomes) and the host‐associated features (predictors) through a rank‐constrained coefficient matrix, whereas NB‐FAR uses a sparse singular value decomposition of the coefficient matrix. The latter approach avoids the notoriously difficult joint parameter estimation by extracting sparse unit‐rank components of the coefficient matrix sequentially, effectively delivering interpretable bi‐clusters of taxa and host‐associated factors. To solve the nonconvex optimization problems associated with these factor regression models, we present a novel iterative block‐wise majorization procedure. Extensive simulation studies and an application to the microbial abundance data from the American Gut Project (AGP) demonstrate the efficacy of the proposed procedure. In the AGP data, we identify several factors that strongly link dietary habits and host lifestyle to specific microbial families.
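To make the rank‐constrained idea concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a log link, a single shared dispersion parameter, and no sequencing‐depth offset; the function name `simulate_nb_low_rank` and all of its parameters are hypothetical and chosen only for illustration. It draws overdispersed counts whose means depend on host features through a coefficient matrix of fixed low rank.

```python
import numpy as np


def simulate_nb_low_rank(n=200, p=6, q=8, rank=2, dispersion=2.0, seed=0):
    """Simulate counts Y (n x q) from features X (n x p) under a negative
    binomial model whose coefficient matrix C (p x q) has low rank.

    Assumptions (illustrative only): log link, one shared dispersion value,
    no offset for sequencing depth.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))      # host-associated features (predictors)
    U = rng.normal(size=(p, rank))   # loadings on the feature side
    V = rng.normal(size=(q, rank))   # loadings on the taxon side
    C = 0.3 * (U @ V.T)              # coefficient matrix of rank <= `rank`
    mu = np.exp(X @ C)               # mean counts under the assumed log link
    # NumPy parameterizes the negative binomial by (n, p) with mean
    # n * (1 - p) / p. Setting n = dispersion and p = dispersion / (dispersion + mu)
    # gives mean mu and variance mu + mu**2 / dispersion (overdispersed).
    prob = dispersion / (dispersion + mu)
    Y = rng.negative_binomial(dispersion, prob)
    return X, C, Y


X, C, Y = simulate_nb_low_rank()
print("rank of C:", np.linalg.matrix_rank(C))
print("count matrix shape:", Y.shape)
```

In this sketch, a sparse variant in the spirit of NB‐FAR would correspond to setting many entries of each column of `U` and `V` to zero, so that each unit‐rank component links only a subset of features to a subset of taxa.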
Data Analysis; Microbiota; Microbiome; Gastrointestinal Microbiome; Overdispersed Count Data; Reduced Rank Regression; Sparse Singular Value Decomposition; Multivariate Analysis; Regression Analysis; Factor Analysis, Statistical; Applications of statistics to biology and medical sciences; meta analysis; American Gut Project; Feeding Behavior; Life Style; Humans; United States; Research Articles
