
This repository contains original unprocessed data and code for the replication of our paper "Identifying Multi-omics Signatures that characterize Responders to Plant-based Dietary Interventions from a Randomized Trial".

It contains the following files:

MultiOmics_Responders.Rproj
R
    XGboost_functions.R - functions for training XGboost models
    globals_learners.R
    splsda_functions.R - functions for fitting single omics sPLSDA models
    utils.R
    utils_h1_build.R - auxiliary functions for baseline data preprocessing
    utils_h2_build.R - auxiliary functions for change data preprocessing
data
    diets_info
    processed - preprocessed datasets
    raw_combined - unprocessed datasets
output
    Rdata - output from models
    figures
    tables
renv
    activate.R
    library
    settings.json
    staging
renv.lock
scripts
    baseline_analysis - scripts to perform DIABLO, sPLSDA and XGboost models with baseline data.
    changes_analysis - scripts to perform DIABLO, and sPLSDA models with changes data.
    preprocessing - scripts for executing data preprocessing

Usage:

Open the project in RStudio by double-clicking the file MultiOmics_Responders.Rproj in the root directory. Once in RStudio, run the following commands in the console to activate the R environment (you only need to do this once when setting up the project for the first time):

    renv::activate()
    renv::restore()

To generate the preprocessed datasets used throughout the analyses and manuscript, please run:

    scripts/preprocessing/build_h1_datasets.R for generating baseline data
    scripts/preprocessing/build_h2_datasets.R for generating changes data.

For reproducing results for the XGboost prediction models using baseline data, please run:

    scripts/baseline_analysis/xgboost/XGboost_models.R
    scripts/baseline_analysis/xgboost/res.R for creating tables S4-1 and S4-2.

Note: XGboost prediction model for all diets combined using multiomics data was executed using the Marvin HPC cluster.
You can find the script under scripts/baseline_analysis/xgboost/XGboost_models_all_diets_multiomics.R. Please be aware that standard hardware may not be sufficient to replicate these results.

For reproducing results for sPLSDA using baseline data, please run:

    scripts/baseline_analysis/spls-da/all_diets_single_omics.R
    scripts/baseline_analysis/spls-da/nd_single_omics.R
    scripts/baseline_analysis/spls-da/vd_single_omics.R
    scripts/baseline_analysis/spls-da/res.R for creating tables S1_3 and S1_7, and Figures Fig1_D, Fig2_D, and Fig2_H.

For reproducing results for sPLSDA using change data, please run:

    scripts/changes_analysis/spls-da/all_diets_single_omics.R
    scripts/changes_analysis/spls-da/nd_single_omics.R
    scripts/changes_analysis/spls-da/vd_single_omics.R
    scripts/changes_analysis/spls-da/res.R for creating tables S1_4 and S1_9, and Figures Fig3_E, Fig4_E, and Fig5_E.

DIABLO models (design and distance loop) are located in:

    scripts/baseline_analysis/DIABLO/
    scripts/changes_analysis/DIABLO/

Note: All DIABLO models were run on the Marvin and IMBIE/IGSB HPC clusters and submitted through the SLURM scheduler, with an average runtime of approximately 120 hours per job. Reproducing these analyses on standard hardware may therefore not be feasible.

To reproduce the main results and figures from pre-computed DIABLO models, please run:

    scripts/baseline_analysis/DIABLO/res.R for creating tables S1_1, S1_6, and S7_1.
    scripts/changes_analysis/DIABLO/res.R for creating tables S1_2, S1_8, S3_1, and S7_2, and Figures Fig1_A, Fig4_A, and Fig5_A.

To reproduce the performance metric (AUC, BER) plots from the LOOCV evaluation, please run:

    scripts/baseline_analysis/res.R for creating Fig1_C and Fig2_C.
    scripts/changes_analysis/res.R for creating Fig3_D, Fig4_D, and Fig5_D.

Further notes:

All res.R scripts that produce tables and figures can be run independently of the model-fitting scripts, as they rely on already computed outputs stored in the output/ folder.
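
Because the res.R scripts depend on pre-computed outputs, it can help to confirm that those outputs are present before sourcing one. The following is only a minimal sketch: `has_precomputed_outputs` and `run_results_script` are hypothetical helpers that are not part of this repository, and the sketch assumes that the model outputs live under output/Rdata and that the working directory is the project root.

```r
# Hypothetical helpers, not part of the repository.
# Assumption: res.R scripts read model outputs from output/Rdata.

# TRUE if the output directory exists and contains at least one file.
has_precomputed_outputs <- function(output_dir = file.path("output", "Rdata")) {
  dir.exists(output_dir) &&
    length(list.files(output_dir, recursive = TRUE)) > 0
}

# Source a res.R script only after confirming that outputs are available.
run_results_script <- function(script,
                               output_dir = file.path("output", "Rdata")) {
  if (!has_precomputed_outputs(output_dir)) {
    stop("No model outputs found in '", output_dir, "'.", call. = FALSE)
  }
  if (!file.exists(script)) {
    stop("Script not found: ", script, call. = FALSE)
  }
  source(script)
  invisible(script)
}

# Example call from the project root:
# run_results_script("scripts/baseline_analysis/spls-da/res.R")
```

If the check fails, the corresponding model-fitting script has to be run first (or the outputs restored) before the tables and figures can be regenerated.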
For replication of the results, please use R version 4.3.2, unless specified otherwise.
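
The version requirement can be checked at the start of a session before any script is run. The snippet below is a minimal sketch; `expected_r_version` and `check_r_version` are hypothetical names introduced for illustration and do not exist in the repository.

```r
# Hypothetical helper, not part of the repository: compares the running
# R version with the version recommended in this README.
expected_r_version <- "4.3.2"

check_r_version <- function(expected = expected_r_version,
                            current = as.character(getRversion())) {
  matches <- package_version(current) == package_version(expected)
  if (!matches) {
    warning(
      sprintf("R %s detected; this README recommends R %s.", current, expected),
      call. = FALSE
    )
  }
  invisible(matches)
}

# Example call in the R console:
# check_r_version()
```

The function only warns on a mismatch instead of stopping, since some scripts may specify a different version.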
