
Abstract

High-throughput sequencing data lie at the heart of modern microbiome research. Effective analysis of these data requires careful preprocessing, modeling, and interpretation to detect subtle signals and avoid spurious associations. In this review, we discuss how simulation can serve as a sandbox to test candidate approaches, creating a setting that mimics real data while providing ground truth. This is particularly valuable for power analysis, methods benchmarking, and reliability analysis. We explain the probability, multivariate analysis, and regression concepts behind modern simulators and how different implementations make trade-offs between generality, faithfulness, and controllability. Recognizing that all simulators only approximate reality, we review methods to evaluate how accurately they reflect key properties. We also present case studies demonstrating the value of simulation in differential abundance testing, dimensionality reduction, network analysis, and data integration. Code for these examples is available in an online tutorial (https://go.wisc.edu/8994yz) that can be easily adapted to new problem settings.
Data Analysis, Bioinformatics, Microbiota, microbiome, High-Throughput Nucleotide Sequencing, Computational Biology, Computation Theory and Mathematics, power analysis, Review, Biological Sciences, simulation, Bioinformatics and computational biology, methods assessment, Biochemistry and cell biology, Genetics, Humans, methods selection, Computer Simulation, Microbiome, Biochemistry and Cell Biology, Other Information and Computing Sciences, Software
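
As a rough illustration of the simulation-based power analysis mentioned in the abstract, the sketch below simulates counts for a single taxon in two groups with a known fold change, applies a test, and records how often the effect is detected. It is not taken from the review or its tutorial. The negative binomial count model, the rank-sum test, and every name and default value (`estimate_power`, `fold_change`, `mean`, `shape`, and so on) are assumptions chosen for brevity; a real study would more likely use a simulator fitted to pilot data.

```python
import math
import random


def poisson(rng, lam):
    # Knuth's multiplication method; adequate for the modest means used here.
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def neg_binomial(rng, mean, shape):
    # Gamma-Poisson mixture with variance = mean + mean**2 / shape.
    # Assumed count model, not one prescribed by the review.
    return poisson(rng, rng.gammavariate(shape, mean / shape))


def rank_sum_pvalue(x, y):
    # Two-sided Wilcoxon rank-sum test using midranks for ties and a
    # tie-corrected normal approximation.
    n, m = len(x), len(y)
    total = n + m
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < total:
        j = i
        while j < total and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        ties = j - i
        tie_term += ties**3 - ties
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n * (n + 1) / 2
    var = n * m / 12 * ((total + 1) - tie_term / (total * (total - 1)))
    if var == 0:
        return 1.0
    z = (u - n * m / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def estimate_power(n_per_group, fold_change, n_sims=200, alpha=0.05,
                   mean=20.0, shape=2.0, seed=0):
    # Fraction of simulated datasets in which the known effect is detected.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control = [neg_binomial(rng, mean, shape) for _ in range(n_per_group)]
        treated = [neg_binomial(rng, mean * fold_change, shape)
                   for _ in range(n_per_group)]
        hits += rank_sum_pvalue(control, treated) < alpha
    return hits / n_sims


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, estimate_power(n, fold_change=2.0))
```

Because the fold change is set by the simulation, setting `fold_change=1.0` gives an estimate of the false positive rate, and varying `n_per_group` shows how the detection rate changes with sample size under the assumed model.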
