ZENODO
Other ORP type . 2023
License: CC BY
Data sources: ZENODO
ZENODO
Other ORP type . 2023
License: CC BY
Data sources: Datacite

Reference genome choice and filtering thresholds jointly influence phylogenomic analyses

Authors: Rick, Jessica; Brock, Chad; Lewanski, Alexander; Golcher-Benavides, Jimena; Wagner, Catherine

Abstract

Molecular phylogenies are a cornerstone of modern comparative biology and are commonly employed to investigate a range of biological phenomena, such as diversification rates, patterns in trait evolution, biogeography, and community assembly. Recent work has demonstrated that processing genomic data can introduce significant biases into downstream phylogenetic analyses; however, it remains unclear whether bioinformatic parameters interact with one another, or whether the choice of reference genome for sequence alignment and variant calling introduces further biases. We address these knowledge gaps by using a combination of simulated and empirical data sets to investigate the extent to which the choice of reference genome in upstream bioinformatic processing of genomic data influences phylogenetic inference, and how reference genome choice interacts with bioinformatic filtering choices and with phylogenetic inference method. We demonstrate that more stringent minor allele filters bias inferred trees away from the true species tree topology, and that these biased trees tend to be more imbalanced and to have a higher center of gravity than the true trees. In our 51-taxon data sets, topological accuracy was greatest when filtering sites for minor allele count > 3–4, whereas tree center of gravity was closest to the true value when filtering for minor allele count > 1–2. In contrast, filtering for missing data increased the accuracy of the inferred topologies; however, this effect was small compared with the effect of minor allele filters, and such filtering may be undesirable because it distorts the mutation spectrum. The bias introduced by these filters differs depending on the reference genome used in short-read alignment, providing further support that choosing a reference genome for alignment is an important bioinformatic decision with implications for downstream analyses.
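The minor allele count and missing-data filters described above act site by site. As a minimal sketch (not the authors' pipeline, which is in the refbias_scripts repository), the code below assumes biallelic diploid genotypes coded as alternate-allele dosage (0, 1, 2, or None for a missing call), keeps a site only when its minor allele count is strictly greater than a threshold, and drops sites whose fraction of missing genotypes exceeds a cutoff. The genotype coding and the names `minor_allele_count`, `missing_fraction`, and `filter_sites` are hypothetical.

```python
from typing import List, Optional, Sequence

# Assumed coding (hypothetical): each genotype is the number of alternate
# alleles at a biallelic site in a diploid sample (0, 1, 2), or None if missing.
Site = Sequence[Optional[int]]


def minor_allele_count(site: Site) -> int:
    """Count copies of the less common allele among called genotypes."""
    called = [g for g in site if g is not None]
    alt = sum(called)
    ref = 2 * len(called) - alt  # two allele copies per diploid call
    return min(alt, ref)


def missing_fraction(site: Site) -> float:
    """Fraction of samples with no genotype call at this site."""
    return sum(g is None for g in site) / len(site)


def filter_sites(sites: List[Site], mac_threshold: int, max_missing: float) -> List[int]:
    """Return indices of sites with MAC > mac_threshold and missingness <= max_missing."""
    return [
        i
        for i, site in enumerate(sites)
        if minor_allele_count(site) > mac_threshold
        and missing_fraction(site) <= max_missing
    ]


sites = [
    [0, 0, 1, 2, None],        # MAC 3, 20% missing
    [0, 0, 0, 0, 0],           # invariant site: MAC 0
    [None, None, None, 1, 1],  # MAC 2, 60% missing
]
print(filter_sites(sites, mac_threshold=1, max_missing=0.5))  # [0]
print(filter_sites(sites, mac_threshold=3, max_missing=1.0))  # []
```

Raising `mac_threshold` discards progressively more rare variants, such as singletons; the abstract reports that the stringency of this choice shifts both topological accuracy and tree shape.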
These results demonstrate that attributes of the study system and data set, and their interaction, add important nuance to decisions about how best to assemble and filter short-read genomic data for phylogenetic inference.
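The abstract compares inferred trees with the true species tree in terms of topology and shape, but this record does not name the exact metrics used. As an assumption for illustration only, the sketch below uses the Colless index (a common tree-imbalance measure: the sum, over internal nodes, of the absolute difference in leaf counts between the two child subtrees, so larger values mean a more imbalanced tree) and a Robinson-Foulds-style distance computed on the clades of rooted trees. The nested-tuple tree representation and the function names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical representation: a rooted binary tree is either a leaf name (str)
# or a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """All leaf names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def colless(tree: Tree) -> int:
    """Colless index: sum over internal nodes of |#leaves(left) - #leaves(right)|."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return (abs(len(leaf_set(left)) - len(leaf_set(right)))
            + colless(left) + colless(right))


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all non-root internal nodes."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree, is_root: bool) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        left, right = node
        members = walk(left, False) | walk(right, False)
        if not is_root:
            found.add(members)
        return members

    walk(tree, True)
    return found


def rooted_rf(t1: Tree, t2: Tree) -> int:
    """Number of clades present in exactly one of the two trees."""
    return len(clades(t1) ^ clades(t2))


true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred = (((("A", "B"), "C"), "D"), "E")  # fully pectinate (caterpillar) tree
print(colless(true_tree), colless(inferred))  # 2 6
print(rooted_rf(true_tree, inferred))  # 2
```

In this toy case the inferred tree is both topologically different from the true tree and more imbalanced, which is the direction of bias the abstract attributes to stringent minor allele filters. Published analyses typically use established tree libraries rather than hand-rolled metrics like these.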

Funding provided by: National Science Foundation (Crossref Funder Registry ID: https://ror.org/021nxhr62), Award Number: DEB-1556963
Funding provided by: National Institute of General Medical Sciences (Crossref Funder Registry ID: https://ror.org/04q48ey07), Award Number: 2P20GM103432
Funding provided by: National Aeronautics and Space Administration (Crossref Funder Registry ID: https://ror.org/027ka1x80), Award Number: NNX15AI08H

Data and supplementary material here are associated with the manuscript "Reference genome choice and filtering thresholds jointly influence phylogenetic analyses". Scripts can be found at https://github.com/jessicarick/refbias_scripts, and are archived on Zenodo at https://doi.org/10.5281/zenodo.5940690.

Keywords

Phylogenetics, FOS: Computer and information sciences, imbalance, Bioinformatics, FOS: Biological sciences, Genetics, Macroevolution, Phylogenomics, diversification rate, minor allele frequency, reference genome, Ecology, Evolution, Behavior and Systematics, RAD phylogenomics
