Assessing the fit of the multi-species network coalescent to multi-locus data

Publication: Article, 07 Dec 2020, English
Publisher: Oxford University Press (OUP)
Journal: Bioinformatics, volume 37, pages 634-641 (ISSN: 1367-4803, eISSN: 1367-4811)

Authors: Ruoyi Cai; Cécile Ané

doi: 10.1093/bioinformatics/btaa863

pmid: 33027508


Abstract

Motivation: With growing genome-wide molecular datasets from next-generation sequencing, phylogenetic networks can be estimated using a variety of approaches. These phylogenetic networks include events like hybridization, gene flow or horizontal gene transfer explicitly. However, the most accurate network inference methods are computationally heavy. Methods that scale to larger datasets do not calculate a full likelihood, such that traditional likelihood-based tools for model selection are not applicable to decide how many past hybridization events best fit the data. We propose here a goodness-of-fit test to quantify the fit between data observed from genome-wide multi-locus data, and patterns expected under the multi-species coalescent model on a candidate phylogenetic network.

Results: We identified weaknesses in the previously proposed TICR test, and proposed corrections. The performance of our new test was validated by simulations on real-world phylogenetic networks. Our test provides one of the first rigorous tools for model selection, to select the adequate network complexity for the data at hand. The test can also work for identifying poorly inferred areas on a network.

Availability and implementation: Software for the goodness-of-fit test is available as a Julia package at https://github.com/cecileane/QuartetNetworkGoodnessFit.jl.

Supplementary information: Supplementary data are available at Bioinformatics online.
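The abstract does not spell out the test statistic, so the following is only a rough illustration of what comparing observed multi-locus data with patterns expected under the multi-species coalescent can look like. It is not the authors' test and not the API of the Julia package. It assumes the simplest possible case: a single four-taxon set on a species tree without hybridization, where the expected frequencies of the three quartet topologies depend only on the internal branch length t in coalescent units. The function names, the use of a Pearson chi-square statistic, and the example counts are all assumptions made for illustration.

```python
import math


def expected_quartet_cfs(t):
    """Expected frequencies of the three quartet topologies under the
    multi-species coalescent on a 4-taxon species TREE (no hybridization),
    with internal branch length t in coalescent units.
    Returns (major, minor, minor), where major matches the species tree.
    """
    minor = math.exp(-t) / 3.0
    return (1.0 - 2.0 * minor, minor, minor)


def pearson_gof(observed_counts, expected_probs):
    """Pearson chi-square goodness-of-fit for exactly three categories.

    Illustrative assumption: gene trees are independent and the expected
    probabilities are fixed in advance, so the statistic is compared to a
    chi-square distribution with 2 degrees of freedom, whose survival
    function is exp(-x / 2).
    """
    if len(observed_counts) != 3 or len(expected_probs) != 3:
        raise ValueError("this sketch handles exactly three quartet topologies")
    n = sum(observed_counts)
    stat = sum(
        (obs - n * p) ** 2 / (n * p)
        for obs, p in zip(observed_counts, expected_probs)
    )
    pvalue = math.exp(-stat / 2.0)
    return stat, pvalue


# Hypothetical counts of gene trees supporting each quartet topology.
counts = (70, 18, 12)
stat, pvalue = pearson_gof(counts, expected_quartet_cfs(1.0))
print(f"statistic = {stat:.3f}, p-value = {pvalue:.3f}")
```

A small p-value for a four-taxon set would suggest that the candidate topology and branch length explain those gene tree frequencies poorly. On a network, the expected frequencies are no longer given by this tree formula, and a test over many four-taxon sets has to account for their dependence, which this sketch does not attempt.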

Related Organizations

University of Wisconsin–Madison
United States
University of Wisconsin–Oshkosh
United States

Keywords

Likelihood Functions, Genome, High-Throughput Nucleotide Sequencing, Phylogeny, Software

Impact by BIP!

selected citations: 19 (citations derived from selected sources; an alternative to the "Influence" indicator, which also reflects the overall impact of an article in the research community at large, based on the underlying citation network)
popularity: Top 10% (the "current" impact or attention of an article in the research community at large, based on the underlying citation network)
influence: Average (the overall impact of an article in the research community at large, based on the underlying citation network, diachronically)
impulse: Top 10% (the initial momentum of an article directly after its publication, based on the underlying citation network)


gold

Fields of Science

medical and health sciences

basic medicine
