Supplementary Data: UFBoot2: Improving the Ultrafast Bootstrap Approximation

doi: https://doi.org/10.1101/153916
http://www.biorxiv.org/content/early/2017/06/22/153916

This record contains the PANDIT-based dataset and the TreeBASE dataset (Nguyen et al. 2015), which were analyzed with different bootstrap methods in the study "UFBoot2: Improving the Ultrafast Bootstrap Approximation". The PANDIT-based dataset (compressed in the file data_pandit.tar.gz) is used to benchmark the accuracy of bootstrap estimates. The TreeBASE dataset (compressed in the file data_treebase.tar.gz) is used to benchmark runtimes.

PANDIT-Based Dataset

After being uncompressed, the PANDIT-based dataset comprises 5,690 numbered directories corresponding to 5,690 DNA MSAs simulated with Seq-Gen (Rambaut and Grassly 1997). The model parameters and true trees used for the simulations were inferred from the original MSAs downloaded from the PANDIT database (Whelan et al. 2006). The directory numbering is not consecutive because we kept only the MSAs that can be tested under the mild and severe model violations defined in the UFBoot paper (Minh et al. 2013).

Each numbered directory N contains three files:
(1) data.N: the simulated MSA in PHYLIP format;
(2) model.N: the best-fit model detected from the corresponding original MSA;
(3) tree.N: the tree (in Newick format) inferred from the corresponding original MSA.
Seq-Gen uses tree.N and model.N to simulate the MSA in data.N.

TreeBASE Dataset

After being uncompressed, the TreeBASE dataset comprises 115 files corresponding to 115 MSAs, all in PHYLIP format:
70 DNA MSAs, named dna_[number of sequences]_[number of sites].phy;
45 protein MSAs, named prot_[number of sequences]_[number of sites].phy.
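The file layout described above can be traversed with a short script. The following is a minimal Python sketch, not part of UFBoot2 or IQ-TREE, and it rests on stated assumptions: the helper names (parse_treebase_name, read_phylip_header, pandit_entries) are hypothetical; the extraction directories ./data_pandit and ./data_treebase are assumed, since the record does not state the top-level layout inside each tarball; and the header reader relies only on the standard PHYLIP convention that the first line gives the number of sequences followed by the number of sites.

```python
import re
from pathlib import Path

# Naming scheme from the dataset description (TreeBASE archive):
#   dna_[number of sequences]_[number of sites].phy
#   prot_[number of sequences]_[number of sites].phy
TREEBASE_NAME = re.compile(r"^(dna|prot)_(\d+)_(\d+)\.phy$")


def parse_treebase_name(filename):
    """Return (data_type, n_sequences, n_sites) parsed from a TreeBASE file name."""
    match = TREEBASE_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected TreeBASE file name: {filename!r}")
    data_type, n_seq, n_sites = match.groups()
    return data_type, int(n_seq), int(n_sites)


def read_phylip_header(path):
    """Return (n_sequences, n_sites) from the first line of a PHYLIP file."""
    with open(path) as handle:
        n_seq, n_sites = handle.readline().split()[:2]
    return int(n_seq), int(n_sites)


def pandit_entries(root):
    """Yield (N, data.N, model.N, tree.N) for every numbered directory under root.

    The directory numbers are not consecutive, so existing directories are
    listed instead of iterating over a fixed numeric range. Directories that
    lack any of the three files are skipped.
    """
    root = Path(root)
    numbered = [d for d in root.iterdir() if d.is_dir() and d.name.isdecimal()]
    for directory in sorted(numbered, key=lambda d: int(d.name)):
        n = directory.name
        files = tuple(directory / f"{prefix}.{n}" for prefix in ("data", "model", "tree"))
        if all(f.is_file() for f in files):
            yield (int(n),) + files


# Hypothetical usage, assuming data_pandit.tar.gz was extracted into ./data_pandit
# and data_treebase.tar.gz into ./data_treebase:
#
# for n, data, model, tree in pandit_entries("data_pandit"):
#     print(n, read_phylip_header(data))
# for phy in sorted(Path("data_treebase").glob("*.phy")):
#     # Compare the (sequences, sites) encoded in the name with the PHYLIP header.
#     print(phy.name, parse_treebase_name(phy.name)[1:] == read_phylip_header(phy))
```

For a made-up name such as "dna_20_1000.phy", parse_treebase_name returns ('dna', 20, 1000). Listing the numbered directories that actually exist, rather than looping over 1..max(N), follows directly from the note that the numbering is not consecutive.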

References

Minh BQ, Nguyen MAT, von Haeseler A. 2013. Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 30:1188–1195.
Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32:268–274.
Rambaut A, Grassly NC. 1997. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Bioinformatics 13:235–238.
Whelan S, de Bakker PIW, Quevillon E, Rodriguez N, Goldman N. 2006. PANDIT: an evolution-centric database of protein and associated nucleotide domains with inferred trees. Nucleic Acids Res. 34:D327–D331.

Related Organizations

Medical University of Vienna, Austria
Ho Chi Minh City University of Technology and Education, Viet Nam
University of Vienna, Austria
Max F. Perutz Laboratories, Austria
Ho Chi Minh City University of Technology, Viet Nam

Keywords

phylogenetic inference, maximum likelihood, ultrafast bootstrap, model violation
