Phylogenetic Stochastic Mapping Without Matrix Exponentiation

Publication: Article, Preprint

Date: 01 Sep 2014

Embargo end date: 01 Jan 2014

Country: United States

Language: English

Publisher: SAGE Publications

Journal: Journal of Computational Biology, volume 21, pages 676-690 (issn: 1066-5277, eissn: 1557-8666)

Funded by: NIH | Predoctoral Research Trai..., NIH | BAYESIAN MODELING AND DAT..., NSF | New statistical methods f...

Authors: Jan Irvahn; Vladimir N. Minin

doi: 10.1089/cmb.2014.0062, 10.48550/arxiv.1403.5040

pmid: 24918812

pmc: PMC4148059

arXiv: 1403.5040


Abstract

Phylogenetic stochastic mapping is a method for reconstructing the history of trait changes on a phylogenetic tree relating species/organisms carrying the trait. State-of-the-art methods assume that the trait evolves according to a continuous-time Markov chain (CTMC) and work well for small state spaces. The computations slow down considerably for larger state spaces (e.g. space of codons), because current methodology relies on exponentiating CTMC infinitesimal rate matrices -- an operation whose computational complexity grows as the size of the CTMC state space cubed. In this work, we introduce a new approach, based on a CTMC technique called uniformization, that does not use matrix exponentiation for phylogenetic stochastic mapping. Our method is based on a new Markov chain Monte Carlo (MCMC) algorithm that targets the distribution of trait histories conditional on the trait data observed at the tips of the tree. The computational complexity of our MCMC method grows as the size of the CTMC state space squared. Moreover, in contrast to competing matrix exponentiation methods, if the rate matrix is sparse, we can leverage this sparsity and increase the computational efficiency of our algorithm further. Using simulated data, we illustrate advantages of our MCMC algorithm and investigate how large the state space needs to be for our method to outperform matrix exponentiation approaches. We show that even on the moderately large state space of codons our MCMC method can be significantly faster than currently used matrix exponentiation methods.

33 pages, including appendices
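
The uniformization identity that the abstract refers to can be illustrated with a short sketch. This is not the authors' MCMC algorithm; it only shows how one row of the transition probability matrix exp(Qt) can be obtained as a Poisson-weighted sum of powers of the stochastic matrix B = I + Q/Ω, using vector-matrix products instead of matrix exponentiation. The function name `uniformized_row`, the tolerance `tol`, and the cap `max_terms` are hypothetical choices made for this illustration.

```python
import math


def uniformized_row(Q, start, t, tol=1e-12, max_terms=10_000):
    """Approximate row `start` of exp(Q * t) by uniformization.

    Q is a CTMC rate matrix given as a list of lists (rows sum to zero).
    Illustrative sketch only: the starting Poisson weight exp(-omega * t)
    underflows when omega * t is very large.
    """
    n = len(Q)
    # Uniformization rate: at least as large as every state's exit rate.
    omega = max(-Q[i][i] for i in range(n))
    if omega == 0.0:
        # Every state is absorbing, so the chain never leaves `start`.
        return [1.0 if j == start else 0.0 for j in range(n)]

    # Transition matrix of the uniformized discrete-time chain.
    B = [[(1.0 if i == j else 0.0) + Q[i][j] / omega for j in range(n)]
         for i in range(n)]

    v = [1.0 if j == start else 0.0 for j in range(n)]  # e_start^T B^0
    weight = math.exp(-omega * t)                       # Poisson pmf at k = 0
    total = weight
    result = [weight * x for x in v]

    k = 0
    while 1.0 - total > tol and k < max_terms:
        k += 1
        # One vector-matrix product per term: O(n^2) for a dense B.
        v = [sum(v[i] * B[i][j] for i in range(n)) for j in range(n)]
        weight *= omega * t / k                         # Poisson pmf at k
        total += weight
        for j in range(n):
            result[j] += weight * v[j]
    return result


# Two-state example with rates 0 -> 1 of 1.0 and 1 -> 0 of 2.0.
Q = [[-1.0, 1.0], [2.0, -2.0]]
row = uniformized_row(Q, start=0, t=0.5)
```

Each term costs one vector-matrix product, which is quadratic in the number of states for a dense matrix; if the products were implemented over only the nonzero entries of a sparse rate matrix, the cost per term would scale with the number of nonzeros instead.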

Country

United States

Related Organizations

University of California, San Francisco (United States)
University of Washington (United States)
Washington State University (United States)


Keywords

FOS: Computer and information sciences, codon models, MCMC, Evolution, q-bio.PE, Bioinformatics, Mathematical sciences, Statistics - Computation, Evolution, Molecular, Genetic, Models, Information and Computing Sciences, Computer Simulation, Poisson Distribution, Quantitative Biology - Populations and Evolution, Phylogeny, Computation (stat.CO), stat.CO, Models, Genetic, Applied Mathematics, Populations and Evolution (q-bio.PE), Molecular, Proteins, Biological Sciences, uniformization, Markov Chains, FOS: Biological sciences, Monte Carlo Method, Algorithms, data augmentation

Impact by BIP!

selected citations: 7. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.