Unbiased probabilistic taxonomic classification for DNA barcoding

Publication: Article, 13 Jun 2016, English. Publisher: Oxford University Press (OUP). Journal: Bioinformatics, volume 32, pages 2920-2927 (issn: 1367-4803, eissn: 1367-4811)

Authors: Panu Somervuo; Sonja Koskela; Juho Pennanen; R. Henrik Nilsson; Otso Ovaskainen

doi: 10.1093/bioinformatics/btw346

pmid: 27296980

Abstract

Motivation: When targeted to a barcoding region, high-throughput sequencing can be used to identify species or operational taxonomical units from environmental samples, and thus to study the diversity and structure of species communities. Although many methods provide confidence scores for assigning taxonomic affiliations, it is not straightforward to translate these values into unbiased probabilities. We present a probabilistic method for taxonomical classification (PROTAX) of DNA sequences. Given a pre-defined taxonomical tree structure that is partially populated by reference sequences, PROTAX decomposes a total probability of one across the set of all possible outcomes. PROTAX accounts for species that are present in the taxonomy but that do not have reference sequences, for the possibility of unknown taxonomical units, and for mislabeled reference sequences. PROTAX is based on a statistical multinomial regression model, and it can utilize any kind of sequence similarity measure or the outputs of other classifiers as predictors.

Results: We demonstrate the performance of PROTAX by using as predictors the output from BLAST, the phylogenetic classification software TIPP, and the RDP classifier. We show that PROTAX improves the predictions of the baseline implementations of the TIPP and RDP classifiers, and that it is able to combine complementary information provided by BLAST and TIPP, resulting in accurate and unbiased classifications even in very challenging cases such as 50% mislabeling of reference sequences.

Availability and implementation: Perl/R implementation of PROTAX is available at http://www.helsinki.fi/science/metapop/Software.htm.

Contact: panu.somervuo@helsinki.fi

Supplementary information: Supplementary data are available at Bioinformatics online.
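The idea of splitting a total probability of one over all outcomes of a taxonomy can be illustrated with a small sketch. This is not the PROTAX implementation (which is distributed as Perl/R); it is a minimal illustration that assumes a single similarity predictor per taxon, a softmax (multinomial logistic) link at each node, and hand-picked coefficients. The names `node_probabilities`, `classify`, `beta_ref`, `beta_noref`, and `beta_unknown` are hypothetical, and the coefficient values are invented; in a real model they would be estimated from training data. Mislabeled reference sequences are not modeled here.

```python
import math

UNKNOWN = "<unknown>"  # outcome for a taxon that is missing from the taxonomy


def node_probabilities(children, beta_ref=(-3.0, 8.0), beta_noref=-1.0,
                       beta_unknown=0.0):
    """Split probability one among the children of a node plus 'unknown'.

    children maps a taxon name to its best sequence similarity in [0, 1],
    or to None when the taxon has no reference sequences.
    All coefficients are illustrative assumptions, not fitted values.
    """
    scores = {UNKNOWN: beta_unknown}
    for name, similarity in children.items():
        if similarity is None:
            scores[name] = beta_noref  # taxon known, but no references
        else:
            scores[name] = beta_ref[0] + beta_ref[1] * similarity
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def classify(tree, prob=1.0, path=()):
    """Propagate probability down the tree.

    tree maps a taxon name to (similarity, subtree), where subtree is a
    dict of the same shape or None for a leaf. Returns a dict that maps
    each outcome path to its probability; the values sum to prob.
    """
    children = {name: sim for name, (sim, _) in tree.items()}
    outcomes = {}
    for name, p in node_probabilities(children).items():
        subtree = tree[name][1] if name in tree else None
        if subtree:
            outcomes.update(classify(subtree, prob * p, path + (name,)))
        else:
            outcomes[path + (name,)] = prob * p
    return outcomes


# Toy taxonomy: "A. three" is in the taxonomy but has no reference sequence.
tree = {
    "GenusA": (0.97, {
        "A. one": (0.97, None),
        "A. two": (0.90, None),
        "A. three": (None, None),
    }),
    "GenusB": (0.80, {"B. one": (0.80, None)}),
}
result = classify(tree)
for outcome, p in sorted(result.items(), key=lambda kv: -kv[1]):
    print(" > ".join(outcome), round(p, 3))
```

Because every node distributes its incoming probability over its children and an explicit unknown branch, the probabilities of all outcome paths sum to one, including outcomes for taxa without reference sequences and for taxa absent from the taxonomy.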

Related Organizations

Norwegian University of Science and Technology
Norway
University of Gothenburg
Sweden
University of Helsinki
Finland

Keywords

DNA Barcoding, Taxonomic, Phylogeny, Software

Impact by BIP!

- selected citations: 87. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.