Domain prediction with probabilistic directional context

Article, Other literature type, Preprint; 14 Dec 2016
Publisher: openRxiv
Journal: Bioinformatics, volume 33, pages 2471-2478 (ISSN: 1367-4803, eISSN: 1367-4811)
Funded by: NSF | Collaborative Research: A..., NIH | Predicting and analyzing ..., NSF | GRADUATE RESEARCH FELLOWS... +1 projects

Authors: Ochoa, Alejandro; Singh, Mona

DOI: 10.1101/094284, 10.1093/bioinformatics/btx221

PMID: 28407137

PMCID: PMC5870623


Abstract

Motivation: Protein domain prediction is one of the most powerful approaches for sequence-based function prediction. While domain instances are typically predicted independently of each other, newer approaches have demonstrated improved performance by rewarding domain pairs that frequently co-occur within sequences. However, most of these approaches have ignored the order in which domains preferentially co-occur and have also not modeled domain co-occurrence probabilistically.

Results: We introduce a probabilistic approach for domain prediction that models "directional" domain context. Our method is the first to score all domain pairs within a sequence while taking their order into account, even for non-sequential domains. We show that our approach extends a previous Markov model-based approach to additionally score all pairwise terms, and that it can be interpreted within the context of Markov random fields. We formulate our underlying combinatorial optimization problem as an integer linear program, and demonstrate that it can be solved quickly in practice. Finally, we perform an extensive evaluation of domain context methods and demonstrate that incorporating context increases the number of domain predictions by ∼15%, with our approach dPUC2 (Domain Prediction Using Context) outperforming all competing approaches.

Availability: dPUC2 is available at http://github.com/alexviiia/dpuc2.
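To make "directional" context concrete, here is a minimal Python sketch of the core idea stated in the abstract: every pair of selected domains is scored according to which one lies N-terminal to the other (not only adjacent pairs), and the best non-overlapping set is chosen. This is not dPUC2's code or scoring function. The `Candidate` fields, the `context` table of directional log-odds values, the zero score for unseen pairs, and the family names `PF_A`, `PF_B`, `PF_C` are all hypothetical, and exhaustive subset search stands in for the paper's integer linear program, so it only works for a handful of candidates.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    """One candidate domain hit (hypothetical fields, for illustration only)."""
    family: str   # domain family identifier
    start: int    # first residue (1-based)
    end: int      # last residue (1-based, inclusive)
    score: float  # stand-alone log-odds score of this hit


# Directional context: (earlier family, later family) -> log-odds bonus or penalty.
# (a, b) and (b, a) are separate entries, so domain order matters.
Context = Dict[Tuple[str, str], float]


def overlaps(a: Candidate, b: Candidate) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def selection_score(selection: Sequence[Candidate], context: Context,
                    unseen: float = 0.0) -> float:
    """Sum of individual scores plus a context term for EVERY ordered pair,
    not only adjacent domains. Unseen pairs get `unseen` (an assumption)."""
    ordered = sorted(selection, key=lambda c: c.start)  # N- to C-terminal order
    total = sum(c.score for c in ordered)
    for first, second in combinations(ordered, 2):
        # combinations preserves input order, so `first` lies N-terminal to `second`
        total += context.get((first.family, second.family), unseen)
    return total


def best_selection(candidates: Sequence[Candidate],
                   context: Context) -> Tuple[List[Candidate], float]:
    """Exhaustively try every non-overlapping subset (exponential; toy inputs only)."""
    best: Tuple[Candidate, ...] = ()
    best_score = 0.0  # predicting no domains scores zero
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if any(overlaps(a, b) for a, b in combinations(subset, 2)):
                continue  # overlapping hits cannot both be predicted
            s = selection_score(subset, context)
            if s > best_score:
                best, best_score = subset, s
    return sorted(best, key=lambda c: c.start), best_score


if __name__ == "__main__":
    candidates = [
        Candidate("PF_A", 1, 50, 2.0),
        Candidate("PF_B", 60, 120, -0.5),  # weak on its own
        Candidate("PF_C", 100, 150, 1.0),  # overlaps PF_B
    ]
    context: Context = {("PF_A", "PF_B"): 3.0, ("PF_B", "PF_A"): -1.0,
                        ("PF_A", "PF_C"): -2.0}
    chosen, total = best_selection(candidates, context)
    print([c.family for c in chosen], total)  # ['PF_A', 'PF_B'] 4.5
```

In this toy input the weak `PF_B` hit (score -0.5) is kept only because `PF_A` precedes it and the `("PF_A", "PF_B")` pair carries a positive context score. If `PF_B` sat before `PF_A`, the `("PF_B", "PF_A")` entry would apply instead; that asymmetry is what "directional" refers to.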

Related Organizations

Princeton University, United States
Lewis-Sigler Institute for Integrative Genomics, Princeton University, United States
College of New Jersey, United States
Department of Computer Science, Princeton University, United States

Keywords

Models, Molecular; Models, Statistical; Protein Domains; Sequence Analysis, Protein; Computational Biology; Humans; Original Papers; Algorithms; Software

Impact (BIP! indicators)

- Selected citations (derived from selected sources): 5
- Popularity (current impact/attention in the research community): Top 10%
- Influence (overall impact over time, based on the citation network): Average
- Impulse (initial momentum directly after publication): Average