Improved inference of tandem domain duplications

Publication: Article, 01 Jul 2021, English

Publisher: Oxford University Press (OUP)

Journal: Bioinformatics, volume 37, pages i133-i141 (issn: 1367-4803, eissn: 1367-4811)

Funded by: NSF | ABI: Innovation: Computat..., NIH | Predicting and analyzing ...

Authors: Chaitanya Aluru; Mona Singh

doi: 10.1093/bioinformatics/btab329

pmid: 34252920

pmc: PMC8275333

Abstract

Motivation

Protein domain duplications are a major contributor to the functional diversification of protein families. These duplications can occur one at a time through single domain duplications, or as tandem duplications where several consecutive domains are duplicated together as part of a single evolutionary event. Existing methods for inferring domain-level evolutionary events are based on reconciling domain trees with gene trees. While some formulations consider multiple domain duplications, they do not explicitly model tandem duplications; this leads to inaccurate inference of which domains duplicated together over the course of evolution.

Results

Here, we introduce a reconciliation-based framework that considers the relative positions of domains within extant sequences. We use this information to uncover tandem domain duplications within the evolutionary history of these genes. We devise an integer linear programming approach that solves our problem exactly, and a heuristic approach that works well in practice. We perform extensive simulation studies to demonstrate that our approaches can accurately uncover single and tandem domain duplications, and additionally test our approach on a well-studied orthogroup where lineage-specific domain expansions exhibit varying and complex domain duplication patterns.

Availability and implementation

Code is available on GitHub at https://github.com/Singh-Lab/TandemDuplications.

Supplementary information

Supplementary data are available at Bioinformatics online.
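The distinction the abstract draws between single and tandem domain duplications can be made concrete with a toy example. The sketch below is illustrative only and is not the authors' reconciliation method or code: it assumes a protein is represented as an ordered list of domain labels, and the helper name `duplicate_block` is hypothetical.

```python
from typing import List


def duplicate_block(domains: List[str], start: int, length: int) -> List[str]:
    """Return a copy of `domains` with domains[start:start+length] duplicated.

    Hypothetical helper for illustration: the copied block is inserted
    immediately after the original block, preserving its internal order.
    """
    if length < 1 or start < 0 or start + length > len(domains):
        raise ValueError("block must lie within the domain sequence")
    end = start + length
    block = domains[start:end]
    return domains[:end] + block + domains[end:]


ancestor = ["A", "B", "C"]

# Single domain duplication: one domain is copied in one event.
single = duplicate_block(ancestor, start=1, length=1)

# Tandem duplication: consecutive domains A and B are copied together
# in one event.
tandem = duplicate_block(ancestor, start=0, length=2)

print(single)  # ['A', 'B', 'B', 'C']
print(tandem)  # ['A', 'B', 'A', 'B', 'C']
```

In the tandem case the copied block keeps the order of its domains in the resulting sequence, which is an example of the positional information within extant sequences that the abstract refers to.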

Keywords

Evolution, Molecular; Protein Domains; Gene Duplication; Evolutionary, Comparative and Population Genomics; Humans; Programming, Linear; Algorithms; Phylogeny
