
pmc: PMC11870528 , PMC12648238
Abstract Phylogenetic branch lengths are essential for many analyses, such as estimating divergence times, analyzing rate changes, and studying adaptation. However, true gene tree heterogeneity due to incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer can complicate the estimation of species tree branch lengths. While several tools exist for estimating the topology of a species tree addressing various causes of gene tree discordance, much less attention has been paid to branch length estimation on multi-locus datasets. For single-copy gene trees, some methods are available that summarize gene tree branch lengths onto a species tree, including coalescent-based methods that account for heterogeneity due to incomplete lineage sorting. However, no such branch length estimation method exists for multi-copy gene family trees that have evolved with gene duplication and loss. To address this gap, we introduce the CASTLES-Pro algorithm for estimating species tree branch lengths while accounting for both gene duplication and loss and incomplete lineage sorting. CASTLES-Pro improves on the existing coalescent-based branch length estimation method CASTLES by increasing its accuracy for single-copy gene trees and extending it to handle multi-copy ones. Our simulation studies show that CASTLES-Pro is generally more accurate than alternatives, eliminating the systematic bias toward overestimating terminal branch lengths often observed when using concatenation. Moreover, while not theoretically designed for horizontal gene transfer, we show that CASTLES-Pro is relatively robust to random horizontal gene transfer, though its accuracy can degrade at the highest levels of horizontal gene transfer.
Evolution, Molecular, species trees, Models, Genetic, gene duplication and loss, Gene Duplication, incomplete lineage sorting, Methods, branch length estimation, horizontal gene transfer, phylogenomics, Computer Simulation, Phylogeny, Algorithms
Evolution, Molecular, species trees, Models, Genetic, gene duplication and loss, Gene Duplication, incomplete lineage sorting, Methods, branch length estimation, horizontal gene transfer, phylogenomics, Computer Simulation, Phylogeny, Algorithms
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 4 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
