
Abstract

Protein sequence alignment has become a widely used method for studying newly sequenced proteins. Most sequence alignment methods use an affine gap penalty to score insertions and deletions. Although affine gap penalties capture the relative ease of extending a gap compared with opening one, they remain an obvious oversimplification of the processes that actually occur during sequence evolution. To improve the efficiency of sequence alignment methods and to better understand how sequences evolve, we sought a more accurate model of insertions and deletions in homologous proteins. In this work, we extract the probability of gap occurrence and the resulting gap-length distribution in distantly related proteins (sequence identity < 25%), using alignments based on their common structures. The observed gap-length distribution can be fitted by a multiexponential function with four distinct components. These results suggest new approaches to modeling insertions and deletions in sequence alignments. Proteins 2001;45:102–104. © 2001 Wiley‐Liss, Inc.
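The contrast drawn above between an affine gap penalty and a multiexponential gap-length distribution can be made concrete with a short sketch. It assumes the common convention that an affine penalty equals gap_open + gap_extend × (length − 1), and it treats a length-dependent gap cost as the negative log probability of the gap length. The function names are hypothetical, and the four weights and decay lengths in the usage lines are illustrative placeholders, not the fitted values reported in this study.

```python
import math


def affine_gap_penalty(length, gap_open, gap_extend):
    """Affine penalty (assumed convention): one opening cost for the first
    residue plus a constant extension cost for each additional residue."""
    if length < 1:
        raise ValueError("gap length must be >= 1")
    return gap_open + gap_extend * (length - 1)


def multiexponential_pmf(max_length, weights, decay_lengths):
    """Normalized gap-length probabilities for l = 1..max_length, with
    p(l) proportional to sum_i w_i * exp(-l / tau_i)."""
    raw = [
        sum(w * math.exp(-l / tau) for w, tau in zip(weights, decay_lengths))
        for l in range(1, max_length + 1)
    ]
    total = sum(raw)
    return [r / total for r in raw]


def log_odds_gap_cost(pmf):
    """Length-dependent gap cost -ln p(l). With a single exponential
    component this cost is linear in l, i.e. equivalent to an affine penalty."""
    return [-math.log(p) for p in pmf]


if __name__ == "__main__":
    # Hypothetical four-component mixture (placeholder parameters).
    weights = [0.5, 0.3, 0.15, 0.05]
    decay_lengths = [1.0, 3.0, 10.0, 40.0]
    cost = log_odds_gap_cost(multiexponential_pmf(50, weights, decay_lengths))
    # Marginal cost of extending a gap of length l by one residue.
    increments = [b - a for a, b in zip(cost, cost[1:])]
    print("increment at l=1:", round(increments[0], 3))
    print("increment at l=40:", round(increments[39], 3))
```

Because the logarithm of a sum of exponentials is convex in the gap length, the cost -ln p(l) is concave, so the marginal cost of extending a gap shrinks as the gap grows; an affine model instead fixes that increment at a single constant.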
Databases, Factual, Sequence Homology, Amino Acid, Entropy, Molecular, Computational Biology, Proteins, Reproducibility of Results, Evolution, Molecular, Chemistry, Amino Acid Substitution, Health Sciences, Biochemistry and Biotechnology, Amino Acid Sequence, Cellular and Developmental Biology, Sequence Alignment, Software, Probability
