
pmid: 1758884
Abstract

We propose the use of the information‐theoretical entropy, $S = -\sum_i p_i \log_2 p_i$, as a measure of variability at a given position in a set of aligned sequences. $p_i$ stands for the fraction of times the i‐th type appears at a position. For protein sequences, the sum has up to 20 terms, for nucleotide sequences, up to 4 terms, and for codon sequences, up to 61 terms. We compare $S$ and $V_S$, a related measure, in detail with $V_K$, the traditional measure of immunoglobulin sequence variability, both in the abstract and as applied to the immunoglobulins. We conclude that $S$ has desirable mathematical properties that $V_K$ lacks and has intuitive and statistical meanings that accord well with the notion of variability. We find that $V_K$ and the $S$‐based measures are highly correlated for the immunoglobulins. We show by analysis of sequence data and by means of a mathematical model that this correlation is due to a strong tendency for the frequency of occurrence of amino acid types at a given position to be log‐linear. It is not known whether the immunoglobulins are typical or atypical of protein families in this regard, nor is the origin of the observed rank‐frequency distribution obvious, although we discuss several possible etiologies.
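The per-position entropy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`column_entropy`, `wu_kabat`, `per_position`) and the toy alignment are hypothetical. The abstract does not define the traditional measure; the sketch assumes the commonly cited Wu-Kabat form (number of distinct residue types at a position divided by the relative frequency of the most common type). Gap characters are not treated specially here, which is a further simplifying assumption.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy S = sum_i p_i * log2(1 / p_i) of one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    # log2(1/p) is used instead of -log2(p) so a fully conserved column gives 0.0
    return sum((n / total) * log2(total / n) for n in counts.values())


def wu_kabat(column):
    """Assumed Wu-Kabat variability: distinct types / frequency of the commonest type."""
    counts = Counter(column)
    total = len(column)
    most_common_count = counts.most_common(1)[0][1]
    return len(counts) / (most_common_count / total)


def per_position(alignment, measure):
    """Apply a column measure to every position of equal-length aligned sequences."""
    return [measure(col) for col in zip(*alignment)]


# Hypothetical toy alignment: four sequences, four positions.
alignment = ["ACDA", "ACEA", "ACFG", "ACGG"]
entropies = per_position(alignment, column_entropy)
variabilities = per_position(alignment, wu_kabat)
print(entropies)      # one entropy value (in bits) per position
print(variabilities)  # one Wu-Kabat value per position
```

In this toy case the two fully conserved positions have zero entropy, the position with four equally frequent types has 2 bits, and the position split evenly between two types has 1 bit, matching the intuition that S grows with both the number of types and the evenness of their frequencies.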
Keywords: Models, Statistical; Chemical Phenomena; Chemistry, Physical; Molecular Sequence Data; Information Theory; Genetic Variation; Humans; Immunoglobulins; Amino Acid Sequence; Sequence Alignment
