
doi: 10.2307/2532943
pmid: 7662844
Nucleotides in a DNA sequence may change at different rates because they are located in different structural and functional regions of the gene and are thus subject to different mutational pressures or selective constraints. Knowledge of substitution rates at specific sites is important for understanding the forces and mechanisms that have shaped the evolution of DNA sequences. The gamma distribution has previously been proposed to model such rate variation among nucleotide sites. Based on mixed-model methodology, we present a method for predicting substitution rates at nucleotide sites by using homologous DNA sequences. The predictor is unbiased and "best" in the sense that it minimizes the mean squared error and maximizes the correlation between the predictor and the true value. It is also quite robust to errors in the estimates of the model parameters. A numerical example is given, with guidelines for the practical use of the approach. The most influential factor affecting the accuracy of prediction is the number of sequences: to obtain a correlation of over 0.7 between the predictor and the true value, about six to seven sequences are needed, depending on the overall similarity of the sequences.
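The idea of predicting a site's rate from a gamma model can be illustrated with a deliberately simplified sketch. It is not the predictor derived in the paper. It assumes (hypothetically) that the number of changes observed at a site is Poisson with mean `rate * tree_length`, and that rates follow a gamma distribution with shape `alpha` and mean 1. Under those assumptions the conditional mean of the rate given the data has a closed form, and its expectation equals the mean rate of 1. The function name and all parameter values are illustrative only.

```python
def posterior_mean_rate(changes, alpha, tree_length):
    """Conditional mean of a site's rate under a toy gamma-Poisson model.

    Assumptions (not taken from the paper):
      rate    ~ Gamma(shape=alpha, rate=alpha), so E[rate] = 1
      changes ~ Poisson(rate * tree_length)
    The posterior is Gamma(alpha + changes, alpha + tree_length),
    whose mean is (alpha + changes) / (alpha + tree_length).
    """
    if alpha <= 0 or tree_length <= 0:
        raise ValueError("alpha and tree_length must be positive")
    if changes < 0:
        raise ValueError("changes must be non-negative")
    return (alpha + changes) / (alpha + tree_length)


# Hypothetical per-site change counts, for illustration only.
site_changes = [0, 1, 3, 0, 7]
alpha = 0.5        # assumed gamma shape parameter
tree_length = 2.0  # assumed expected changes per site at rate 1

predicted = [posterior_mean_rate(c, alpha, tree_length) for c in site_changes]
for count, rate in zip(site_changes, predicted):
    print(f"changes={count}  predicted rate={rate:.3f}")
```

In this toy model, sites with no observed changes are shrunk toward a small but non-zero rate, and a larger `tree_length` (more or more divergent sequences) makes the prediction depend more on the data and less on the prior, in line with the abstract's remark that the number of sequences drives accuracy.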
mixed models, Primates, best unbiased predictor, maximum likelihood method, evolution of DNA sequences, empirical Bayes estimation, spatial rate variation, DNA sequences, DNA, Mitochondrial, Applications of statistics to biology and medical sciences; meta analysis, Problems related to evolution, Animals, Humans, mean squared error, gamma distribution, Models, Statistical, Base Sequence, Models, Genetic, DNA, Protein sequences, DNA sequences, Biological Evolution, substitution rates at nucleotide sites, Mathematics
| selected citations | 60 |
| popularity | Top 10% |
| influence | Top 1% |
| impulse | Top 10% |
