Merging Microsatellite Data

Publication: Article, 01 Jul 2006, English
Publisher: SAGE Publications
Journal: Journal of Computational Biology, volume 13, pages 1131-1147 (ISSN: 1066-5277, eISSN: 1557-8666)

Authors: Angela P. Presson; Eric M. Sobel; Kenneth Lange; Jeanette C. Papp

doi: 10.1089/cmb.2006.13.1131

pmid: 16901233

Abstract

Genotype calling procedures vary from laboratory to laboratory for many microsatellite markers. Even within the same laboratory, application of different experimental protocols often leads to ambiguities. The impact of these ambiguities ranges from irksome to devastating. Resolving the ambiguities can increase effective sample size and preserve evidence in favor of disease-marker associations. Because different data sets may contain different numbers of alleles, merging is unfortunately not a simple process of matching alleles one to one. Merging data sets manually is difficult, time-consuming, and error-prone due to differences in genotyping hardware, binning methods, molecular weight standards, and curve fitting algorithms. Merging is particularly difficult if few or no samples occur in common, or if samples are drawn from ethnic groups with widely varying allele frequencies. It is dangerous to align alleles simply by adding a constant number of base pairs to the alleles of one of the data sets. To address these issues, we have developed a Bayesian model and a Markov chain Monte Carlo (MCMC) algorithm for sampling the posterior distribution under the model. Our computer program, MicroMerge, implements the algorithm and almost always accurately and efficiently finds the most likely correct alignment. Common allele frequencies across laboratories in the same ethnic group are the single most important cue in the model. MicroMerge computes the allelic alignments with the greatest posterior probabilities under several merging options. It also reports when data sets cannot be confidently merged. These features are emphasized in our analysis of simulated and real data.
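The role of shared allele frequencies as a cue can be illustrated with a toy calculation. The sketch below is not MicroMerge and does not reproduce the authors' model; it assumes a much simpler setting in which lab B reports no more alleles than lab A, every lab B allele maps to a distinct lab A bin in size order, both labs share allele frequencies with a symmetric Dirichlet prior, and all alignments are equally likely a priori. It also enumerates every alignment exhaustively instead of sampling by MCMC, which is only feasible for small markers. The function name `alignment_posterior` and the parameter `alpha` are hypothetical.

```python
from itertools import combinations
from math import exp, lgamma


def alignment_posterior(counts_a, counts_b, alpha=1.0):
    """Toy posterior over order-preserving alignments of lab B alleles onto lab A bins.

    counts_a, counts_b: allele counts, each sorted by allele size.
    alpha: symmetric Dirichlet prior parameter (an assumption of this sketch).
    Requires len(counts_b) <= len(counts_a).
    """
    log_scores = {}
    # Each candidate alignment picks, in order, one lab A bin per lab B allele.
    for bins in combinations(range(len(counts_a)), len(counts_b)):
        merged = list(counts_a)
        for b_index, bin_index in enumerate(bins):
            merged[bin_index] += counts_b[b_index]
        # Dirichlet-multinomial marginal likelihood, dropping every term
        # that is identical across alignments.
        log_scores[bins] = sum(lgamma(alpha + n) for n in merged)

    # Normalize in a numerically stable way (uniform prior over alignments).
    top = max(log_scores.values())
    weights = {bins: exp(score - top) for bins, score in log_scores.items()}
    total = sum(weights.values())
    return {bins: weight / total for bins, weight in weights.items()}


# Lab A sees three alleles; lab B misses the rare middle one.
posterior = alignment_posterior([50, 10, 40], [48, 42])
best = max(posterior, key=posterior.get)  # → (0, 2)
```

In this example the two lab B alleles are matched to the first and third lab A bins, because that pairing keeps the pooled frequencies consistent. A constant base-pair shift could not express such a mapping if the bin spacing differed between laboratories.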

Related Organizations

University of California, Los Angeles
United States

Keywords

Genetic Markers; Models, Statistical; Genotype; Models, Genetic; Bayes Theorem; Markov Chains; Gene Frequency; Monte Carlo Method; Algorithms; Alleles; Software; Microsatellite Repeats

Impact by BIP!

Selected citations: 13
Popularity: Average
Influence: Top 10%
Impulse: Top 10%