
As the amount of data from genome-wide association studies grows dramatically, many interesting scientific questions require imputation to combine or expand datasets. However, there are two situations for which imputation has been problematic: (1) polymorphisms with low minor allele frequency (MAF), and (2) datasets where subjects are genotyped on different platforms. Traditional measures of imputation cannot effectively address these problems.

We introduce a new statistic, the imputation quality score (IQS). In order to differentiate between well-imputed and poorly-imputed single nucleotide polymorphisms (SNPs), IQS adjusts the concordance between imputed and genotyped SNPs for chance. We first evaluated IQS in relation to MAF. Using a sample of subjects genotyped on the Illumina 1 M array, we extracted those SNPs that were also on the Illumina 550 K array and imputed them to the full set of the 1 M SNPs. As expected, the average IQS value drops dramatically with a decrease in MAF, indicating that IQS appropriately adjusts for MAF. We then evaluated whether IQS can filter poorly-imputed SNPs in situations where cases and controls are genotyped on different platforms. Randomly dividing the data into "cases" and "controls", we extracted the Illumina 550 K SNPs from the cases and imputed the remaining Illumina 1 M SNPs. The initial Q-Q plot for the test of association between cases and controls was grossly distorted (lambda = 1.15) and had 4016 false positives, reflecting imputation error. After filtering out SNPs with IQS […] 0.99 demonstrating that a database of IQS values from common imputations could be used as an effective filter to combine data genotyped on different platforms.

IQS effectively differentiates well-imputed and poorly-imputed SNPs. It is particularly useful for SNPs with low MAF and when datasets are genotyped on different platforms.
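
To make the idea of adjusting concordance for chance concrete, the following is a minimal sketch in Python. It assumes a kappa-style form, (Po - Pc) / (1 - Pc), where Po is the observed agreement between genotyped and imputed calls and Pc is the agreement expected by chance from the marginal genotype frequencies. The abstract does not give the exact IQS formula, so the function name, the use of hard genotype calls coded 0/1/2, and the convention of returning 0 when there is no variation are all assumptions made for illustration.

```python
from collections import Counter


def chance_adjusted_concordance(true_genotypes, imputed_genotypes):
    """Kappa-style agreement between genotyped and imputed calls.

    Assumed form (not taken from the abstract): (Po - Pc) / (1 - Pc).
    Genotypes are hard calls, e.g. 0/1/2 copies of the minor allele.
    """
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(true_genotypes)

    # Po: plain concordance, the fraction of subjects whose calls match.
    observed = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes)) / n

    # Pc: agreement expected by chance from the marginal genotype counts.
    true_counts = Counter(true_genotypes)
    imputed_counts = Counter(imputed_genotypes)
    chance = sum(true_counts[g] * imputed_counts[g] for g in true_counts) / (n * n)

    if chance == 1.0:
        # No variation in either call set; returning 0 is an assumed convention.
        return 0.0
    return (observed - chance) / (1.0 - chance)


# Rare SNP: 2 of 100 subjects carry the minor allele, but imputation
# calls everyone homozygous for the major allele.
rare_true = [0] * 98 + [1] * 2
rare_imputed = [0] * 100
rare_concordance = sum(t == i for t, i in zip(rare_true, rare_imputed)) / 100
rare_score = chance_adjusted_concordance(rare_true, rare_imputed)

# Common SNP imputed perfectly.
common_true = [0, 1, 2, 1]
common_score = chance_adjusted_concordance(common_true, list(common_true))

print(rare_concordance, rare_score, common_score)
```

In the rare-SNP case the plain concordance is 98% even though the imputation carries no information about who has the minor allele, while the chance-adjusted score is 0. This illustrates why a chance-adjusted measure would be expected to fall as minor allele frequency decreases.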
Genotyping, Genotype, Science, Genome-wide association studies, Polymorphism, Single Nucleotide, Research errors, Gene Frequency, Serial analysis of gene expression, Medicine and Health Sciences, Ethnicity, Humans, False Positive Reactions, Molecular genetics, Variant genotypes, African American people, Models, Statistical, Computers, Q, R, Reproducibility of Results, Haplotypes, Data Interpretation, Statistical, Medicine, Software, Research Article, Genome-Wide Association Study
