
This repository contains genome-wide association study (GWAS) summary statistics used in the paper: “Domain-Aware Matrix Completion for Phenotype Imputation Using Electronic Health Record Data with Applications in Genomic Research” (Annals of Applied Statistics).

The repository includes GWAS summary statistics derived from the UK Biobank for five phenotypes:

  severe depression (MDD)
  breast cancer (BrCa)
  prostate cancer (PrCa)
  high blood pressure (HTN)
  bowel cancer (CRC)

For each phenotype, GWAS summary statistics are provided for five analysis approaches:

  COVV3C: covImpute
  LTPI: LTPI
  SOFT: softImpute
  AUTO: autoComplete
  GWAS: GWAS based on observed case-control status

In total, the repository contains 25 GWAS summary statistic files. These summary statistics were used in the real-data analyses to compare the imputation performance of matrix completion methods and to evaluate their impact on downstream genomic analyses.

Each file contains SNP-level GWAS summary statistics with the following columns:

  CHROM: chromosome (hg19)
  GENPOS: genomic position (hg19)
  ID: SNP identifier in the format CHR-GENPOS-ALLELE0-ALLELE1
  ALLELE1: effect allele
  ALLELE0: non-effect allele
  BETA: effect size estimate
  SE: standard error
  LOG10P: p-value on the -log10 scale, i.e. -log10(p), as reported by REGENIE
  N: sample size
  TEST: REGENIE output field indicating the analysis type

All variants are aligned to the hg19 genome build used in the analyses described in the paper.

The file names follow the convention: _.txt.gz (a phenotype name and an analysis label joined by an underscore, as in the examples below). Examples include:

  Severe_depression_COVV3C.txt.gz
  Breast_cancer_LTPI.txt.gz
  Prostate_cancer_SOFT.txt.gz
  High_blood_pressure_AUTO.txt.gz
  Bowel_cancer_GWAS.txt.gz
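As a rough illustration of how the files described above might be handled, the following Python sketch splits a file name into its phenotype and analysis label and reads one file row by row. It relies on several assumptions that the description does not state: the function names (`expected_file_names`, `parse_file_name`, `read_sumstats`) are hypothetical, the full grid of 25 names is extrapolated from the five example file names, the tables are taken to be whitespace-delimited with a header line (the usual REGENIE layout), missing values are taken to be written as `NA`, and LOG10P is taken to hold -log10(p).

```python
import gzip
from itertools import product

# Labels come from the description; the full 5 x 5 grid of names is an
# assumption extrapolated from the five example file names.
PHENOTYPES = ["Severe_depression", "Breast_cancer", "Prostate_cancer",
              "High_blood_pressure", "Bowel_cancer"]
METHODS = ["COVV3C", "LTPI", "SOFT", "AUTO", "GWAS"]


def expected_file_names():
    """Build the 25 names implied by the naming convention (assumed)."""
    return [f"{p}_{m}.txt.gz" for p, m in product(PHENOTYPES, METHODS)]


def parse_file_name(name):
    """Split 'Breast_cancer_LTPI.txt.gz' into ('Breast_cancer', 'LTPI')."""
    suffix = ".txt.gz"
    stem = name[:-len(suffix)] if name.endswith(suffix) else name
    # The analysis label is whatever follows the last underscore.
    phenotype, method = stem.rsplit("_", 1)
    return phenotype, method


def to_float(text):
    """Convert a field to float, mapping 'NA' (assumed missing code) to NaN."""
    return float("nan") if text == "NA" else float(text)


def read_sumstats(path):
    """Yield one dict per SNP with numeric fields converted.

    Assumes a whitespace-delimited table whose first line is the header
    and that LOG10P holds -log10(p).
    """
    with gzip.open(path, "rt") as handle:
        header = handle.readline().split()
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            row = dict(zip(header, fields))
            row["GENPOS"] = int(row["GENPOS"])
            for key in ("BETA", "SE", "LOG10P"):
                row[key] = to_float(row[key])
            # Recover the p-value from the -log10 scale.
            row["P"] = 10.0 ** (-row["LOG10P"])
            yield row
```

For example, `parse_file_name("Prostate_cancer_SOFT.txt.gz")` gives the phenotype `Prostate_cancer` and the label `SOFT`, and iterating over `read_sumstats(path)` streams the rows without loading the whole file into memory, which may matter for genome-wide files.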
GWAS, covImpute
