ZENODO
Dataset . 2020
License: CC 0
Data sources: ZENODO
DRYAD
Dataset . 2020
License: CC 0
Data sources: Datacite

On the cross-population generalizability of gene expression prediction models

Authors: Keys, Kevin L.; Mak, Angel C.Y.; White, Marquitta J.; Eckalbar, Walter L.; Dahl, Andrew W.; Mefford, Joel; Mikhaylova, Anna V.; +13 Authors

Abstract

Data

This dataset is linked to a manuscript. For a complete description of how these data were produced, processed, and analyzed, see the preprint on bioRxiv. The study contains three separate analyses, summarized below:

  • Analysis of expression data from SAGE, a pediatric asthma cohort
  • Analysis of paired genotype-expression data from the GEUVADIS study
  • Simulated data using genotype data from the 1000 Genomes Project (1KGP)

SAGE

Genotype data from SAGE are available on dbGaP under accession number phs000921.v1.p1. Expression data were processed in accordance with the GTEx v6p pipeline. Inverse quantile normalized expression values for 39 SAGE subjects are provided here, in the file sage_39_wgs_for_rnaseq_expression_sorted_headered.bed.tar.gz.

GEUVADIS

Genome data for GEUVADIS were downloaded from the 1KGP data portal. Expression data from GEUVADIS were taken from the file GD462.GeneQuantRPKM.50FN.samplename.resk10.txt.gz, downloaded from the GEUVADIS data portal (originally at https://www.ebi.ac.uk/Tools/geuvadis-das/, but defunct as of May 2020; try the 1KGP page or the EBI page).

Simulations from 1000 Genomes

Simulations used haplotype data originally from 1KGP. Haplotype data were downloaded from the IMPUTE website (download link not working as of May 2020) and are provided here for completeness (see file "HM3.tgz"). Forward-simulated haplotypes from HAPGEN2 are also provided here in three archives: AA.chr22.tar.gz, CEU.chr22.tar.gz, and YRI.chr22.tar.gz. A list of genes from chromosome 22 is also included as chr22.genelist.txt.

Results

Results are separated by analysis.

SAGE

Results from the analysis of SAGE are stored in the archive sage.results.tar.gz. It contains three files:

  • sage.predixcan.all.gene.results.txt, which contains all R² and correlation results from the comparison of measured gene expression to predictions from PrediXcan
  • gtex7.compare.r2.txt, which contains the comparison of GTEx v7 training R² versus empirical PrediXcan R² in SAGE, illustrated in Figure 3 of Keys et al. (2020)
  • sage_predixcan_allresults_allplots_2020-02-17.Rdata, an R data file with manuscript figures included

Plotting code is available on GitHub.

GEUVADIS

Results from the analysis of GEUVADIS data are split into three archives:

  • geuvadis.numpred.tar.gz, which contains a count of the number of samples predicted for each gene in the GEUVADIS analyses
  • geuvadis.predictionweights.tar.gz, which contains the prediction weights produced by the PrediXcan pipeline applied to GEUVADIS populations
  • geuvadis.results.tar.gz, which contains the comparisons between measurements and predictions in GEUVADIS subpopulations (see Tables 1-4, Supplementary Tables 5 and 8, and Supplementary Figures 13-17 of Keys et al. (2020))

1000 Genomes Simulations

Gene expression prediction and TWAS association testing results from the analysis of simulated cross-population prediction with 1KGP populations are stored in the single archive 1000genomes-simulation.results.tar.gz. See Figures 4-7, Supplementary Tables 6-7 and 9-12, and Supplementary Figures 20-23 of Keys et al. (2020). Analysis and plotting code is on GitHub.
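The result files described above report an R² and a correlation per gene, comparing measured expression to PrediXcan predictions. The following is only a minimal sketch of that kind of comparison, not the authors' code: it assumes R² is taken as the squared Pearson correlation, and the function names, gene labels, and toy values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)


def compare_gene(measured, predicted):
    """Correlation and R² for one gene (assumption: R² = squared Pearson r)."""
    r = pearson(measured, predicted)
    return {"correlation": r, "r2": r * r}


def compare_all(measured_by_gene, predicted_by_gene):
    """Compare every gene that has both measurements and predictions."""
    shared = sorted(measured_by_gene.keys() & predicted_by_gene.keys())
    return {g: compare_gene(measured_by_gene[g], predicted_by_gene[g]) for g in shared}


# Hypothetical toy values: one expression value per subject, same subject order.
measured = {
    "GENE_A": [0.1, -0.4, 1.2, 0.7],
    "GENE_B": [1.0, 2.0, 3.0, 4.0],
}
predicted = {
    "GENE_A": [0.0, -0.2, 0.9, 0.5],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [0.3, 0.1, 0.2, 0.4],  # no measurement, so it is skipped
}

results = compare_all(measured, predicted)
for gene, stats in results.items():
    print(gene, round(stats["correlation"], 3), round(stats["r2"], 3))
```

Genes without a prediction or without a measurement are dropped here, which is one plausible reason for tracking the number of predicted samples per gene separately.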

The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.
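The dependence on shared eQTL architecture can be illustrated with a deliberately simplified toy simulation. This is a sketch under strong assumptions and not the simulation framework used in the study (which relied on HAPGEN2 haplotypes): it uses one causal SNP per population, independent SNPs, and made-up allele frequencies, effect size, noise level, and sample size.

```python
import math
import random


def simulate_genotypes(n, maf, rng):
    """Draw n diploid genotypes (0, 1, or 2 copies of the minor allele)."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


def simulate_expression(genotypes, effect, noise_sd, rng):
    """Expression = additive SNP effect plus Gaussian noise."""
    return [effect * g + rng.gauss(0.0, noise_sd) for g in genotypes]


def fit_line(x, y):
    """Ordinary least squares slope and intercept for y ~ x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy / math.sqrt(sxx * syy)) ** 2


rng = random.Random(2020)          # fixed seed so the sketch is reproducible
n, effect, noise_sd = 1000, 1.0, 0.6  # all hypothetical settings

# Training population: SNP 1 is the eQTL. The model learns a weight for SNP 1.
train_snp1 = simulate_genotypes(n, 0.30, rng)
train_expr = simulate_expression(train_snp1, effect, noise_sd, rng)
slope, intercept = fit_line(train_snp1, train_expr)

# Test population with different allele frequencies at two independent SNPs.
test_snp1 = simulate_genotypes(n, 0.15, rng)
test_snp2 = simulate_genotypes(n, 0.40, rng)
predicted = [slope * g + intercept for g in test_snp1]

# Scenario 1: the eQTL is shared, so SNP 1 also drives expression here.
expr_shared = simulate_expression(test_snp1, effect, noise_sd, rng)
# Scenario 2: the eQTL is not shared, so SNP 2 drives expression instead.
expr_nonshared = simulate_expression(test_snp2, effect, noise_sd, rng)

r2_shared = r_squared(predicted, expr_shared)
r2_nonshared = r_squared(predicted, expr_nonshared)
print("R2, shared eQTL:    ", round(r2_shared, 3))
print("R2, non-shared eQTL:", round(r2_nonshared, 3))
```

In this toy setting the trained weight carries no information once the causal SNP differs between populations, so prediction accuracy collapses in the non-shared scenario. Real eQTL architectures are polygenic and shaped by linkage disequilibrium, which this sketch ignores.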

All source code for this project can be found on GitHub. Genotype data are stored on dbGaP under accession number phs000921.v4.p1.

Keywords

PrediXcan, TWAS, GTEx, admixed populations
