A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules

Summary: Associated code, Python notebooks, and data for the manuscript entitled 'A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules' (published in PLOS Computational Biology; doi: 10.1371/journal.pcbi.1010028).

Stage 1 peaBrain: In Stage 1, we constructed a single model that predicts the mean abundance of all genes in a given tissue from the reference genome, optionally supplemented with epigenomic and genomic annotations. We applied this framework to all tissues in the GTEx dataset, constructing three classes of models: (a) DNA sequence alone (class-A); (b) DNA sequence plus epigenomic annotations that are not specific to any tissue or cell type, i.e. non-specific annotations (class-B); and (c) DNA sequence combined with both non-specific and tissue-specific annotations (class-C). We provide all code and data necessary to generate the results for class-A and class-B models. Due to storage constraints, we provide training/test data for skeletal muscle only; expression data for other tissues are available from GTEx. The original data sources used to train class-C models are detailed in the manuscript.

Using the Stage 1 class-B models, we generated a non-coding impact metric that captures the impact of each position in the core promoter sequence on the expression of each gene. The peaBrain impact scores for all GTEx tissues have been made available. In the manuscript, we show that this impact score correlates with nucleotide-level evolutionary constraint and is predictive of disease-associated variation and allele-specific transcription factor binding. We also show how tissue-specific peaBrain scores can be used to pinpoint the functional tissues underlying complex traits, outperforming methods that depend on colocalization of eQTL and GWAS signals.

Stage 2 peaBrain: In Stage 2, we extended the peaBrain model to incorporate the transcriptomic consequences of individual genotype variation. In the manuscript, we describe the ability of this extended model to predict the tissue-specific expression profile of each individual and to identify putatively functional variants within the sequence. Sample code has been provided. Individual-level data are available from GTEx.
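
For readers who want a feel for the ideas in the summary before opening the notebooks, the following is a minimal, illustrative sketch and not the peaBrain implementation. It assumes a one-hot sequence encoding, treats annotations as extra per-position columns (in the spirit of class-A versus class-B inputs), and approximates a per-position impact score by in silico mutagenesis, which is an assumption on our part since the summary does not state how the metric is computed. The names `one_hot`, `build_input`, `impact_scores`, and `toy_predict` are hypothetical, and `toy_predict` is a stand-in for a trained model.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases such as N become all-zero rows."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out


def build_input(seq, annotations=None):
    """Sequence-only input (class-A style), or sequence plus annotation columns (class-B style).

    `annotations` is a hypothetical (length, n_tracks) array of per-position values.
    """
    x = one_hot(seq)
    if annotations is not None:
        annotations = np.asarray(annotations, dtype=np.float32)
        if annotations.shape[0] != x.shape[0]:
            raise ValueError("annotations must have one row per sequence position")
        x = np.concatenate([x, annotations], axis=1)
    return x


def impact_scores(seq, predict, annotations=None):
    """Per-position score: largest absolute change in predicted expression
    over the three alternative bases (in silico mutagenesis, an assumed scheme)."""
    seq = seq.upper()
    baseline = predict(build_input(seq, annotations))
    scores = np.zeros(len(seq))
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            delta = abs(predict(build_input(mutated, annotations)) - baseline)
            scores[i] = max(scores[i], delta)
    return scores


def toy_predict(x):
    """Stand-in for a trained model: position-weighted count of C and G bases."""
    weights = np.arange(1, x.shape[0] + 1, dtype=np.float32)
    return float(np.sum(weights * (x[:, 1] + x[:, 2])))


if __name__ == "__main__":
    print(impact_scores("ACGT", toy_predict))
```

In practice `toy_predict` would be replaced by a call to a trained model, and the sequence would be the core promoter of a gene rather than a four-base string; the scoring loop itself does not depend on which model supplies the predictions.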

Related Organizations

University of Toronto, Canada
University of Oxford, United Kingdom
