Parameters for convolutional neural networks predicting log-transformed counts of ATAC-seq peaks from Calderon et al. 2019

Trained model parameters generated by the example scripts at https://github.com/mostafavilabuw/Calderon2019ATACmodel/. Models were trained, assessed, and interpreted on ATAC-seq data from Calderon et al. 2019.

The count matrix of aligned ATAC-seq Tn5 cuts can be downloaded from the Gene Expression Omnibus (GEO: GSE118189) (Calderon et al. 2019). The matrix contains 175 measurements for 829,942 genomic regions (peaks), spanning 45 unique human immune cell states from eleven different donors. The 45 immune cell states cover 25 resting cell types, 20 of which are also included as stimulated cell states (Immature NK, Memory NK, Myeloid DC, Plasmablasts, and pDC have no measured stimulated state).

Follow the steps in data_processing.sh to normalize the count matrix: take the log2 of the counts, quantile-normalize all samples, and compute the mean normalized count for each cell state across donors. Since our models usually focus on predicting differences between cell types, we exclude irreproducible ATAC peaks and train our models only on peaks that are not affected by a donor's genotype or trans-acting factors. To select reproducible peaks, we compute, for each peak, the correlation across all 45 cell states between two sets of different donors, repeat this for 100 different pairs of donor sets, and keep only peaks with a consistently positive correlation at a false discovery rate of 0.1% (t-test, n=100, Benjamini-Hochberg correction).

CNN models were trained as described in train_cnns.sh with code from the repository at https://github.com/LXsasse/DRG/. Model performance was analyzed as described in assess_model.sh. Gene regulatory grammar was identified as described in interpret_cnn.sh. Variant effect predictions can be performed as described in pred_cnn.sh. The parameters of the trained models can be downloaded and loaded into the model for global analysis or sequence analysis as described in interpret_cnn.sh and pred_cnn.sh.
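The three normalization steps described above (log2 transform, quantile normalization across samples, averaging over donors within each cell state) can be illustrated with a minimal pure-Python sketch. This is not the code in data_processing.sh. The function names, the pseudocount of 1 (added so that zero counts stay finite), the tie handling, and the cell-state labels in the toy example are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def log2_transform(counts, pseudocount=1.0):
    """log2(count + pseudocount) for a peaks x samples matrix (list of rows).

    The pseudocount is an assumption; the source only says "log2 of the counts".
    """
    return [[math.log2(c + pseudocount) for c in row] for row in counts]


def quantile_normalize(matrix):
    """Give every sample (column) the same value distribution."""
    n_peaks, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(sc[k] for sc in sorted_cols) / n_samples
                 for k in range(n_peaks)]
    normalized = [[0.0] * n_samples for _ in range(n_peaks)]
    for j, col in enumerate(columns):
        # Rank peaks within the sample; ties keep their original order
        # (a simplification, real pipelines often average tied ranks).
        order = sorted(range(n_peaks), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


def mean_per_cell_state(matrix, sample_states):
    """Average the columns sharing a cell-state label, i.e. across donors."""
    groups = defaultdict(list)
    for j, state in enumerate(sample_states):
        groups[state].append(j)
    states = sorted(groups)
    means = [[sum(row[j] for j in groups[s]) / len(groups[s]) for s in states]
             for row in matrix]
    return states, means


# Toy example: 3 peaks x 4 samples (2 hypothetical cell states x 2 donors).
counts = [[0, 1, 3, 7], [3, 7, 0, 1], [15, 31, 15, 31]]
sample_states = ["B_resting", "B_resting", "B_stimulated", "B_stimulated"]
states, means = mean_per_cell_state(
    quantile_normalize(log2_transform(counts)), sample_states)
```

On the real data the same steps would take the 829,942 x 175 count matrix and return one column per cell state (45 in total); for a matrix of that size an array library would be a more practical choice than nested lists.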

Related Organizations

University of Mary
United States
