
This project stores datasets generated to study the regulation of alternative splicing using deep learning models (e.g., SpliceAI). In particular, these datasets were used to perform ablation studies (sequence perturbations at motif locations) to evaluate their effect on the predictions of the deep learning model. I used public RNA-Seq data from the ENCODE consortium to identify exons sensitive to the knockdown of RNA-binding proteins (RBPs). The rationale is that exons sensitive to an RBP knockdown are more likely to be regulated, directly or indirectly, by that RBP, and therefore provide hints about the underlying regulatory mechanisms. Importantly, I also generated paired control exons, which were not alternatively spliced upon RBP knockdown but have GC content and length similar to those of the knockdown-sensitive exons (for both the target exon and its surrounding introns). These control sets account for potential confounding by gene architecture features, so that the analysis can focus on RBP binding motifs and their regulatory logic.

Information about the files

After uncompressing the 'paired_dataset.tar.gz' file, a directory with multiple files will be created, with the following structure:

0_rMATS_ES_events.tsv.gz: summary tables of the differential splicing analysis, with deltaPSI estimates computed as Ctrl - Knockdown. Important columns: 'target_coordinates' gives the 1-based coordinates of the alternatively spliced exon, and 'group' indicates the individual knockdown experiments in which the exon was found to be alternatively spliced.

0_rMATs_ES_non_changing_events.tsv.gz: summary tables of the differential splicing analysis, but in this case containing all non-changing events.

Alternatively spliced events were defined as those with |deltaPSI| > 0.1, using a False Discovery Rate cutoff of 0.05. Non-changing events, assumed to be knockdown-agnostic controls, were defined as those exhibiting negligible deltaPSI variation (|deltaPSI| < 0.025). To ensure the high quality of the exon sets, further analytical steps were performed.
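The coordinate convention and the deltaPSI thresholds described above could be applied roughly as in this minimal Python sketch. It assumes a hypothetical 'chrom:start-end' layout for the 'target_coordinates' values (check the actual column before relying on it) and treats the cutoffs as strict inequalities (|deltaPSI| > 0.1 with FDR < 0.05 for changing events, |deltaPSI| < 0.025 for non-changing ones). The label names are illustrative only. Because deltaPSI is Ctrl - Knockdown, a positive value means the exon is less included after the knockdown.

```python
from typing import Tuple

# Thresholds from the dataset description; strict vs. inclusive comparisons are assumptions.
DPSI_CHANGING = 0.1
FDR_CUTOFF = 0.05
DPSI_NON_CHANGING = 0.025


def parse_target_coordinates(coord: str) -> Tuple[str, int, int]:
    """Convert a 1-based, closed interval string into a 0-based, half-open interval.

    Assumes a hypothetical 'chrom:start-end' layout, e.g. 'chr1:1001-1100'.
    """
    chrom, span = coord.rsplit(":", 1)
    start_1based, end_1based = (int(x) for x in span.split("-"))
    # 1-based closed [start, end] -> 0-based half-open [start - 1, end)
    return chrom, start_1based - 1, end_1based


def label_event(dpsi: float, fdr: float) -> str:
    """Label an exon skipping event from its deltaPSI (Ctrl - Knockdown) and FDR."""
    if abs(dpsi) > DPSI_CHANGING and fdr < FDR_CUTOFF:
        # dPSI > 0: exon more included in control, so the knockdown promotes skipping.
        # dPSI < 0: the knockdown promotes inclusion.
        return "skipped_upon_kd" if dpsi > 0 else "included_upon_kd"
    if abs(dpsi) < DPSI_NON_CHANGING:
        return "non_changing"
    # Between the two thresholds (or not significant): in neither exon set.
    return "ambiguous"


print(parse_target_coordinates("chr1:1001-1100"))
print(label_event(0.3, 0.01), label_event(0.01, 0.9))
```

Events labeled "ambiguous" fall between the two thresholds and belong to neither exon set. The 0-based, half-open intervals follow the BED convention used by most sequence-extraction tools.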
First, I applied a read coverage filter, retaining only events in which, in each condition, the median coverage across replicates of the isoform with more read counts was higher than 7. Then, I focused exclusively on exon skipping events in protein-coding genes and filtered out unannotated exons (pseudoexons), as well as the first and last exons of genes. In addition, I removed duplicate exon skipping events by keeping the transcript with the highest biological importance (based on the presence of transcript flags such as MANE Select, CCDS, or APPRIS). In total, 15,235 events were detected across all RBP knockdown experiments (N=72 splicing-associated RBPs with data available for the HepG2 cell line), covering 6,659 unique exons.
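The coverage filter and the transcript-level deduplication could be sketched as follows. This is one reading of the description, not the exact procedure used: it assumes per-replicate inclusion and skipping read counts are available for each condition, takes "the isoform with more read counts" to be the isoform with the higher median in that condition, and ranks transcripts by MANE Select, then CCDS, then APPRIS. The dictionary keys ('experiment', 'transcript_id', 'flags') and the flag spellings are hypothetical; only 'target_coordinates' comes from the description.

```python
from statistics import median
from typing import Dict, Iterable, List, Sequence, Tuple

MIN_MEDIAN_COVERAGE = 7  # the median read count must be higher than this

# Hypothetical flag labels, in an assumed order of priority.
FLAG_PRIORITY = ("MANE_Select", "CCDS", "APPRIS")


def passes_coverage(
    inclusion_counts: Dict[str, Sequence[int]],
    skipping_counts: Dict[str, Sequence[int]],
) -> bool:
    """Check the coverage filter for one event.

    Both arguments map a condition name (e.g. 'ctrl', 'kd') to the per-replicate
    read counts of the inclusion or skipping isoform.
    """
    for condition, inclusion in inclusion_counts.items():
        skipping = skipping_counts[condition]
        # Median across replicates of whichever isoform has more reads.
        best = max(median(inclusion), median(skipping))
        if best <= MIN_MEDIAN_COVERAGE:
            return False
    return True


def flag_rank(flags: Iterable[str]) -> Tuple[bool, ...]:
    """Encode transcript flags so that tuple comparison follows FLAG_PRIORITY."""
    present = set(flags)
    # False < True, so tuples compare lexicographically by flag priority.
    return tuple(flag in present for flag in FLAG_PRIORITY)


def deduplicate_events(events: List[dict]) -> List[dict]:
    """Keep one event per (experiment, exon), preferring the best-flagged transcript."""
    best: Dict[Tuple[str, str], dict] = {}
    for event in events:
        key = (event["experiment"], event["target_coordinates"])
        current = best.get(key)
        # Strict '>' keeps the first transcript seen when ranks tie.
        if current is None or flag_rank(event["flags"]) > flag_rank(current["flags"]):
            best[key] = event
    return list(best.values())
```

Comparing tuples of booleans gives a lexicographic ranking, so a transcript flagged MANE Select always wins over one carrying only CCDS or APPRIS. Deduplication is keyed per experiment, since the same exon can legitimately appear in several knockdowns.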
