ZENODO
Dataset . 2024
License: CC BY
Data sources: Datacite

Paired datasets to study alternative splicing regulation by individual RNA-binding proteins

Authors: Pedro Barbosa;

Abstract

This project stores datasets generated to study the regulation of alternative splicing using deep learning models (e.g., SpliceAI). In particular, these datasets were used to perform ablation studies (sequence perturbations at motif locations) and evaluate the effect of those perturbations on the deep learning model's predictions.

I used public RNA-Seq data from the ENCODE consortium to identify exons sensitive to the knockdown of RNA-binding proteins (RBPs). The idea is that exons sensitive to RBP knockdowns are more likely to be directly or indirectly regulated by those RBPs, which provides hints about their regulatory mechanisms. Importantly, I also generated paired control exons, which were not alternatively spliced upon RBP knockdown but have GC composition and length similar to those of the knockdown-sensitive exons (target exon and surrounding introns). These control sets were generated to account for potential confounding by gene architecture features, so that the analysis can focus only on RBP binding motifs and their regulatory logic.

Information about the files

After uncompressing the 'paired_dataset.tar.gz' file, a directory with multiple files will be created with the following structure:

0_rMATS_ES_events.tsv.gz: Summary tables of differential splicing analysis, with deltaPSI estimates referring to Ctrl - Knockdown groups. Important columns: 'target_coordinates' refers to the 1-based coordinates of the alternatively spliced exon, and 'group' indicates the individual knockdown experiments where the exon was observed to be alternatively spliced.

0_rMATs_ES_non_changing_events.tsv.gz: Summary tables of differential splicing analysis, but in this case, contains all non-changing events (dPSI |0.1|, using a False Discovery Rate cutoff of 0.05.

Non-changing events, assumed to be knockdown-agnostic controls, were defined as those exhibiting negligible deltaPSI variation (< |0.025|).

To ensure the high quality of the exon sets, further analytical steps were performed. First, I applied a read coverage filter, retaining events where the median coverage across replicates per condition for the isoform with more read counts was higher than 7. Then, I focused exclusively on exon skipping events in protein-coding genes and filtered out unannotated exons (pseudoexons) as well as first or last exons of genes. In addition, I excluded duplicate exon skipping events by picking the transcript with the highest biological importance (based on the presence of transcript flags such as MANE Select, CCDS, or APPRIS).

A total of 15,235 events were detected across all RBP knockdown experiments (N=72, splicing-associated RBPs with data available for the HepG2 cell line), covering 6,659 unique exons.
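
The thresholds described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the dataset. The non-changing cutoff (|deltaPSI| < 0.025) and the coverage cutoff (median higher than 7) come from the description. The knockdown-sensitive cutoffs (|deltaPSI| > 0.1 and FDR < 0.05) are an assumed reading of the partly truncated text. All function names, and the choice to pick the better-covered isoform by total read count, are hypothetical.

```python
from statistics import median

# Cutoffs stated in the dataset description.
NON_CHANGING_DPSI = 0.025   # non-changing: |deltaPSI| < 0.025
MIN_MEDIAN_COVERAGE = 7     # median coverage must be higher than 7

# ASSUMPTION: read from a truncated sentence in the description.
SENSITIVE_DPSI = 0.1        # knockdown-sensitive: |deltaPSI| > 0.1
FDR_CUTOFF = 0.05           # with FDR below 0.05


def passes_coverage(inclusion_counts, skipping_counts):
    """Coverage filter for one condition (one read count per replicate).

    Picks the isoform with more reads in total (an assumed interpretation)
    and requires its median across replicates to exceed the minimum.
    """
    if sum(inclusion_counts) >= sum(skipping_counts):
        dominant = inclusion_counts
    else:
        dominant = skipping_counts
    return median(dominant) > MIN_MEDIAN_COVERAGE


def classify_event(dpsi, fdr):
    """Label an exon skipping event from its deltaPSI and FDR."""
    if abs(dpsi) > SENSITIVE_DPSI and fdr < FDR_CUTOFF:
        return "knockdown_sensitive"
    if abs(dpsi) < NON_CHANGING_DPSI:
        return "non_changing"
    return "unclassified"


def knockdown_effect(dpsi):
    """Interpret the sign of deltaPSI, defined as Ctrl - Knockdown."""
    if dpsi > 0:
        return "inclusion_decreases_on_knockdown"
    if dpsi < 0:
        return "inclusion_increases_on_knockdown"
    return "no_change"


# Toy values, not taken from the dataset.
for dpsi, fdr in [(0.32, 0.001), (-0.01, 0.8), (0.05, 0.01)]:
    print(dpsi, fdr, classify_event(dpsi, fdr), knockdown_effect(dpsi))
```

Because deltaPSI is reported as Ctrl - Knockdown, a positive value means the exon is included less often once the RBP is depleted, which is consistent with the RBP directly or indirectly promoting inclusion of that exon.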
