ZENODO
Dataset . 2023
License: CC BY
Data sources: Datacite

Intergenic RNAPII Atlas : output data

Authors: de Langen, Pierre; Hammal, Fayrouz; Guéret, Elise; Mouren, Jean-Christophe; Spinelli, Lionel; Ballester, Benoit

Abstract

This dataset is the RNAPII (RNAP2) Atlas of potentially transcribed intergenic regions of the human genome. It was built by integrating 906 high-quality human chromatin immunoprecipitation sequencing (ChIP-seq) biosamples targeting RNA Polymerase II, obtained from public data warehouses.

Code is available on GitHub: https://github.com/benoitballester/Pol2Atlas

The dataset consists of 5 zipped folders, described below.

./pol2_consensuses/
  consensuses.bed: Locations of intergenic RNAP2 consensuses in BED format for the hg38 assembly. Columns 1-3 are the genomic location, column 4 is the consensus ID, column 5 is the number of datasets in which RNAP2 was observed at this consensus, column 6 is the strand (not used), and columns 7-8 are the consensus centroid.
  consensusesHg19.bed: Locations of intergenic RNAP2 consensuses in the hg19 assembly. About 1000 consensuses are missing because of the liftover. Consensus IDs match the hg38 ones.
  matrix.mtx: Binary RNAP2 occupancy matrix (consensus x dataset) in sparse Matrix Market format. Rows are annotated by the RNAP2 consensuses, columns by the datasets listed in datasets.txt.
  datasets.txt: Column annotation for matrix.mtx.
  clusterConsensuses_Labels.txt: Assigned cluster for each RNAP2 consensus.
  intersectIntergPol2.tsv: RNAP2 consensuses with cluster ID and intersections with reference databases.
  cluster_bed/: consensuses.bed split per cluster.
  saf_files/: Files typically used for read counting with featureCounts. Suffixes:
    _500: RNAP2 consensuses standardized to 1 kbp.
    _all: All RNAP2 consensuses, including genic ones.
    Hg19: Intergenic RNAP2 consensuses lifted over to hg19.

./rnap2_all_peaks/
  all_peaks.bed.gz: Concatenated BED file with all POLR2A peaks from all experiments, genome wide, for the hg38 assembly. Peaks are filtered with a MACS2 q-value > 1e-5, and datasets with fewer than 100 peaks in intergenic regions are removed. Columns 1-3 are the genomic location, column 4 is the sample of origin of the peak, column 5 is the MACS2 q-value, column 6 is the DNA strand (not used), and columns 7-8 are the peak "summit". Column 9 contains an r,g,b value corresponding to the biotype of origin (Blood / Immune, Brain, Embryo...) for easy visualization in a genome browser. The legend is available in legend.png, and the conversion table between RGB values and biotypes is in palette.csv. Note that singletons are removed when creating consensus peaks.
  all_peaks_interg.bed.gz: Same as above, but for intergenic regions only (excluding 1 kb before the TSS and 1 kb after the TES).

./count_tables_rnaseq/
  ENCODE/
    counts.mtx.gz: Count table in sparse Matrix Market format. Rows correspond to samples, columns to Pol II probes (Pol2_500.saf).
    samples.csv.gz: Matching row annotation for the count matrix.
    encode_total_rnaseq_annot.tsv.gz: Sample annotation (not ordered!).
  GTEx/
    counts.mtx.gz: Count table in sparse Matrix Market format. Rows correspond to samples, columns to Pol II probes (Pol2_500.saf).
    samples.csv.gz: Matching row annotation for the count matrix.
    sample_annot.tsv.gz: Sample annotation (not ordered!).
  TCGA/
    counts.mtx.gz: Count table in sparse Matrix Market format. Rows correspond to samples, columns to Pol II probes (Pol2_500.saf).
    samples.csv.gz: Matching row annotation for the count matrix.
    annotation_table.tsv.gz: Sample annotation (not ordered!).

./cancer_markers/
  bed/
    DE_Tumor_vs_Normal/
      TCGA-*/
        allWithStats.bed: FDR, mean difference in Pearson residuals and log2(FC) for each RNAP2 probe. Warning: probes are prefiltered to have > 1 read in 3 samples, so make sure to use the row index to match them with RNAP2 consensuses.
        allDE.bed: All DE (cancer vs normal) probes in BED format for this cancer. The 5th column has been replaced by the enrichment p-value.
        DE_downreg.bed: Downregulated (cancer vs normal) probes in BED format for this cancer. The 5th column has been replaced by the enrichment p-value.
        DE_upreg.bed: Upregulated (cancer vs normal) probes in BED format for this cancer. The 5th column has been replaced by the enrichment p-value.
      classifier_TCGA-*: Performance of a machine learning tumor-normal tissue classifier using Pol II probes as input.
      globally_DE.bed: Probes DE in 7+ cancers (FPR permutation threshold). The last column indicates the number of cancers in which this probe is DE.
      globally_Down regulated.bed: Probes DE in 6+ cancers (FPR permutation threshold). The last column indicates the number of cancers in which this probe is downregulated.
      globally_Up regulated.bed: Probes DE in 5+ cancers (FPR permutation threshold). The last column indicates the number of cancers in which this probe is upregulated.
      subtypes/
        BRCA/
          allWithStats_BRCA.*.bed: FDR, mean difference in Pearson residuals and log2(FC) for each RNAP2 probe, for the DE test of samples from this subtype against normal samples. Warning: probes are prefiltered to have > 1 read in 3 samples, so make sure to use the row index to match them with RNAP2 consensuses.
          bed_BRCA.*.bed: All DE (subtype vs normal) probes in BED format for this cancer.
          bed_uniqueDE_BRCA.*.bed: All DE (subtype vs normal) probes in BED format for this cancer that are not DE in any other subtype.
    TCGA_survival/
      TCGA-*/
        prognostic.bed: All probes associated with survival for this cancer. The 5th column has been replaced by the p-value.
        stats.csv: Cox linear model statistics for each Pol II probe. Warning: probes are prefiltered to have > 1 read in 3 samples, so make sure to use the row index to match them with RNAP2 consensuses.
      globally_prognostic.bed: Probes associated with survival in 5+ cancers (FPR permutation threshold). The 5th column has been replaced by the number of cancers in which this probe is associated with survival.
  tabular/: Same as above, but stored in a tabular binary format, for DE and survival.

./metacluster_markers/
  bed/
    allPol2_datasetCount: For each tissue, all Pol II consensuses, with the 5th column indicating the number of datasets (RNAP2, GTEx, ENCODE, TCGA tumour and normal) in which the RNAP2 consensus is considered a marker.
    robust_2_datasets_per_tissue: For each tissue, Pol II consensuses considered a marker in 2+ datasets out of 5 (RNAP2, GTEx, ENCODE, TCGA tumour and normal).
  tabular/: Each Pol II consensus with marker information, stored in a binary format.
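
As an illustration of how the pol2_consensuses files described above fit together, here is a minimal, dependency-free Python sketch that links rows of a Matrix Market occupancy matrix to BED records and dataset names. It is not taken from the Pol2Atlas repository. The helper names (`Consensus`, `parse_consensuses`, `read_mtx_rows`) are hypothetical, the toy rows and dataset names are invented, and two things are assumed rather than stated in the description: that matrix rows follow the line order of consensuses.bed, and that the 5th BED column is an integer. For the real files, a library reader such as `scipy.io.mmread` would be the more usual choice.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Consensus(NamedTuple):
    chrom: str
    start: int
    end: int
    consensus_id: str
    n_datasets: int       # 5th column: datasets with RNAP2 observed here
    centroid_start: int   # 7th column
    centroid_end: int     # 8th column


def parse_consensuses(lines: Iterable[str]) -> List[Consensus]:
    """Parse tab-separated lines laid out like consensuses.bed (8 columns)."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        f = line.split("\t")
        # Column 6 (strand) is documented as unused, so it is skipped.
        records.append(Consensus(f[0], int(f[1]), int(f[2]), f[3],
                                 int(f[4]), int(f[6]), int(f[7])))
    return records


def read_mtx_rows(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Read a Matrix Market coordinate file into {row: {columns}} (0-based).

    Accepts 'pattern' entries (i j) and valued entries (i j v);
    entries with an explicit value of 0 are ignored.
    """
    it = iter(lines)
    header = next(it)
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a sparse (coordinate) Matrix Market file")
    rows: Dict[int, Set[int]] = {}
    size_seen = False
    for line in it:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # blank line or comment
        parts = line.split()
        if not size_seen:
            size_seen = True  # the "n_rows n_cols n_nonzero" line
            continue
        if len(parts) > 2 and float(parts[2]) == 0:
            continue
        i, j = int(parts[0]) - 1, int(parts[1]) - 1  # file indices are 1-based
        rows.setdefault(i, set()).add(j)
    return rows


# Toy stand-ins for consensuses.bed, matrix.mtx and datasets.txt (invented data).
bed_text = [
    "chr1\t1000\t2000\t0\t2\t.\t1400\t1401\n",
    "chr2\t5000\t5600\t1\t1\t.\t5300\t5301\n",
]
mtx_text = [
    "%%MatrixMarket matrix coordinate pattern general\n",
    "2 3 3\n",
    "1 1\n",
    "1 3\n",
    "2 2\n",
]
datasets = ["datasetA", "datasetB", "datasetC"]

consensuses = parse_consensuses(bed_text)
occupancy = read_mtx_rows(mtx_text)
for idx, cons in enumerate(consensuses):
    # Assumption: matrix row idx corresponds to the idx-th BED record.
    names = sorted(datasets[j] for j in occupancy.get(idx, ()))
    print(cons.consensus_id, cons.chrom, names)
```

With the real files, the lists above would be replaced by open file handles. Under the stated row-order assumption, the number of occupied columns per row can be compared with the 5th BED column as a sanity check before any downstream analysis.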

Keywords

Intergenic, Enhancers, Cancer genomics, Genomics, RNA Polymerase II, Non-coding DNA, Gene regulation

Related to Research communities
Cancer Research