ZENODO
Dataset . 2023
License: CC BY
Data sources: Datacite

Data and code for 'Pseudogenes as a neutral reference for detecting selection in prokaryotic pangenomes'

Authors: Douglas, Gavin M

Abstract

This repository contains the code and files for reproducing the analyses and results reported in 'Pseudogenes as a neutral reference for detecting selection in prokaryotic pangenomes' by Gavin M. Douglas, W. Ford Doolittle, and B. Jesse Shapiro.

File organization and descriptions:

code/ - GitHub repository releases of the code used in the manuscript (the other folders contain data files only). The code is provided here as well as on GitHub to ensure long-term access.
  handy_pop_gen-1.1.0/ - Release v1.1.0 of the convenience repository (used for specific data processing and analysis steps referred to in the manuscript).
  pangenome_pseudogene_null-1.0.0/ - Main code repository for the manuscript.

broad_pangenome_analysis/
  element_info/element_counts.tsv.gz - Counts of (filtered) pseudogenes and intact genes called per genome accession.
  element_info/gene_sizes.tsv.gz - Gene sizes in base pairs.
  element_info/pseudogene_sizes.tsv.gz - Filtered pseudogene sizes in base pairs.
  element_info/element_percent_coverage/*tsv.gz - Tables containing the percent genome coverage of genes and pseudogenes, given per accession and, separately, averaged over accessions per species.
  example_Mycoplasmopsis_bovis_panaroo_output.csv.gz - Panaroo output table for Mycoplasmopsis bovis, which was used as an example. Corresponds to the gene_presence_absence.csv file in the raw Panaroo output.
  focal_and_non.focal_full_to_short.tsv.gz - Map file of full to short (and unique) species ids used in the analysis. Its main purpose is to allow species ids to be included in cluster names without making them unnecessarily long.
  genome_info/accessions.tsv.gz - Genome accessions used for the broad pangenome analysis. Not all genome accessions could be downloaded; those that could not were ignored, as indicated in the "could_download" column.
  genome_info/genome_sizes.tsv.gz - Sizes of all genomes used for the broad pangenome analysis.
  model_output/pangenome_linear_models.rds - R Data Serialization file containing the R linear model objects (generated by lm and provided as an R list object). The list has separate elements for the mean number of genes, genomic fluidity, percentage singletons (si), and si/sp.
  model_output/linear_model_coef.tsv.gz - Coefficient summary table for all linear models.
  pangenome_and_related_metrics.tsv.gz - Metrics used for the broad pangenome analysis across 670 prokaryotic species. Note that this table was filtered down to 668 species after excluding those with < 9 genomes.
  pangenome_and_related_metrics_filt.tsv.gz - Filtered table, as described above.
  taxonomy.tsv.gz - Taxonomy for all species used in this analysis, taken from GTDB. Row names are species names.

indepth_10_species_analysis/
  cluster_breakdown_tables/ - Folder containing tables that break down how clusters are distributed by element type, pangenome partition, and species. Provided for easy plotting.
  cluster_member_breakdown.tsv.gz - Table providing information on each element (called pseudogenes and intact genes), such as which cluster it is part of and which species and genome accession it is found in.
  cluster_types.rds - R Data Serialization file containing an R list that breaks down all clusters into categories (intact/pseudogene/mixed, where mixed means containing both pseudogene and intact elements).
  COG_enrichment_results/ultra.cloud-COG-gene-enrichments.tsv.gz - Output file with enrichment test summaries for COG IDs in significant COG categories. This test was run for the ultra-cloud pangenome partition model only.
  element_glmm_input.tsv.gz - Table containing all information used for fitting the generalized linear mixed models.
  focal_species.txt - Names of the species used for the in-depth analysis.
  genome_info/ - Folder containing the genome accessions (and the corresponding genome sizes) for all ten analyzed species.
  glmm_output/ - Folder containing R Data Serialization files with the output R objects from fitting the generalized linear mixed models (only ultra-rare files are present, due to file size constraints).
  per_genome_element.type_percent_coverages.rds - R Data Serialization file containing an R list that provides the percent coverage by intact genes vs pseudogenes per accession (nested by species).
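
As an illustration of how the gzip-compressed tables described above might be loaded and filtered, here is a minimal Python sketch using only the standard library. It is not taken from the deposited code. The "could_download" column name comes from the file description, but the values it holds (assumed here to be TRUE/FALSE-style flags) and the `num_genomes` column name are assumptions and should be checked against the actual file headers.

```python
import csv
import gzip


def read_tsv_gz(path):
    """Read a gzip-compressed, tab-separated table into a list of row dicts."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def keep_downloaded(rows, column="could_download",
                    true_values=("true", "yes", "1")):
    """Keep rows whose download flag is set.

    The column name comes from the file description; the accepted flag
    values are an assumption and may need adjusting to the real file.
    """
    return [row for row in rows
            if row[column].strip().lower() in true_values]


def keep_min_genomes(rows, column="num_genomes", minimum=9):
    """Drop species with fewer than `minimum` genomes.

    `num_genomes` is a hypothetical column name used for illustration only.
    """
    return [row for row in rows if int(row[column]) >= minimum]
```

For example, `keep_downloaded(read_tsv_gz("genome_info/accessions.tsv.gz"))` would return only the accessions flagged as downloaded, provided the flag values match the assumption above. The .rds files are R-specific and are most directly read in R with readRDS.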

Keywords

prokaryotes, mobile genes, pangenome, evolution, mobilome, pseudogenes, horizontal gene transfer, adaptation, bacteria, lateral gene transfer
