ZENODO
Dataset . 2019
License: CC BY
Data sources: Datacite
Data sources: ZENODO

Simulated NGS read datasets for bacterial pathogenic potential prediction

Authors: Bartoszewicz, Jakub M.; Seidel, Anja; Rentzsch, Robert; Renard, Bernhard Y.

Abstract

## Predicting pathogenic potentials from NGS reads: novel bacterial species

This repository contains simulated Illumina read datasets for bacterial pathogenic potential prediction and associated metadata extracted from the IMG Database (https://img.jgi.doe.gov/). The reads are 250 bp long and were simulated with Mason (https://www.seqan.de/apps/mason/) from genomes downloaded from NCBI. The training-validation-test split was done at the species level to ensure "novelty" of the validation and test species. The training sets contain 10 million reads per class, the validation sets 1.25 million reads per class, and the test sets 1.25 million paired reads per class. Additional, imbalanced training sets contain 2.5 million "nonpathogenic" and 17.5 million "pathogenic" reads, keeping the mean coverage constant for all species. The temporal benchmark test set contains reads from 3 additional pathogenic species in the Pantoea genus.

## Predicting pathogenic potentials from NGS reads: novel strains of known species

The BacPaCS datasets contain reads simulated from the dataset compiled by Barash et al. (https://doi.org/10.1093/bioinformatics/bty928). In this case, the training-validation-test split was done at the strain level, so different strains of the same species may be present in all three sets.
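The difference between a species-level and a strain-level split can be illustrated with a small sketch. This is not the authors' pipeline: the function `split_by_group`, the 80/10/10 fractions, and the placeholder genome records are all assumptions made for illustration only. The idea is that whole groups are assigned to one set, so grouping by species keeps every validation and test species unseen during training, while grouping by strain allows one species to appear in several sets.

```python
import random
from collections import defaultdict


def split_by_group(records, group_key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole groups to train/val/test so that no group spans two sets.

    `fractions` is a hypothetical choice; the dataset description does not
    state the proportions that were actually used.
    """
    # Collect the records belonging to each group (e.g. each species).
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle group names reproducibly, then cut the list into three parts.
    names = sorted(groups)
    random.Random(seed).shuffle(names)
    n_train = int(len(names) * fractions[0])
    n_val = int(len(names) * fractions[1])
    parts = {
        "train": names[:n_train],
        "val": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],
    }
    return {
        part: [rec for name in group_names for rec in groups[name]]
        for part, group_names in parts.items()
    }


# Placeholder genome records: 10 made-up species with 3 strains each.
genomes = [
    {"species": f"species_{i}", "strain": f"species_{i}_strain_{j}"}
    for i in range(10)
    for j in range(3)
]

# Species-level split: all strains of a species land in the same set.
species_split = split_by_group(genomes, "species")

# Strain-level split: strains of one species may be spread over several sets.
strain_split = split_by_group(genomes, "strain")
```

With the species-level grouping, the sets of species in `species_split["train"]`, `species_split["val"]`, and `species_split["test"]` never overlap. With the strain-level grouping only the strains are guaranteed to be distinct across sets.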

References

Barash, E. et al. (2019). BacPaCS—Bacterial Pathogenicity Classification via Sparse-SVM. Bioinformatics, 35(12), 2001–2008.
Chen, I.-M. A. et al. (2019). IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes. Nucleic Acids Research, 47(D1), D666–D677.

Keywords

FOS: Computer and information sciences, Deep Learning, Bioinformatics, NGS, Pathogenicity, Synthetic Biology, Prediction
