
Simulated NGS read datasets for novel human virus prediction

Keywords: 3. Good health

Research data > Dataset, 09 Dec 2020, Publisher: Zenodo

Authors: Bartoszewicz, Jakub M.; Seidel, Anja; Renard, Bernhard Y.

doi: 10.5281/zenodo.4312525, 10.5281/zenodo.4312152, 10.5281/zenodo.3630803


Abstract

This repository contains simulated Illumina read datasets for novel human virus prediction and associated metadata extracted from the Virus Host Database (https://www.genome.jp/virushostdb/). The reads are 250bp long and were simulated with Mason (https://www.seqan.de/apps/mason/) from genomes downloaded from NCBI. The training-validation-test split was done on whole viral sequences to ensure "novelty" of validation and test viruses. The training sets contain 10 million reads per class, the validation sets 1.25 million reads per class, and the test sets 1.25 million paired reads per class.

The negative class sets contain reads simulated from chordate-infecting ("cho"), metazoan-infecting ("met"), eukaryote-infecting ("euk") and all non-human viruses. The positive class contains human-infecting viruses. The stratified dataset ("strat") contains an equal number of reads from "cho", "met but not cho", "euk but not met" and "all but not euk".

Species-level datasets ("humspec", "allspec" and "chospec", with the corresponding fasta and *_species.rds files) are constructed analogously, but ensure that all viruses of a given species are assigned to exactly one of the training, validation or test sets. This is a stricter setting that models a "novel viral species" scenario while reflecting within-species phenotype diversity.

blast_hits.gz contains blast hits of human virome reads from Moustafa et al., 2017 (https://doi.org/10.1371/journal.ppat.1006292) blasted against our training database (see paper for details). The second column holds the matched label and the accession number of the matched reference. blast_labels_complete.gz contains extracted labels for all virome reads, including those without any matches.

Note: one of the read headers (>3c8ac47039d32b11c8fe23f588e444e9) from Moustafa et al. is slightly corrupted with null characters. You can remove them with sed 's/\x0//g' or equivalent.
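For readers who prefer not to use sed, the null-character cleanup mentioned in the note can be sketched in Python. This is a minimal sketch, not part of the dataset's tooling: the function names `strip_null_bytes` and `clean_file` and the 1 MiB chunk size are assumptions made for illustration, and the sketch assumes that files ending in `.gz` are gzip-compressed.

```python
import gzip
from pathlib import Path


def strip_null_bytes(data: bytes) -> bytes:
    """Remove every NUL (0x00) byte and leave all other bytes untouched."""
    return data.replace(b"\x00", b"")


def _open(path: Path, mode: str):
    """Open a path as gzip if its name ends in .gz, otherwise as a plain file."""
    return gzip.open(path, mode) if path.suffix == ".gz" else open(path, mode)


def clean_file(src: Path, dst: Path, chunk_size: int = 1 << 20) -> int:
    """Copy src to dst without NUL bytes and return how many bytes were removed.

    The file is processed in chunks so large read files need not fit in memory.
    """
    removed = 0
    with _open(src, "rb") as fin, _open(dst, "wb") as fout:
        while chunk := fin.read(chunk_size):
            cleaned = strip_null_bytes(chunk)
            removed += len(chunk) - len(cleaned)
            fout.write(cleaned)
    return removed
```

Calling `clean_file(Path("reads.fasta"), Path("reads.clean.fasta"))` (both file names hypothetical) writes a cleaned copy and reports the number of null bytes dropped, which should be zero for files that were not corrupted.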

Related Organizations

Freie Universität Berlin
Germany
Hasso Plattner Institute
Germany

Related research (6)

- Synthesis and Antioxidant Activities of [5-fluoro N, N'-bis (salicylidene) ethylenediamine] and [3, 5-fluoro N, N'-bis (salicylidene) ethylenediamine] Manganese (III) Complexes (2013, IsAmongTopNSimilarDocuments)
- The association of arterial stiffness with estimated excretion levels of urinary sodium and potassium and their ratio in Chinese adults (2022, IsAmongTopNSimilarDocuments)
- Distributions of picophytoplankton and phytoplankton pigments along a salinity gradient in the Changjiang River Estuary, China (2014, IsAmongTopNSimilarDocuments)
- Dynamics of photosynthetic picoplankton in a subtropical estuary and adjacent shelf waters (2009, IsAmongTopNSimilarDocuments)
- Simulated NGS read datasets for novel human virus prediction (2020, HasVersion)
- Ameliorative Action of Mn-Salen Derivatives on CCl4-Induced Destructive Effects and Lipofuscin-Like Pigment Formation in Rats’ Liver and Brain: Post-Treatment of Young Rats with EUKs (2014, IsAmongTopNSimilarDocuments)

Impact by BIP!

- citations: 0. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Usage by UsageCounts

- views: 53
- downloads: 87


Related to Research communities

Knowmad Institut
