
This is a basic database consisting of 4634 genomes sampled from the RefSeq database using the Woltka pipeline. It is intended for testing and evaluation of metagenomic classification tools.

Contents

The genome sequences are provided in a compressed archive (genomes.tar.gz). When unpacked, the folder structure is organized by NCBI Taxonomy ID (taxid), like so:

genomes/
├── taxid1/
│   ├── genome1_0.fna
│   └── genome1_1.fna
├── taxid2/
│   └── genome2_0.fna
├── taxid3/
│   ├── genome3_0.fna
│   ├── genome3_1.fna
│   └── genome3_2.fna

Each top-level directory corresponds to one taxonomic ID and contains one or more genome FASTA files in .fna format.

Additional Files

fold1_list.txt and fold1_testing_list.txt: Lists of genome TaxIDs used for training and testing, respectively. These are included to support reproducible benchmarking of metagenomic classifiers.
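The layout described above can be loaded with a few lines of Python. The following is a minimal sketch, not part of the dataset itself: the function names (extract_archive, index_genomes, read_taxid_list, select_taxids) are hypothetical, and it assumes that each list file holds one TaxID per line and that those TaxIDs match the directory names under genomes/. Neither assumption is confirmed by the description, so check the files before relying on it.

```python
import tarfile
from pathlib import Path


def extract_archive(archive="genomes.tar.gz", dest="."):
    """Unpack the gzip-compressed tar archive into dest."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)


def index_genomes(root):
    """Map each taxid directory name to its sorted list of .fna files."""
    index = {}
    for taxid_dir in sorted(Path(root).iterdir()):
        if taxid_dir.is_dir():
            fasta_files = sorted(taxid_dir.glob("*.fna"))
            if fasta_files:
                index[taxid_dir.name] = fasta_files
    return index


def read_taxid_list(path):
    """Read TaxIDs from a list file.

    Assumption: one TaxID per line; blank lines are skipped.
    """
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def select_taxids(index, taxids):
    """Keep only the index entries whose taxid appears in taxids."""
    wanted = set(taxids)
    return {taxid: files for taxid, files in index.items() if taxid in wanted}


if __name__ == "__main__":
    extract_archive("genomes.tar.gz")
    genomes = index_genomes("genomes")
    train = select_taxids(genomes, read_taxid_list("fold1_list.txt"))
    test = select_taxids(genomes, read_taxid_list("fold1_testing_list.txt"))
    print(len(genomes), "taxids,", len(train), "training,", len(test), "testing")
```

Keeping the index keyed by directory name rather than by file means a taxid with several genomes stays in a single fold, which matches the per-TaxID split the list files describe.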
