
Microeukaryotic protein database consisting of protists and fungi for VEBA.

Number of sequences:
* MicroEuk100 = 79,920,431 (19 GB)
* MicroEuk90 = 51,767,730 (13 GB)
* MicroEuk50 = 29,898,853 (6.5 GB)

Number of source organisms per dataset:
* MycoCosm = 2503
* PhycoCosm = 174
* EnsemblProtists = 233
* MMETSP = 759
* TARA_SAGv1 = 8
* EukProt = 366
* EukZoo = 27
* TARA_SMAGv1 = 389
* NR_Protists-Fungi = 48217

Files:

MicroEuk_v3.tar.gz = 25 GB

* -rw-rw---- 1 jespinoz jcl110 19G Nov 15 14:57 MicroEuk100.faa.gz - Main FASTA file with 79,920,431 protein sequences from 52,676 source organisms. Uses md5 hash identifiers.
* -rw-rw---- 1 jespinoz jcl110 2.0G Nov 15 14:59 identifier_mapping.proteins.tsv.gz - Protein identifier mappings between datasets, original identifiers, source organisms, and md5 hash identifiers.
* -rw-rw---- 1 jespinoz jcl110 1.7G Nov 15 16:10 MicroEuk90_clusters.tsv.gz - MMseqs2 clustering of MicroEuk100
* -rw-rw---- 1 jespinoz jcl110 1.5G Nov 15 14:57 MicroEuk100.list.gz - List of md5 hash protein identifiers in MicroEuk100
* -rw-rw---- 1 jespinoz jcl110 1.1G Nov 15 16:10 MicroEuk50_clusters.tsv.gz - MMseqs2 clustering of MicroEuk90
* -rw-rw---- 1 jespinoz jcl110 13M Nov 15 23:39 MicroEuk100.eukaryota_odb10.list.gz - MicroEuk100 protein identifiers with hits to BUSCO's eukaryota_odb10 markers using the provided score thresholds
* -rw-rw---- 1 jespinoz jcl110 1.5M Nov 15 14:58 source_taxonomy.tsv.gz - Source taxonomy, lineage, dataset, and notes for each source organism

For more information and citations, please visit the main GitHub repository: https://github.com/jolespin/veba
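
Because MicroEuk100.faa.gz is about 19 GB compressed, it is usually more practical to stream it than to load it into memory. The following is a minimal sketch, not part of VEBA, that reads a gzip-compressed FASTA file record by record using only the Python standard library. The function names `read_fasta_gz` and `count_sequences` are hypothetical, and the sketch assumes the identifier is the first whitespace-delimited token of each header line.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a gzip-compressed FASTA file.

    Assumption: the identifier is the first whitespace-delimited token
    after '>' on each header line.
    """
    identifier = None
    chunks = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Emit the previous record before starting a new one
                if identifier is not None:
                    yield identifier, "".join(chunks)
                tokens = line[1:].split()
                identifier = tokens[0] if tokens else ""
                chunks = []
            else:
                chunks.append(line)
    # Emit the final record, if the file contained any
    if identifier is not None:
        yield identifier, "".join(chunks)


def count_sequences(path: str) -> int:
    """Count FASTA records without holding sequences in memory."""
    return sum(1 for _ in read_fasta_gz(path))
```

For example, `count_sequences("MicroEuk100.faa.gz")` could be compared against the sequence count listed above as a basic integrity check after download; expect a full pass over the file to take a while.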
