
A glimpse at parquet files (from R function dplyr::glimpse())

pocpbenchmark_computing_metrics.parquet
Rows: 2,365,703
Columns: 9
$ Family     "f__Acetobacteraceae", "f__Acetobactera…
$ category   "DIAMOND", "DIAMOND", "MMSEQS2", "BLAST…
$ tool       "DIAMOND_MAKEDB", "DIAMOND_MAKEDB", "MM…
$ dataset_id "RS_GCF_017377735.1", "RS_GCF_014174155…
$ realtime   "1s", "0ms", "0ms", "1s", "0ms", "0ms",…
$ `%cpu`     "208.8%", "251.4%", "37.5%", "58.3%", "…
$ peak_vmem  "80.4 MB", "528.4 MB", "2 GB", "63.1 MB…
$ wchar      "1.6 MB", "1.6 MB", "3.4 MB", "1.9 MB",…
$ rchar      "1.6 MB", "1.7 MB", "3.1 MB", "1.7 MB",…

pocpbenchmark_family_metadata.parquet
Rows: 35
Columns: 7
$ Family          "f__Acetobacteraceae", "f__Amphiba…
$ benchmark_type  "full", "full", "full", "full", "f…
$ CPU_hours       938.4, 266.3, 2080.1, 2536.2, 227.…
$ n_genomes       71, 34, 100, 74, 26, 77, 271, 32, …
$ median_proteins 3317.0, 3997.0, 2936.5, 5728.5, 43…
$ min_proteins    2472, 3168, 2050, 3198, 3250, 3271…
$ max_proteins    4489, 5266, 5532, 9922, 6671, 5991…

pocpbenchmark_genome_metadata.parquet
Rows: 4,767
Columns: 17
$ accession           "GB_GCA_000024525.1", "GB_GCA_…
$ gtdb_representative TRUE, TRUE, TRUE, TRUE, TRUE, …
$ gtdb_taxonomy       "d__Bacteria;p__Bacteroidota;c…
$ Domain              "d__Bacteria", "d__Bacteria", …
$ Phylum              "p__Bacteroidota", "p__Pseudom…
$ Class               "c__Bacteroidia", "c__Alphapro…
$ Order               "o__Cytophagales", "o__Rhodoba…
$ Family              "f__Spirosomaceae", "f__Rhodob…
$ Genus               "g__Spirosoma", "g__Ruegeria",…
$ Species             "s__Spirosoma linguale", "s__R…
$ family              "f__Spirosomaceae", "f__Rhodob…
$ num_seqs            7129, 3495, 3414, 3208, 2149, …
$ sum_len             2497095, 1057624, 649075, 9969…
$ min_len             30, 30, 20, 21, 30, 30, 20, 22…
$ avg_len             350.3, 302.6, 190.1, 310.8, 28…
$ max_len             5545, 1511, 1221, 3965, 7961, …
$ genome_size         8491258, 3523710, 2298088, 341…

pocpbenchmark_pocp_values.parquet
Rows: 2,358,466
Columns: 16
$ type                  "POCP", "POCPu", "POCP", "PO…
$ tool                  BLAST_BLASTP, BLAST_BLASTP, …
$ is_recommended_tool   FALSE, FALSE, FALSE, FALSE, …
$ pocp                  80.77531, 61.63676, 80.77531…
$ query                 "RS_GCF_000021325.1", "RS_GC…
$ subject               "RS_GCF_000182745.2", "RS_GC…
$ query_genus           "g__Gluconacetobacter", "g__…
$ query_gtdb_taxonomy   "d__Bacteria;p__Pseudomonado…
$ subject_genus         "g__Komagataeibacter", "g__K…
$ subject_gtdb_taxonomy "d__Bacteria;p__Pseudomonado…
$ same_genus_truth      FALSE, FALSE, FALSE, FALSE, …
$ same_genus            TRUE, TRUE, TRUE, TRUE, TRUE…
$ class                 "FP", "FP", "FP", "FP", "FP"…
$ same_genus_random     FALSE, TRUE, TRUE, TRUE, FAL…
$ class_random          "TN", "FP", "FP", "FP", "TN"…
$ Family                "f__Acetobacteraceae", "f__A…
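In pocpbenchmark_pocp_values.parquet, the class and class_random columns appear to be confusion-matrix labels obtained by comparing a same-genus call (same_genus, or the random baseline same_genus_random) with the GTDB-based truth (same_genus_truth): in the rows glimpsed above, a TRUE call with a FALSE truth gives "FP" and two FALSE values give "TN". The Python sketch below illustrates that mapping and a precision/recall summary. It is an illustration, not the benchmark's own code: it assumes that same_genus comes from the commonly used 50% POCP genus boundary, and the names POCP_GENUS_THRESHOLD, predict_same_genus, confusion_class and precision_recall are hypothetical.

```python
from collections import Counter

# Hypothetical constant. Assumption: a pair is called "same genus" when
# POCP >= 50%, the commonly used genus boundary; the benchmark may differ.
POCP_GENUS_THRESHOLD = 50.0


def predict_same_genus(pocp: float, threshold: float = POCP_GENUS_THRESHOLD) -> bool:
    """Hypothetical same-genus call from a single POCP (or POCPu) value."""
    return pocp >= threshold


def confusion_class(truth: bool, predicted: bool) -> str:
    """Map (same_genus_truth, same_genus) to a TP/FP/TN/FN label."""
    if predicted:
        return "TP" if truth else "FP"
    return "FN" if truth else "TN"


def precision_recall(pairs):
    """Count labels for (truth, predicted) pairs and derive precision/recall."""
    counts = Counter(confusion_class(truth, pred) for truth, pred in pairs)
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return counts, precision, recall


if __name__ == "__main__":
    # First two pocp values from the glimpse; same_genus_truth is FALSE for both.
    for pocp, truth in [(80.77531, False), (61.63676, False)]:
        print(pocp, confusion_class(truth, predict_same_genus(pocp)))
```

The same confusion_class mapping applies unchanged to same_genus_random, which would yield the class_random labels.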
These table files for the POCP benchmark manuscript (preprint: 10.1101/2025.03.17.643616) contain POCP and POCPu values, along with genome-level and family-level metadata, plus computing metrics recorded for each Nextflow process. Because these tables are very large, they are stored in the Parquet format, and a glimpse of each table's content is provided.

How to read parquet files

Parquet files can be imported into R using nanoparquet (https://nanoparquet.r-lib.org/reference/read_parquet.html), or glimpsed in a web browser with https://parquetreader.com or https://www.parquet-viewer.com/.
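Once pocpbenchmark_computing_metrics.parquet is loaded (for example with nanoparquet's read_parquet() in R, or pandas.read_parquet() in Python, which needs the pyarrow or fastparquet engine), the realtime, %cpu, peak_vmem, wchar and rchar columns hold human-readable strings such as "1s", "208.8%" or "80.4 MB", which must be converted to numbers before they can be summed or compared. The standard-library Python sketch below shows one way to do this. It assumes Nextflow-style duration units (d, h, m, s, ms) and binary memory multiples (1 KB = 1024 B); the function names are hypothetical, and approx_cpu_seconds is only an approximation, not necessarily how the CPU_hours column was computed.

```python
import re

# Assumption: Nextflow-style durations such as "0ms", "1s", "1m 2s", "2h 5m".
_DURATION_UNITS = {"d": 86400.0, "h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_DURATION_TOKEN = re.compile(r"(\d+(?:\.\d+)?)\s*(ms|d|h|m|s)")  # "ms" tried before "m"
_DURATION_FULL = re.compile(r"\s*(?:\d+(?:\.\d+)?\s*(?:ms|d|h|m|s)\s*)+")

# Assumption: sizes use binary multiples (1 KB = 1024 B).
_MEMORY_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
_MEMORY = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\s*")


def parse_duration(text: str) -> float:
    """Convert a duration string such as "1m 2s" to seconds."""
    if not _DURATION_FULL.fullmatch(text):
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(n) * _DURATION_UNITS[u] for n, u in _DURATION_TOKEN.findall(text))


def parse_percent(text: str) -> float:
    """Convert a %cpu string such as "208.8%" to 208.8 (can exceed 100 on multiple cores)."""
    stripped = text.strip()
    if not stripped.endswith("%"):
        raise ValueError(f"unrecognised percentage: {text!r}")
    return float(stripped[:-1])


def parse_memory(text: str) -> float:
    """Convert a size string such as "80.4 MB" to bytes."""
    match = _MEMORY.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * _MEMORY_UNITS[unit]


def approx_cpu_seconds(realtime: str, cpu: str) -> float:
    """Approximate CPU time as wall-clock seconds times the average busy-core fraction."""
    return parse_duration(realtime) * parse_percent(cpu) / 100.0


if __name__ == "__main__":
    # Values taken from the first row of the computing-metrics glimpse.
    print(parse_duration("1s"), parse_percent("208.8%"), parse_memory("80.4 MB"))
```

Validating the whole string with fullmatch before summing the tokens means that a malformed value raises ValueError instead of being silently read as a partial number.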
