Abstract

Despite the growing variety of sequencing and variant-calling tools, no workflow performs equally well across the entire human genome. Understanding context-dependent performance is critical for enabling researchers, clinicians, and developers to make informed tradeoffs when selecting sequencing hardware and software. Here we describe a set of “stratifications,” which are BED files that define distinct contexts throughout the genome. We define these for GRCh37 and GRCh38 as well as the new T2T-CHM13 reference, adding many new hard-to-sequence regions that are critical for understanding performance as the field progresses. Specifically, we highlight the increase in hard-to-map and GC-rich stratifications in CHM13 relative to the previous references. We then compare benchmarking performance across the references and show the performance penalty brought about by these additional difficult regions in CHM13. Additionally, we demonstrate how the stratifications can track context-specific improvements over different platform iterations, using Oxford Nanopore Technologies as an example. The means to generate these stratifications are available as a Snakemake pipeline at https://github.com/usnistgov/giab-stratifications. We anticipate that this resource will be useful in enabling precise risk-reward calculations when building sequencing pipelines for any of the commonly used reference genomes.
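To make the idea of a stratification concrete, the following minimal Python sketch shows how a BED file can restrict benchmarking counts to one genomic context. It is an illustration only, not the giab-stratifications pipeline or any benchmarking tool's implementation. The function names, the toy BED content, and the `(chrom, pos, label)` call format are all hypothetical; the sketch assumes BED coordinates are 0-based and half-open and that variant positions are 1-based, as in VCF.

```python
from bisect import bisect_right
from io import StringIO


def merge(intervals):
    """Merge overlapping or adjacent (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def read_bed(handle):
    """Parse BED lines into {chrom: merged, sorted list of (start, end)}."""
    regions = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return {chrom: merge(ivs) for chrom, ivs in regions.items()}


def in_regions(regions, chrom, pos):
    """Return True if the 1-based position `pos` lies inside a region on `chrom`."""
    intervals = regions.get(chrom, [])
    p = pos - 1  # convert to 0-based to match BED coordinates
    i = bisect_right(intervals, (p, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= p < intervals[i][1]


def stratified_metrics(regions, calls):
    """Compute (recall, precision) over calls that fall inside the stratification.

    `calls` is an iterable of (chrom, pos, label), label in {"TP", "FP", "FN"}.
    A metric is None when its denominator is zero.
    """
    counts = {"TP": 0, "FP": 0, "FN": 0}
    for chrom, pos, label in calls:
        if in_regions(regions, chrom, pos):
            counts[label] += 1
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    recall = tp / (tp + fn) if tp + fn else None
    precision = tp / (tp + fp) if tp + fp else None
    return recall, precision


# Toy stratification (hypothetical content, standing in for a real BED file).
bed_text = "chr1\t100\t200\nchr1\t150\t300\nchr2\t0\t50\n"
regions = read_bed(StringIO(bed_text))

# Toy benchmark calls (hypothetical): chr1:301 falls just outside the region.
calls = [
    ("chr1", 101, "TP"),
    ("chr1", 250, "FN"),
    ("chr1", 300, "FP"),
    ("chr1", 301, "FP"),
    ("chr2", 10, "TP"),
]
recall, precision = stratified_metrics(regions, calls)
print(f"recall={recall:.3f} precision={precision:.3f}")
```

Running the same calls against several stratification files (for example, a hard-to-map set and a GC-rich set) would give one recall/precision pair per context, which is the kind of context-specific comparison the abstract describes.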
Standards; Medical Sciences; Science; Humans; Genome, Human; Software; Genomics/methods; Sequence Analysis, DNA/methods; Benchmarking; High-Throughput Nucleotide Sequencing/methods; Article; Biomedical Informatics; 576; Genomic analysis; Databases; Medical Specialties; Medicine and Health Sciences; Biological Phenomena, Cell Phenomena, and Immunity; Genome; Q; Life Sciences; High-Throughput Nucleotide Sequencing; Genetics and Genomics; DNA; Genomics; Sequence Analysis, DNA; Medical Molecular Biology; Sequence Analysis; Medical Genetics; Human
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 47 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 1% |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 1% |
| views | | 3 |
| downloads | | 1 |

Views provided by UsageCounts
Downloads provided by UsageCounts