The GIAB genomic stratifications resource for human reference genomes

Publication: Article, 29 Oct 2023, United States, English. Publisher: Springer Science and Business Media LLC. Journal: Nature Communications, volume 15 (eissn: 2041-1723)

Authors: Nathan Dwarshuis; Divya Kalra; Jennifer McDaniel; Philippe Sanio; Pilar Alvarez Jerez; Bharati Jadhav; Wenyu Huang; +7 Authors

doi: 10.1038/s41467-024-53260-y, 10.1101/2023.10.27.563846, 10.5281/zenodo.8414359, 10.5281/zenodo.8414358, 10.5281/zenodo.11176260

pmid: 39424793

pmc: PMC11489684

Abstract

Despite the growing variety of sequencing and variant-calling tools, no workflow performs equally well across the entire human genome. Understanding context-dependent performance is critical for enabling researchers, clinicians, and developers to make informed tradeoffs when selecting sequencing hardware and software. Here we describe a set of “stratifications,” which are BED files that define distinct contexts throughout the genome. We define these for GRCh37/38 as well as the new T2T-CHM13 reference, adding many new hard-to-sequence regions that are critical for understanding performance as the field progresses. Specifically, we highlight the increase in hard-to-map and GC-rich stratifications in CHM13 relative to the previous references. We then compare the benchmarking performance with each reference and show the performance penalty brought about by these additional difficult regions in CHM13. Additionally, we demonstrate how the stratifications can track context-specific improvements over different platform iterations, using Oxford Nanopore Technologies as an example. The means to generate these stratifications are available as a Snakemake pipeline at https://github.com/usnistgov/giab-stratifications. We anticipate this being useful in enabling precise risk-reward calculations when building sequencing pipelines for any of the commonly used reference genomes.
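To make the idea of a stratification concrete, the following is a minimal sketch of how BED intervals can partition benchmark results by genomic context. It is not taken from the GIAB pipeline or any benchmarking tool: the function names, the stratum labels `lowmap` and `highGC`, and the toy coordinates are all hypothetical, and the lookup assumes each BED file's intervals are already merged (non-overlapping).

```python
from bisect import bisect_right
from collections import defaultdict


def parse_bed(lines):
    """Parse BED lines into {chrom: sorted [(start, end), ...]}.

    BED coordinates are 0-based and half-open: [start, end).
    """
    regions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    return {chrom: sorted(ivs) for chrom, ivs in regions.items()}


def in_regions(regions, chrom, pos0):
    """Return True if the 0-based position lies inside any interval.

    Assumes intervals on each chromosome do not overlap, so only the
    last interval starting at or before pos0 needs to be checked.
    """
    ivs = regions.get(chrom, [])
    i = bisect_right(ivs, (pos0, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= pos0 < ivs[i][1]


def recall_by_stratum(truth_calls, strata):
    """Compute recall per stratum.

    truth_calls: iterable of (chrom, pos0, detected) for true variants.
    strata: {name: regions} as produced by parse_bed.
    Returns {name: recall}, or None for a stratum with no true variants.
    """
    counts = {name: [0, 0] for name in strata}  # [detected, total]
    for chrom, pos0, detected in truth_calls:
        for name, regions in strata.items():
            if in_regions(regions, chrom, pos0):
                counts[name][0] += int(detected)
                counts[name][1] += 1
    return {name: (tp / n if n else None) for name, (tp, n) in counts.items()}


# Toy data only; these intervals and calls are invented for illustration.
strata = {
    "lowmap": parse_bed(["chr1\t100\t200"]),
    "highGC": parse_bed(["chr1\t150\t300", "chr2\t0\t50"]),
}
truth_calls = [
    ("chr1", 120, True),
    ("chr1", 160, False),
    ("chr1", 250, True),
    ("chr2", 10, True),
    ("chr1", 500, True),  # falls in no stratum
]
result = recall_by_stratum(truth_calls, strata)
```

Note that a variant can fall in several strata at once, so it is counted in each. VCF positions are 1-based, so they would need to be shifted by one before being compared against BED intervals.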

Country

United States

Related Organizations

Baylor College of Medicine, United States
National Institute of Standards and Technology, United States
National Institutes of Health, United States
University of Lausanne, Switzerland
Rice University, United States

Keywords

Standards; Medical Sciences; Science; Humans; Genome, Human; Software; Genomics/methods; Sequence Analysis, DNA/methods; Benchmarking; High-Throughput Nucleotide Sequencing/methods; Article; Biomedical Informatics; 576; Genomic analysis; Databases; Medical Specialties; Medicine and Health Sciences; Biological Phenomena, Cell Phenomena, and Immunity; Genome; Q; Life Sciences; High-Throughput Nucleotide Sequencing; Genetics and Genomics; DNA; Genomics; Sequence Analysis, DNA; Medical Molecular Biology; Sequence Analysis; Medical Genetics; Human

Impact by BIP!

selected citations: 47. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.