
Abstract

Background: As a single reference genome cannot represent all the variation present across human individuals, pangenome graphs have been introduced to incorporate population diversity into a wide range of genomic analyses. Several data structures, graphs in particular, have been proposed for representing collections of genomes as pangenomes.

Results: In this work, we collect all publicly available high-quality human haplotypes and construct the largest human pangenome graphs to date, incorporating 52 individuals in addition to two synthetic references (CHM13 and GRCh38). We build variation graphs and de Bruijn graphs of this collection using five state-of-the-art tools. We examine differences in the way each of these tools represents variation between input sequences, both in terms of overall graph structure and in the representation of specific genetic loci.

Conclusion: This work sheds light on key differences between pangenome graph representations, informing end users on how to select the most appropriate graph type for their application.
Genome, Pangenomics, QH301-705.5, Research, Sequence analysis, [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS], Sequence Analysis, DNA, Genomics, QH426-470, Variation graphs, Genetics, Humans, Biology (General), de Bruijn graphs, Algorithms, Software, [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]
