Computational Pan-Genomics: Status, Promises and Challenges

Publication: Article, Other literature type, Preprint
Date: 12 Mar 2016 (embargo end date: 07 Sep 2019)
Publisher: openRxiv
Journal: Briefings in Bioinformatics (ISSN: 1467-5463, eISSN: 1477-4054)
Funded by: NWO | Statistical Models for St..., ANR | IBC, NIH | GENCODE: comprehensive ge... +6 projects

Authors: Marschall, Tobias; Marz, Manja; Abeel, Thomas; Dijkstra, Louis; Dutilh, Bas E.; Ghaffaari, Ali; Kersey, Paul; +52 Authors

Abstract

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic datasets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this paper, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies, and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

Countries

Germany, France, France, France, France, Italy, Italy, France, France, France, Turkey, France, Netherlands, Italy, Italy

Related Organizations

Karlsruhe Institute of Technology
Germany
Leiden University
Netherlands
French Institute for Research in Computer Science and Automation
France
Freie Universität Berlin
Germany
Universität Augsburg
Germany

Keywords

ddc:004; 004; Cancer Research; Medicine; Pan-genome; Sequence graph; Read mapping; Haplotypes; Data structures; Computational Biology; Genomics; Humans; Algorithms; Genome, Human; Software; Information Systems; Molecular Biology; Papers; [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]; EMC NIHES-01-64-02; EMC MM-04-20-01; CMBI - Radboud University Medical Center; Radboudumc 14: Tumours of the digestive tract; RIMLS: Radboud Institute for Molecular Life Sciences

Impact by BIP!

	selected citations	These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	144
	popularity	This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Top 1%
	influence	This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse	This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Top 1%