SneakySnake: a fast and accurate universal genome pre-alignment filter for CPUs, GPUs and FPGAs

Publication type: Article, Preprint
Date: 01 Dec 2020 (embargo end date: 01 Oct 2019)
Switzerland, English
Publisher: Oxford University Press (OUP)
Journal: Bioinformatics, volume 36, pages 5282-5290 (issn: 1367-4803, eissn: 1367-4811)

Authors: Mohammed Alser; Taha Shahroodi; Juan Gómez-Luna; Can Alkan; Onur Mutlu

doi: 10.1093/bioinformatics/btaa1015 , 10.48550/arxiv.1910.09020 , 10.3929/ethz-b-000462178

pmid: 33315064

arXiv: 1910.09020

handle: 20.500.11850/462178


Abstract

Motivation: We introduce SneakySnake, a highly parallel and highly accurate pre-alignment filter that remarkably reduces the need for computationally costly sequence alignment. The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout. In the SNR problem, we are interested in finding the optimal path that connects two terminals with the least routing cost on a special grid layout that contains obstacles. The SneakySnake algorithm quickly solves the SNR problem and uses the found optimal path to decide whether or not performing sequence alignment is necessary. Reducing the ASM problem into SNR also makes SneakySnake efficient to implement on CPUs, GPUs and FPGAs.

Results: SneakySnake significantly improves the accuracy of pre-alignment filtering by up to four orders of magnitude compared to the state-of-the-art pre-alignment filters, Shouji, GateKeeper and SHD. For short sequences, SneakySnake accelerates Edlib (state-of-the-art implementation of Myers’s bit-vector algorithm) and Parasail (state-of-the-art sequence aligner with a configurable scoring function), by up to 37.7× and 43.9× (>12× on average), respectively, with its CPU implementation, and by up to 413× and 689× (>400× on average), respectively, with FPGA and GPU acceleration. For long sequences, the CPU implementation of SneakySnake accelerates Parasail and KSW2 (sequence aligner of minimap2) by up to 979× (276.9× on average) and 91.7× (31.7× on average), respectively. As SneakySnake does not replace sequence alignment, users can still obtain all capabilities (e.g. configurable scoring functions) of the aligner of their choice, unlike existing acceleration efforts that sacrifice some aligner capabilities.

Availability and implementation: https://github.com/CMU-SAFARI/SneakySnake

Supplementary information: Supplementary data are available at Bioinformatics online.
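
Illustrative sketch (not part of the record): the reduction described under Motivation can be pictured as a greedy walk through a grid with obstacles. The Python below is a simplified sketch under stated assumptions, not the authors' implementation. It assumes the grid has one row per diagonal shift in the range -E..+E, that a cell is an obstacle wherever the read and reference characters differ (or the reference index falls outside the sequence), and that the filter rejects a pair once the walk has crossed more than E obstacles. The function name and its parameters are hypothetical.

```python
def sneaky_snake_sketch(reference: str, read: str, max_edits: int) -> bool:
    """Return True if the pair passes the filter (alignment is still needed),
    False if it can be rejected without running alignment.

    Simplified illustration only; names and grid layout are assumptions.
    """
    m = len(read)
    # One grid row per diagonal shift: 0 is the main diagonal.
    offsets = range(-max_edits, max_edits + 1)

    def is_free(col: int, off: int) -> bool:
        # A cell is free (no obstacle) when the shifted characters match.
        j = col + off
        return 0 <= j < len(reference) and reference[j] == read[col]

    col = 0
    obstacles = 0
    while col < m:
        # Find the longest obstacle-free horizontal run starting at col,
        # considering every row of the grid.
        best = 0
        for off in offsets:
            run = 0
            while col + run < m and is_free(col + run, off):
                run += 1
            best = max(best, run)
        col += best
        if col < m:
            # Every row is blocked here: cross one obstacle (one edit).
            obstacles += 1
            if obstacles > max_edits:
                return False
            col += 1
    return True
```

Under these assumptions, the number of obstacles crossed never exceeds the true edit distance, so a pair within the threshold is not rejected, while a passing pair still has to go through the aligner of choice to confirm it.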

Country

Switzerland

Related Organizations

Carnegie Mellon University (United States)
Bilkent University (Turkey)
Information Technology and Electrical Engineering (Switzerland)
ETH Zurich (Switzerland)
EIDGENOESSISCHE TECHNISCHE HOCHSCHULE ZUERICH (Switzerland)

Keywords

Genomics (q-bio.GN), FOS: Computer and information sciences, Computer Science - Distributed, Parallel, and Cluster Computing, FOS: Biological sciences, Computer Science - Data Structures and Algorithms, Hardware Architecture (cs.AR), Quantitative Biology - Genomics, Data Structures and Algorithms (cs.DS), Distributed, Parallel, and Cluster Computing (cs.DC), Computer Science - Hardware Architecture

Impact by BIP!

selected citations: 37
popularity: Top 10%
influence: Top 10%
impulse: Top 10%


Access routes: Green, gold

Fields of Science

engineering and technology

medical engineering
