
Single Nucleotide Polymorphism (SNP) detection is a fundamental procedure of whole genome analysis. SOAPsnp, a classic tool for detection, would take more than one week to analyze one typical human genome, which limits the efficiency of downstream analyses. In this paper, we present mSNP, an optimized version of SOAPsnp, which leverages Intel Xeon Phi coprocessors for large-scale SNP detection. Firstly, we redesigned the essential data structures of SOAPsnp, which significantly reduces memory footprint and improves computing efficiency. Then we developed a coordinated parallel framework for a higher hardware utilization of both CPU and Xeon Phi. Also, we tailored the data structures and operations to utilize the wide VPU of Xeon Phi to improve data throughput. Last but not the least, we proposeed a read-based window division strategy to improve throughput and obtain better load balance. mSNP is the first SNP detection tool empowered by Xeon Phi. We achieved a 38x single thread speedup on CPU, without any loss in precision. Moreover, mSNP successfully scaled to 4,096 nodes on Tianhe-2. Our experiments demonstrate that mSNP is efficient and scalable for large-scale human genome SNP detection.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 4 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
