publication . Preprint . Conference object . 2009

GPU sample sort

Nikolaj Leischner; Vitaly Osipov; Peter Sanders;
Open Access English
  • Published: 30 Sep 2009
Abstract
In this paper, we present the design of a sample sort algorithm for manycore GPUs. Despite being one of the most efficient comparison-based sorting algorithms for distributed memory architectures its performance on GPUs was previously unknown. For uniformly distributed keys our sample sort is at least 25% and on average 68% faster than the best comparison-based sorting algorithm, GPU Thrust merge sort, and on average more than 2 times faster than GPU quicksort. Moreover, for 64-bit integer keys it is at least 63% and on average 2 times faster than the highly optimized GPU Thrust radix sort that directly manipulates the binary representation of keys. Our implemen...
Subjects
ACM Computing Classification System: Data_FILES
free text keywords: Computer Science - Data Structures and Algorithms, Computer Science - Distributed, Parallel, and Cluster Computing, Insertion sort, Sorting algorithm, Counting sort, Adaptive sort, Computer science, Parallel computing, Selection sort, Comparison sort, Bucket sort, Quicksort

[1] Opencl. http://www.khronos.org/opencl/.

[2] K. E. Batcher. Sorting networks and their applications. In Proc. AFIPS Spring Joint Computing Conference, volume 32 of AFIPS Conference Proceedings, pages 307-314, 1968. [OpenAIRE]

[3] D. Cederman and P. Tsigas. A practical quicksort algorithm for graphics processors. In Proc. European Symposium on Algorithms (ESA), volume 5193 of LNCS, pages 246-258, 2008. [OpenAIRE]

[4] S. Chen, J. Qin, Y. Xie, J. Zhao, and P.-A. Heng. A fast and flexible sorting algorithm with cuda. In ICA3PP, volume 5574 of LNCS, pages 281-290, 2009.

[5] J. Chhugani, A. D. Nguyen, V. W. Lee, W. Macy, M. Hagog, Y.-K. Chen, A. Baransi, S. Kumar, and P. Dubey. Efficient implementation of sorting on multi-core simd cpu architecture. PVLDB, 1(2):1313- 1324, 2008.

[6] N. K. Govindaraju, J. Gray, R. Kumar, and D. Manocha. Gputerasort: high performance graphics co-processor sorting for large database management. In Proc. ACM SIGMOD Int'l Conference on Management of Data, pages 325-336, 2006. [OpenAIRE]

[7] D. R. Helman, D. A. Bader, and J. Ja´Ja´. A randomized parallel sorting algorithm with an experimental study. J. of Parallel and Distributed Computing, 52(1):1-23, 1998.

[8] E. Lindholm, J. Nickolls, S. F. Oberman, and J. Montrym. Nvidia tesla: A unified graphics and computing architecture. IEEE Micro, 28(2):39-55, 2008.

[9] M. Matsumoto and T. Nishimura. Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation, 8(1):3- 30, 1998.

[10] D. R. Musser. Introspective sorting and selection algorithms. Software: Practice and Experience, 27(8):983-993, 1997.

[11] P. Sanders and S. Winkel. Super scalar sample sort. In Proc. European Symposium on Algorithms (ESA), volume 3221 of LNCS, pages 784-796. Springer, 2004.

[12] N. Satish, M. Harris, and M. Garland. Designing efficient sorting algorithms for manycore gpus. In Proc. Int'l Symposium on Parallel and Distributed Processing (IPDPS), 2009.

[13] S. Sengupta, M. Harris, Y. Zhang, and J. D. Owens. Scan primitives for gpu computing. In Proc. ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pages 97-106, 2007.

[14] J. Singler, P. Sanders, and F. Putze. Mcstl: The multi-core standard template library. In Proc. Int'l Conference on Parallel Processing (Euro-Par), volume 4641 of LNCS, pages 682-694, 2007. [OpenAIRE]

[15] E. Sintorn and U. Assarsson. Fast parallel gpu-sorting using a hybrid algorithm. J. of Parallel and Distributed Computing, 68(10):1381-1388, 2008. [OpenAIRE]

Powered by OpenAIRE Research Graph
Any information missing or wrong?Report an Issue