Padding free bank conflict resolution for CUDA-based matrix transpose algorithm

Publication: Article | 01 Jan 2014 | Publisher: IEEE | Journal: 15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)

Authors: Ayaz ul Hassan Khan; Mayez Al-Mouhamed; Allam Fatayer; Anas Almousa; Abdulrahman Baqais; Mohammed Assayony

doi: 10.1109/snpd.2014.6888709, 10.2991/ijndc.2014.2.3.2

Abstract

Advances in Graphics Processing Unit (GPU) technology and the introduction of the CUDA programming model facilitate the development of new solutions for sparse and dense linear algebra solvers. Matrix transpose is an important linear algebra procedure with a deep impact on various computational science and engineering applications. Several factors hinder the expected performance of large matrix transposes on GPU devices. The performance degradation stems from the memory access pattern, namely non-coalesced access to global memory and bank conflicts in the shared memory of the streaming multiprocessors within the GPU. In this paper, two matrix transpose algorithms are proposed that address these issues by ensuring coalesced global memory access and conflict-free shared memory bank access. The proposed algorithms have execution times comparable to the NVIDIA SDK bank-conflict-free matrix transpose implementation. Their main advantage is that they eliminate bank conflicts while allocating shared memory exactly equal to the tile size (T x T) of the problem space, whereas, to the best of our knowledge, previously published approaches need to allocate a padded tile of T x (T+1). We have also applied the proposed transpose algorithm to the recursive Gaussian implementation in the NVIDIA SDK and achieved about a 6% improvement in performance.
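The difference between a padded T x (T+1) tile and a padding-free T x T tile can be illustrated with a small host-side C++ model of the shared-memory bank mapping. This is only a sketch under stated assumptions: 32 banks of 4-byte words, T = 32, one warp writing a tile row and later reading a tile column. The skewed (diagonal) index shown here is one well-known padding-free scheme chosen for illustration; the abstract does not describe the two proposed algorithms, so it should not be read as the authors' method. All function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdio>

constexpr int T = 32;       // tile width (assumption: equal to the warp size)
constexpr int kBanks = 32;  // assumption: 32 banks, one 4-byte word per bank slot

// Word offset in shared memory of logical tile element (row, col).
int naiveIndex(int row, int col)  { return row * T + col; }            // T x T, no padding
int paddedIndex(int row, int col) { return row * (T + 1) + col; }      // T x (T+1), one pad word per row
int skewedIndex(int row, int col) { return row * T + (col + row) % T; } // T x T, columns rotated by row

// Worst number of distinct words that one warp sends to the same bank,
// taken over a row access (the tile write) and a column access (the tile read).
// A result of 1 means conflict-free; 32 means the access is fully serialized.
template <typename IndexFn>
int worstConflictDegree(IndexFn index) {
    int worst = 0;
    for (int ty = 0; ty < T; ++ty) {
        std::array<int, kBanks> rowHits{};
        std::array<int, kBanks> colHits{};
        for (int tx = 0; tx < T; ++tx) {
            ++rowHits[index(ty, tx) % kBanks];  // thread tx touches (ty, tx)
            ++colHits[index(tx, ty) % kBanks];  // thread tx touches (tx, ty)
        }
        worst = std::max({worst,
                          *std::max_element(rowHits.begin(), rowHits.end()),
                          *std::max_element(colHits.begin(), colHits.end())});
    }
    return worst;
}

// Prints the conflict degree of each layout and its shared-memory footprint in words.
void reportLayouts() {
    std::printf("naive  T x T     : degree %d, %d words\n",
                worstConflictDegree(naiveIndex), T * T);
    std::printf("padded T x (T+1) : degree %d, %d words\n",
                worstConflictDegree(paddedIndex), T * (T + 1));
    std::printf("skewed T x T     : degree %d, %d words\n",
                worstConflictDegree(skewedIndex), T * T);
}
```

Calling `reportLayouts()` from a `main` function shows the trade-off: in this model the unpadded layout has a 32-way conflict on the column access, while both the padded and the skewed layouts are conflict-free, and only the padded one needs the extra T words.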

Related Organizations

King Fahd University of Petroleum and Minerals
Saudi Arabia

Keywords

coalesced memory access, Electronic computers. Computer science, linear algebra solvers, GPU, CUDA, QA75.5-76.95, bank conflict free, matrix transpose

Impact by BIP!

	selected citations: 7 (derived from selected sources; an alternative to the "Influence" indicator)
	popularity: Top 10% (the "current" impact or attention of the article in the research community, based on the underlying citation network)
	influence: Average (the overall impact of the article in the research community, based on the underlying citation network, diachronically)
	impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)