Sparse-Matrix support for the SkePU library for portable CPU/GPU programming
- Publisher: Linköpings universitet, Institutionen för datavetenskap
Sparse Matrix | SkePU | auto tuning | CPU/GPU programming | conversion function | profile guided composition | Computer Sciences | Datavetenskap (datalogi)
In this thesis work we have extended the SkePU framework by designing a new container data structure for the representation of generic two dimensional sparse matrices. Computation on matrices is an integral part of many scientific and engineering problems. Sometimes it is unnecessary to perform costly operations on zero entries of the matrix. If the number of zeroes is relatively large then a requirement for more efficient data structure arises. Beyond the sparse matrix representation, we propose an algorithm to judge the condition where computation on sparse matrices is more beneficial in terms of execution time for an ongoing computation and to adapt a matrix's state accordingly, which is the main concern of this thesis work. We present and implement an approach to switch automatically between two data container types dynamically inside the SkePU framework for a multi-core GPU-based heterogeneous system. The new sparse matrix data container supports all SkePU skeletons and nearly all SkePU operations. We provide compression and decompression algorithms from dense matrix to sparse matrix and vice versa on CPU and GPUs using SkePU data parallel skeletons. We have also implemented a context aware switching mechanism in order to switch between two data container types on the CPU or the GPU. A multi-state matrix representation, and selection on demand is also made possible. In order to evaluate and test effectiveness and efficiency of our extension to the SkePU framework, we have considered Matrix-Vector Multiplication as our benchmark program because iterative solvers like Conjugate Gradient and Generalized Minimum Residual use Sparse Matrix-Vector Multiplication as their basic operation. Through our benchmark program we have demonstrated adaptive switching between two data container types, implementation selection between CUDA and OpenMP, and converting the data structure depending on the density of non-zeroes in a matrix. Our experiments on GPU-based architectures show that our automatic switching mechanism adapts with the fastest SkePU implementation variant, and has a limited training cost.