publication . Preprint . 2016

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George; Edwards, H. Carter; Olivier, Stephen L.;
Open Access English
  • Published: 21 Jan 2016
Abstract
We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platform...
Subjects
free text keywords: Computer Science - Mathematical Software, 68W10
Download from
40 references, page 1 of 3

[1] K. Agrawal, C.E. Leiserson, and J. Sukha. Executing task graphs using work-stealing. In Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on, pages 1-12, April 2010.

[2] J I Aliaga, R M Bad´ıa, M Barreda, M Bollho¨fer, and E S Quintana-Ort´ı. Leveraging Task-Parallelism with OmpSs in ILUPACK's Preconditioned CG Method. In 26th IEEE Intl. Symp. on Computer Architecture and High Performance Computing, SBAC-PAD '14, pages 262-269. IEEE, 2014.

[3] Robert Alverson, David Callahan, Daniel Cummings, Brian Koblenz, Allan Porterfield, and Burton Smith. The Tera computer system. ACM SIGARCH Computer Architecture News, 18(3b):1-6, 1990.

[4] Ce´dric Augonnet, Samuel Thibault, Raymond Namyst, and Pierre-Andren'feg Wacrenier. STARPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurrency Computat. Pract. Exper., 23(2):187-198, 2011.

[5] Eric Bavier, Mark Hoemmen, Sivasankaran Rajamanickam, and Heidi Thornquist. Amesos2 and belos: Direct and iterative solvers for large sparse linear systems. Sci. Program., 20(3):241-255, July 2012.

[6] Erik G Boman, Karen D Devine, and Sivasankaran Rajamanickam. Scalable matrix computations on large scale-free graphs using 2d graph partitioning. In Intl. Conf. on High Performance Computing, Networking, Storage and Analysis, page 50. ACM, 2013.

[7] George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Azzam Haidar, Thomas Herault, Jakub Kurzak, Julien Langou, Pierre Lemarinier, Hatem Ltaief, Piotr Luszczek, Asim YarKhan, and Jack Dongarra. Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA. 2001 IEEE Intl. Parallel and Distributed Processing Symp. Workshops, pages 1432-1441, 2011. [OpenAIRE]

[8] George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Thomas Herault, and Jack J. Dongarra. PaRSEC: Exploiting heterogeneity to enhance scalability. Computing in Science and Engineering, 15(6):36-45, 2013.

[9] Franc¸ois Broquedis, Je´roˆ me Clet-Ortega, Ste´phanie Moreaud, Nathalie Furmento, Brice Goglin, Guillaume Mercier, Samuel Thibault, and Raymond Namyst. hwloc: A generic framework for managing hardware affinities in HPC applications. In 18th Euromicro Conf. on Parallel, Distributed and Network-Based Processing, pages 180-186, 2010.

[10] Aydin Buluc and John R Gilbert. Parallel sparse matrix-matrix multiplication and indexing: Implementation and experiments. SIAM J. Sci. Comput., 34(4):C170-C191, 2012.

[11] Alfredo Buttari. Fine-grained multithreading for the multifrontal QR factorization of sparse matrices. SIAM J. Sci. Comput., 35(4):323-345, 2013. [OpenAIRE]

[12] Alfredo Buttari, Julien Langou, Jakub Kurzak, and Jack Dongarra. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures. Parallel Computing, 35(1):38-53, 2009.

[13] V. Cave´, J. Zhao, J. Shirako, and V. Sarkar. Habanero-java: The new adventures of old X10. In Proc. 9th Intl. Conf. on Principles and Practice of Programming in Java, PPPJ '11, pages 51-61. ACM, 2011.

[14] B.L. Chamberlain, D. Callahan, and H.P. Zima. Parallel programmability and the chapel language. Int. J. High Perform. Comput. Appl., 21(3):291-312, August 2007.

[15] Ernie Chan, Field G. van Zee, Enrique S. Quintana-Orti, Gregorio Quintana-Orti, and Robert A. van de Geijn. Satisfying your dependencies with Supermatrix. In 2007 Intl. Conf. on Cluster Computing, pages 91-99. IEEE, 2007.

40 references, page 1 of 3
Abstract
We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platform...
Subjects
free text keywords: Computer Science - Mathematical Software, 68W10
Download from
40 references, page 1 of 3

[1] K. Agrawal, C.E. Leiserson, and J. Sukha. Executing task graphs using work-stealing. In Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on, pages 1-12, April 2010.

[2] J I Aliaga, R M Bad´ıa, M Barreda, M Bollho¨fer, and E S Quintana-Ort´ı. Leveraging Task-Parallelism with OmpSs in ILUPACK's Preconditioned CG Method. In 26th IEEE Intl. Symp. on Computer Architecture and High Performance Computing, SBAC-PAD '14, pages 262-269. IEEE, 2014.

[3] Robert Alverson, David Callahan, Daniel Cummings, Brian Koblenz, Allan Porterfield, and Burton Smith. The Tera computer system. ACM SIGARCH Computer Architecture News, 18(3b):1-6, 1990.

[4] Ce´dric Augonnet, Samuel Thibault, Raymond Namyst, and Pierre-Andren'feg Wacrenier. STARPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurrency Computat. Pract. Exper., 23(2):187-198, 2011.

[5] Eric Bavier, Mark Hoemmen, Sivasankaran Rajamanickam, and Heidi Thornquist. Amesos2 and belos: Direct and iterative solvers for large sparse linear systems. Sci. Program., 20(3):241-255, July 2012.

[6] Erik G Boman, Karen D Devine, and Sivasankaran Rajamanickam. Scalable matrix computations on large scale-free graphs using 2d graph partitioning. In Intl. Conf. on High Performance Computing, Networking, Storage and Analysis, page 50. ACM, 2013.

[7] George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Azzam Haidar, Thomas Herault, Jakub Kurzak, Julien Langou, Pierre Lemarinier, Hatem Ltaief, Piotr Luszczek, Asim YarKhan, and Jack Dongarra. Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA. 2001 IEEE Intl. Parallel and Distributed Processing Symp. Workshops, pages 1432-1441, 2011. [OpenAIRE]

[8] George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Thomas Herault, and Jack J. Dongarra. PaRSEC: Exploiting heterogeneity to enhance scalability. Computing in Science and Engineering, 15(6):36-45, 2013.

[9] Franc¸ois Broquedis, Je´roˆ me Clet-Ortega, Ste´phanie Moreaud, Nathalie Furmento, Brice Goglin, Guillaume Mercier, Samuel Thibault, and Raymond Namyst. hwloc: A generic framework for managing hardware affinities in HPC applications. In 18th Euromicro Conf. on Parallel, Distributed and Network-Based Processing, pages 180-186, 2010.

[10] Aydin Buluc and John R Gilbert. Parallel sparse matrix-matrix multiplication and indexing: Implementation and experiments. SIAM J. Sci. Comput., 34(4):C170-C191, 2012.

[11] Alfredo Buttari. Fine-grained multithreading for the multifrontal QR factorization of sparse matrices. SIAM J. Sci. Comput., 35(4):323-345, 2013. [OpenAIRE]

[12] Alfredo Buttari, Julien Langou, Jakub Kurzak, and Jack Dongarra. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures. Parallel Computing, 35(1):38-53, 2009.

[13] V. Cave´, J. Zhao, J. Shirako, and V. Sarkar. Habanero-java: The new adventures of old X10. In Proc. 9th Intl. Conf. on Principles and Practice of Programming in Java, PPPJ '11, pages 51-61. ACM, 2011.

[14] B.L. Chamberlain, D. Callahan, and H.P. Zima. Parallel programmability and the chapel language. Int. J. High Perform. Comput. Appl., 21(3):291-312, August 2007.

[15] Ernie Chan, Field G. van Zee, Enrique S. Quintana-Orti, Gregorio Quintana-Orti, and Robert A. van de Geijn. Satisfying your dependencies with Supermatrix. In 2007 Intl. Conf. on Cluster Computing, pages 91-99. IEEE, 2007.

40 references, page 1 of 3
Powered by OpenAIRE Open Research Graph
Any information missing or wrong?Report an Issue