publication . Article . 2019

A Task Mapping Algorithm to Improve Communication and Load Balancing in Clusters of Multicore Systems

Cruz, Eduardo; Diener, Matthias; Lima Pilla, Laércio; Navaux, Philippe;
Open Access English
  • Published: 08 Mar 2019
  • Publisher: HAL CCSD
  • Country: France
Abstract
International audience; Communication between tasks and load imbalance have been identified as a major challenge for the performance and energy efficiency of parallel applications. A common way to improve communication is to increase its locality, that is, to reduce the distances of data transfers, prioritizing the usage of faster and more efficient local interconnections over remote ones. Regarding load imbalance, cores should execute a similar amount of work. An important problem to be solved in this context is how to determine an optimized mapping of tasks to cluster nodes and cores that increases the overall locality and load balancing. In this paper, we pro...
Subjects
free text keywords: Task mapping, Grid computing, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], Greedy algorithm, Parallel computing, Efficient energy use, Hierarchy, Algorithm, computer.software_genre, computer, Load balancing (computing), Computer science, Cluster (physics), Locality
43 references, page 1 of 3

[1] A. Anbar, O. Serres, E. Kayraklioglu, A.-H. A. Badawy, and T. El-Ghazawi. Exploiting Hierarchical Locality in Deep Parallel Architectures. ACM Transactions on Architecture and Code Optimization (TACO), 13(2):1{25, 2016.

[2] R. Azimi, D. K. Tam, L. Soares, and M. Stumm. Enhancing Operating System Support for Multicore Processors by Using Hardware Performance Monitoring. ACM SIGOPS Operating Systems Review, 43(2):56{65, apr 2009.

[3] D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. The NAS Parallel Benchmarks. International Journal of Supercomputer Applications, 5(3):66{73, 1991.

[4] S. Bak, H. Menon, S. White, M. Diener, and L. Kale. Multi-level load balancing with an integrated runtime approach. In IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2018.

[5] G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight, and O. Schwartz. Communication lower bounds and optimal algorithms for numerical linear algebra. Acta Numerica, 23(May):1{155, 2014.

[6] N. Barrow-Williams, C. Fensch, and S. Moore. A Communication Characterisation of Splash-2 and Parsec. In IEEE International Symposium on Workload Characterization (IISWC), pages 86{97, 2009.

[7] C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 72{81, 2008.

[8] J. E. Boillat and P. G. Kropf. A Fast Distributed Mapping Algorithm. In Joint International Conference on Vector and Parallel Processing (CONPAR 90 { VAPP IV), pages 405{416, 1990.

[9] S. Bokhari. On the Mapping Problem. IEEE Transactions on Computers, C-30(3):207{214, 1981.

[10] B. Brandfass, T. Alrutz, and T. Gerhold. Rank reordering for MPI communication optimization. Computers & Fluids, 80(July):372{380, 2013.

[11] F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin, G. Mercier, S. Thibault, and R. Namyst. hwloc: A Generic Framework for Managing Hardware A nities in HPC Applications. In Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP), pages 180{186, 2010. [OpenAIRE]

[13] C. Chevalier and F. Pellegrini. PT-Scotch: A Tool for E cient Parallel Graph Ordering. Parallel Computing, 34(6-8):318{331, July 2008. [OpenAIRE]

[14] E. H. M. Cruz, M. Diener, M. A. Z. Alves, and P. O. A. Navaux. Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols. Journal of Parallel and Distributed Computing (JPDC), 74(3):2215{2228, mar 2014.

[15] E. H. M. Cruz, M. Diener, L. L. Pilla, and P. O. A. Navaux. An E - cient Algorithm for Communication-Based Task Mapping. In International Conference on Parallel, Distributed, and Network-Based Processing (PDP), pages 207{214, 2015.

[16] E. H. M. Cruz, M. Diener, L. L. Pilla, and P. O. A. Navaux. Hardwareassisted thread and data mapping in hierarchical multicore architectures. ACM Trans. Archit. Code Optim., 13(3):28:1{28:28, Sept. 2016.

43 references, page 1 of 3
Abstract
International audience; Communication between tasks and load imbalance have been identified as a major challenge for the performance and energy efficiency of parallel applications. A common way to improve communication is to increase its locality, that is, to reduce the distances of data transfers, prioritizing the usage of faster and more efficient local interconnections over remote ones. Regarding load imbalance, cores should execute a similar amount of work. An important problem to be solved in this context is how to determine an optimized mapping of tasks to cluster nodes and cores that increases the overall locality and load balancing. In this paper, we pro...
Subjects
free text keywords: Task mapping, Grid computing, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], Greedy algorithm, Parallel computing, Efficient energy use, Hierarchy, Algorithm, computer.software_genre, computer, Load balancing (computing), Computer science, Cluster (physics), Locality
43 references, page 1 of 3

[1] A. Anbar, O. Serres, E. Kayraklioglu, A.-H. A. Badawy, and T. El-Ghazawi. Exploiting Hierarchical Locality in Deep Parallel Architectures. ACM Transactions on Architecture and Code Optimization (TACO), 13(2):1{25, 2016.

[2] R. Azimi, D. K. Tam, L. Soares, and M. Stumm. Enhancing Operating System Support for Multicore Processors by Using Hardware Performance Monitoring. ACM SIGOPS Operating Systems Review, 43(2):56{65, apr 2009.

[3] D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. The NAS Parallel Benchmarks. International Journal of Supercomputer Applications, 5(3):66{73, 1991.

[4] S. Bak, H. Menon, S. White, M. Diener, and L. Kale. Multi-level load balancing with an integrated runtime approach. In IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2018.

[5] G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight, and O. Schwartz. Communication lower bounds and optimal algorithms for numerical linear algebra. Acta Numerica, 23(May):1{155, 2014.

[6] N. Barrow-Williams, C. Fensch, and S. Moore. A Communication Characterisation of Splash-2 and Parsec. In IEEE International Symposium on Workload Characterization (IISWC), pages 86{97, 2009.

[7] C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 72{81, 2008.

[8] J. E. Boillat and P. G. Kropf. A Fast Distributed Mapping Algorithm. In Joint International Conference on Vector and Parallel Processing (CONPAR 90 { VAPP IV), pages 405{416, 1990.

[9] S. Bokhari. On the Mapping Problem. IEEE Transactions on Computers, C-30(3):207{214, 1981.

[10] B. Brandfass, T. Alrutz, and T. Gerhold. Rank reordering for MPI communication optimization. Computers & Fluids, 80(July):372{380, 2013.

[11] F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin, G. Mercier, S. Thibault, and R. Namyst. hwloc: A Generic Framework for Managing Hardware A nities in HPC Applications. In Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP), pages 180{186, 2010. [OpenAIRE]

[13] C. Chevalier and F. Pellegrini. PT-Scotch: A Tool for E cient Parallel Graph Ordering. Parallel Computing, 34(6-8):318{331, July 2008. [OpenAIRE]

[14] E. H. M. Cruz, M. Diener, M. A. Z. Alves, and P. O. A. Navaux. Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols. Journal of Parallel and Distributed Computing (JPDC), 74(3):2215{2228, mar 2014.

[15] E. H. M. Cruz, M. Diener, L. L. Pilla, and P. O. A. Navaux. An E - cient Algorithm for Communication-Based Task Mapping. In International Conference on Parallel, Distributed, and Network-Based Processing (PDP), pages 207{214, 2015.

[16] E. H. M. Cruz, M. Diener, L. L. Pilla, and P. O. A. Navaux. Hardwareassisted thread and data mapping in hierarchical multicore architectures. ACM Trans. Archit. Code Optim., 13(3):28:1{28:28, Sept. 2016.

43 references, page 1 of 3
Powered by OpenAIRE Open Research Graph
Any information missing or wrong?Report an Issue
publication . Article . 2019

A Task Mapping Algorithm to Improve Communication and Load Balancing in Clusters of Multicore Systems

Cruz, Eduardo; Diener, Matthias; Lima Pilla, Laércio; Navaux, Philippe;