publication . Conference object . 2018

A performance model to execute workflows on high-bandwidth-memory architectures

Benoit, Anne; Perarnau, Swann; Pottier, Loïc; Robert, Yves
Open Access English
  • Published: 13 Aug 2018
  • Publisher: ACM
Abstract
This work presents a realistic performance model to execute scientific workflows on high-bandwidth-memory architectures such as the Intel Knights Landing. We provide a detailed analysis of the execution time on such platforms, taking into account transfers from both fast and slow memory and their overlap with computations. We discuss several scheduling and mapping strategies: not only must tasks be assigned to computing resources, but one must also decide which fraction of the input and output data will reside in fast memory and which will have to stay in slow memory. We use extensive simulations to assess the impact of the mapping strategi...
Subjects
free text keywords: Memory management, Parallel architectures, Scheduling, Performance model, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]