Learning-based Dynamic Pinning of Parallelized Applications in Many-Core Systems

Conference object, Preprint OPEN
Chasparis, Georgios ; Janjic, Vladimir ; Rossbory, Michael ; Hammond, Kevin (2018)

This paper introduces a resource allocation framework tailored to the problem of dynamic placement (or pinning) of parallelized applications on many-core systems. Under the proposed setup, each thread of the parallelized application constitutes an independent decision maker, which autonomously decides on which processing unit to run next. Decisions are updated recursively for each thread by a resource manager, which runs in parallel to the application’s threads, periodically records their performance, and assigns them new CPU affinities. We extend prior work of the authors by introducing a two-level decision-making process that is better suited to many-core systems with Non-Uniform Memory Access (NUMA) architectures. In particular, the first level may handle pinning of threads or memory over the available NUMA nodes, while the second level may handle pinning over the available CPU cores of the selected NUMA nodes. Under such a framework, a learning process updates current estimates and decisions separately for each of the two decision levels. Additionally, novel performance-based learning dynamics are introduced that are better suited to handling measurement noise and rapid variations in thread performance. Experiments are performed on a many-core Linux platform.
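
The two-level scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's learning dynamics, so the update below substitutes a standard linear reward-inaction rule from learning automata; it is not the authors' method. The class name `TwoLevelPinner`, the `topology` layout, the `step` parameter, and the reward normalized to [0, 1] are all assumptions made for illustration.

```python
import random


class TwoLevelPinner:
    """Hypothetical per-thread learner: pick a NUMA node, then a core on it."""

    def __init__(self, topology, step=0.1, seed=None):
        # topology: assumed dict mapping NUMA node id -> list of CPU core ids
        self.topology = topology
        self.step = step
        self.rng = random.Random(seed)
        # Level 1: one probability per NUMA node, initially uniform.
        self.node_probs = {n: 1.0 / len(topology) for n in topology}
        # Level 2: one probability per core, kept separately for each node.
        self.core_probs = {
            n: {c: 1.0 / len(cores) for c in cores}
            for n, cores in topology.items()
        }

    def _sample(self, probs):
        keys = list(probs)
        return self.rng.choices(keys, weights=[probs[k] for k in keys])[0]

    def choose(self):
        # First decide the NUMA node, then a core within that node.
        node = self._sample(self.node_probs)
        core = self._sample(self.core_probs[node])
        return node, core

    @staticmethod
    def _reinforce(probs, chosen, gain):
        # Linear reward-inaction: shift probability mass toward the chosen
        # option in proportion to gain; the total mass stays equal to 1.
        for k in probs:
            target = 1.0 if k == chosen else 0.0
            probs[k] += gain * (target - probs[k])

    def update(self, node, core, reward):
        # reward: measured performance, assumed normalized to [0, 1].
        gain = self.step * max(0.0, min(1.0, reward))
        # Each decision level is updated separately.
        self._reinforce(self.node_probs, node, gain)
        self._reinforce(self.core_probs[node], core, gain)
```

In such a sketch, a resource manager would periodically call `choose()` for each thread, apply the selected core (on Linux, for example, with `os.sched_setaffinity(pid, {core})`), measure performance over the next period, and pass the normalized measurement to `update()`.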