Downloads provided by UsageCounts
handle: 10261/364100
Modern computers are typically heterogeneous devices—besides the standard central processing unit (CPU), they commonly include an accelerator such as a graphics processing unit (GPU). However, exploiting the full potential of such computers is challenging, especially when complex workloads consisting of multiple computationally demanding tasks are to be processed. This paper proposes a framework called Umpalumpa, which aims to manage complex workloads on heterogeneous computers. Umpalumpa combines three aspects that ease programming and optimize code performance. Firstly, it implements a data-centric design, where data are described by their physical properties (e. g., location in memory, size) and logical properties (e. g., dimensionality, shape, padding). Secondly, Umpalumpa utilizes task-based parallelism to schedule tasks on heterogeneous nodes. Thirdly, tasks can be dynamically autotuned on a source code level according to the hardware where the task is executed and the processed data. Altogether, Umpalumpa allows for implementing a complex workload, which is automatically executed on CPUs and accelerators, and allows autotuning to maximize the performance with the given hardware and data input. Umpalumpa focuses on image processing workloads, but the concept is generic and can be extended to different types of workloads. We demonstrate the usability of the proposed framework on two previously accelerated applications from cryogenic electron microscopy: 3D Fourier reconstruction and Movie alignment. We show that, compared to the original implementations, Umpalumpa reduces the complexity and improves the maintainability of the main applications’ loops while improving performance through automatic memory management and autotuning of the GPU kernels.
Computational resources were supplied by the project “e-Infrastruktura CZ” (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures. The project that gave rise to these results received the support of a fellow- ship from the “la Caixa” Foundation (ID 100010434). The fellowship code is LCF/BQ/DI18/11660021. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 713673. The authors acknowledge the economic support from MCIN to: the Instruct Image Processing Center (I2PC) as part of the Span- ish participation in Instruct-ERIC, the European Strategic Infrastructure Project (ESFRI) in the area of Structural Biology. Grant PID2019-104757RB-I00 funded by MCIN/AEI/ 10.13039/501100011033/ and “ERDF A way of making Europe”, by the “European Union”. “Comunidad Autónoma de Madrid” through Grant: S2017/BMD-3817.
Peer reviewed
Image processing, Auto-tuning, Task-based systems, Data-aware architecture, CUDA
Image processing, Auto-tuning, Task-based systems, Data-aware architecture, CUDA
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | 27 | |
| downloads | 20 |

Views provided by UsageCounts
Downloads provided by UsageCounts