DDS: integrating data analytics transformations in task-based workflows [version 2; peer review: 1 approved, 2 approved with reservations]

Keywords: eng, Task Based Programming Models, H, Parallel Computing, Big Data High Performance, Science, Q, Data Analytics, Social Sciences

Rosa M. Badia; Javier Alvarez; Jorge Ejarque; Nihad Mammadli


https://doaj.org/article/828c0...

Article . 2023 . Peer-reviewed

Data sources: DOAJ


Publication: Article, 01 Apr 2023, English
Publisher: F1000 Research Ltd
Journal: Open Research Europe (ISSN: 2732-5121)


Abstract

High-performance data analytics (HPDA) is a current trend in e-science research that aims to integrate traditional high-performance computing (HPC) with recent data analytic frameworks. Most of the work in this field has focused on improving data analytic frameworks by implementing their engines on top of HPC technologies such as the Message Passing Interface (MPI). However, integration is still lacking from an application development perspective: HPC workflows have their own parallel programming models, while data analytic (DA) algorithms are mainly implemented using data transformations and executed with frameworks like Spark. Task-based programming models (TBPMs) are a very efficient approach for implementing HPC workflows, and data analytic transformations can also be decomposed into a set of tasks and implemented with a task-based programming model. In this paper, we present a methodology for developing HPDA applications on top of TBPMs that allows developers to combine HPC workflows and data analytic transformations seamlessly. A prototype of this approach has been implemented on top of the PyCOMPSs task-based programming model to validate two aspects: that HPDA applications can be developed seamlessly, and that they perform better than Spark. We compare our results against Spark using several different programs. Finally, we conclude by discussing the integration of DA into HPC applications and the evaluation of our method against Spark.
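The abstract's claim that a data analytic transformation can be decomposed into a set of tasks can be illustrated with a minimal sketch. It is not the DDS or PyCOMPSs API: the standard library's thread pool stands in for a task-based runtime, and every name here (`partition`, `count_words_task`, `merge_task`, `word_count`) is hypothetical and chosen only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(data, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, rem = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def count_words_task(lines):
    """One independent task: the 'map' step applied to a single partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge_task(left, right):
    """One 'reduce' task: merge two partial results."""
    return left + right


def word_count(lines, n_parts=4):
    """Word count expressed as per-partition tasks plus a merge step.

    Assumption: a thread pool plays the role of the task runtime here;
    a real task-based model would schedule these tasks itself.
    """
    chunks = partition(lines, n_parts)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_words_task, chunks))
    return reduce(merge_task, partials, Counter())


if __name__ == "__main__":
    text = ["a b a", "b c", "a", "c c d"]
    print(word_count(text, n_parts=2))
```

Each partition is processed by an independent task, and the partial results are merged afterwards. That is the general shape a map-then-reduce transformation takes when expressed as tasks, whatever runtime ends up scheduling them.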


Impact by BIP!

selected citations: 0
popularity: Average
influence: Average
impulse: Average
