Downloads provided by UsageCounts
This paper aims to reduce the effort of learning, deploying, and integrating several frameworks when developing e-Science applications that combine simulations with High-Performance Data Analytics (HPDA). We propose extending task-based management systems to support continuous input and output data, which enables task-based workflows and dataflows to be combined (hereafter, Hybrid Workflows) using a single programming model. Developers can therefore build complex Data Science workflows, choosing the approach that best fits each requirement. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype extending COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. It also provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end.
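As a rough illustration of the Hybrid Workflow idea described above, the following minimal sketch runs a producer task and a consumer task concurrently over a shared object stream, then passes the consumer's result to an ordinary task. It uses only the Python standard library; the `ObjectStream` class and the `simulation`, `analytics`, `report`, and `run_hybrid_workflow` functions are hypothetical stand-ins invented for this example and are not the API of COMPSs or of the Distributed Stream Library.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ObjectStream:
    """Hypothetical in-memory stand-in for a distributed object stream."""

    def __init__(self):
        self._items = queue.Queue()
        self._closed = threading.Event()

    def publish(self, item):
        self._items.put(item)

    def close(self):
        self._closed.set()

    def is_closed(self):
        return self._closed.is_set()

    def poll(self):
        """Return every item currently available (possibly an empty list)."""
        items = []
        while True:
            try:
                items.append(self._items.get_nowait())
            except queue.Empty:
                return items


def simulation(stream, steps):
    """Producer task: emits one value per step, then closes the stream."""
    for step in range(steps):
        stream.publish(step * step)
    stream.close()


def analytics(stream):
    """Consumer task: aggregates values while the producer is still running."""
    total = 0
    while True:
        # Read the closed flag before polling so no late item is missed.
        closed = stream.is_closed()
        for value in stream.poll():
            total += value
        if closed:
            return total
        time.sleep(0.001)


def report(total):
    """Ordinary task: runs only after the consumer's result is available."""
    return f"sum of squares = {total}"


def run_hybrid_workflow(steps=5):
    stream = ObjectStream()
    with ThreadPoolExecutor(max_workers=2) as pool:
        consumer = pool.submit(analytics, stream)  # dataflow part
        producer = pool.submit(simulation, stream, steps)
        producer.result()
        total = consumer.result()
    return report(total)  # task-based part


if __name__ == "__main__":
    print(run_hybrid_workflow())
```

The point of the sketch is the division of labour: the two streaming tasks overlap in time and exchange data continuously, while `report` depends on a completed result in the usual task-based way. A real streaming back-end would replace the in-memory queue.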
Accepted in Future Generation Computer Systems (FGCS). Licensed under CC-BY-NC-ND
FOS: Computer and information sciences, Macrodades, Task-based workflows, Programming models, Parallel programming (Computer science), Programació en paral·lel (Informàtica), Streaming, Dataflows, Distributed computing, Big data, Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors, Computer Science - Distributed, Parallel, and Cluster Computing, Electronic data processing -- Distributed processing, Convergence HPC-Big Data, Distributed, Parallel, and Cluster Computing (cs.DC), :Informàtica::Arquitectura de computadors [Àrees temàtiques de la UPC], Processament distribuït de dades
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 12 |
| Popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| Influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| Impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
| Views | | 40 |
| Downloads | | 173 |

Views provided by UsageCounts