A Formal Semantics for Data Analytics Pipelines

Research, Preprint OPEN
Drocco, Maurizio; Misale, Claudia; Tremblay, Guy; Aldinucci, Marco
(2017)
  • Related identifiers: doi: 10.5281/zenodo.571802
  • Subject: Types | Computer Science - Programming Languages | Big Data analytics | D.2.4 | D.1.3 | D.3.2 | Parallel computing | Distributed computing

In this report, we present a new programming model based on Pipelines and Operators, which are the building blocks of programs written in PiCo, a DSL for Data Analytics Pipelines. In the model we propose, we use the term Pipeline to denote a workflow that processes data...