Pilot-Streaming: A Stream Processing Framework for High-Performance Computing

Preprint English OPEN
Luckow, Andre; Chantzialexiou, George; Jha, Shantenu
  • Subject: Computer Science - Distributed, Parallel, and Cluster Computing

An increasing number of scientific applications rely on stream processing to generate timely insights from the data feeds of scientific instruments, simulations, and Internet-of-Things (IoT) sensors. The development of streaming applications is a complex task and requires...