The Power of Nested Parallelism in Big Data Processing – Hitting Three Flies with One Slap –

Publication: Article, Conference object
Date: 09 Jun 2021
Publisher: ACM
Journal: Proceedings of the 2021 International Conference on Management of Data

Authors: Gábor E. Gévay; Jorge-Arnulfo Quiané-Ruiz; Volker Markl

doi: 10.1145/3448016.3457287


Abstract

Many common data analysis tasks, such as performing hyperparameter optimization, processing a partitioned graph, and treating a matrix as a vector of vectors, offer natural opportunities for nested-parallel operations, i.e., launching parallel operations from inside other parallel operations. However, state-of-the-art dataflow engines, such as Spark and Flink, do not support nested parallelism. Users must instead implement workarounds, which slow their tasks down by orders of magnitude and add implementation effort on top. We present Matryoshka, a system that enables dataflow engines to support nested parallelism, even in the presence of control flow statements at inner nesting levels. Matryoshka achieves this via a novel two-phase flattening process, which translates nested-parallel programs to flat-parallel programs that can run efficiently on existing dataflow engines. The first phase introduces novel nesting primitives into the code; in the second phase, at runtime, these primitives allow for dynamic optimizations based on intermediate data characteristics. We validate our system using several common data analysis tasks, such as PageRank and K-means.
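The general idea of flattening mentioned in the abstract can be illustrated with a small sketch. This is not Matryoshka's API or its two-phase algorithm, neither of which is described on this page; it is a generic, single-machine illustration in plain Python, and all function names (`nested_means`, `flatten`, `flat_means`) are hypothetical. It shows a per-group computation written in a nested style, and the same computation rewritten as one keyed aggregation over a flat collection, which is the shape of program that flat dataflow engines can run.

```python
from collections import defaultdict

def nested_means(groups):
    """Nested style: an outer pass over groups, each with its own inner aggregation."""
    return {key: sum(values) / len(values) for key, values in groups.items()}

def flatten(groups):
    """Tag every inner element with its outer key, producing one flat collection."""
    return [(key, value) for key, values in groups.items() for value in values]

def flat_means(pairs):
    """Flat style: a single keyed aggregation over the whole flat collection."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, element count]
    for key, value in pairs:
        acc = totals[key]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical input: two groups of values standing in for a nested collection.
groups = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0]}
assert nested_means(groups) == flat_means(flatten(groups))
```

In a dataflow engine, the flat version would correspond to a keyed operation over a single distributed collection, rather than a parallel operation launched from inside another parallel operation.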

Related Organizations

Technical University of Berlin
Germany

Related research

- NestedMP: Taming Complex Configuration Space of Degree of Parallelism for Nested-Parallel Programs (2014)
- Heartbeat scheduling: provable efficiency for nested parallelism (2018)
- The Data Locality of Work Stealing (2000)
- Scheduling irregular parallel computations on hierarchical caches (2011)
- Disentanglement in nested-parallel programs (2019)

Impact by BIP!

- Selected citations: 7. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.