PARTIAL KEY GROUPING: Load-Balanced Partitioning of Distributed Streams
Nasir, Muhammad Anis Uddin; Morales, Gianmarco De Francisci; Garcia-Soriano, David; Kourtellis, Nicolas; Serafini, Marco;
Publisher: Qatar Computing Research Institute
Subject: Computer Science - Distributed, Parallel, and Cluster Computing | Load balancing | stream processing | Computer Systems | Datorsystem | stream grouping | power of both choices
We study the problem of load balancing in distributed stream processing engines, which is exacerbated in the presence of skew. We introduce PARTIAL KEY GROUPING (PKG), a new stream partitioning scheme that adapts the classical “power of two choices” to a distributed str... View more
 Y. Ben-Haim and E. Tom-Tov, “A Streaming Parallel Decision Tree Algorithm,” JMLR, vol. 11, pp. 849-872, 2010.
 R. Berinde, P. Indyk, G. Cormode, and M. J. Strauss, “Space-optimal heavy hitters with strong error bounds,” ACM Trans. Database Syst., vol. 35, no. 4, pp. 1-28, 2010.
 G. H. Gonnet, “Expected length of the longest probe sequence in hash code searching,” J. ACM, vol. 28, no. 2, pp. 289-304, 1981.
 M. Mitzenmacher, “The power of two choices in randomized load balancing,” IEEE Trans. Parallel Distrib. Syst., vol. 12, no. 10, pp. 1094-1104, 2001.
 M. A. Uddin Nasir, G. De Francisci Morales, D. Garcia-Soriano, N. Kourtellis, and M. Serafini, “The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines,” in ICDE, 2015.
 M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, and M. J. Franklin, “Flux: An adaptive partitioning operator for continuous query systems,” in ICDE, 2003, pp. 25-36.
 M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, Y. Xing, and S. B. Zdonik, “Scalable distributed stream processing,” in CIDR, vol. 3, 2003, pp. 257-268.
 Y. Xing, S. Zdonik, and J.-H. Hwang, “Dynamic load distribution in the borealis stream processor,” in ICDE, 2005, pp. 791-802.
 B. Gedik, “Partitioning functions for stateful data parallelism in stream processing,” The VLDB Journal, pp. 1-23, 2013.
 C. Balkesen, N. Tatbul, and M. T. O¨zsu, “Adaptive input admission and management for parallel stream processing,” in DEBS. ACM, 2013, pp. 15-26.