Using a Greedy Algorithm for the Improvement of a MapReduce, Theta join, M-Bucket-I Heuristic

그리디 알고리즘을 이용한 맵리듀스 세타조인 M-Bucket-I 휴리스틱의 개선

descriptionPublicationkeyboard_double_arrow_right Article , Other literature type 15 Feb 2016 English Publisher:Korean Institute of Information Scientists and EngineersJournal:Journal of KIISE, volume 43, pages 229-236 (issn: 2383-630X,

Authors: Wooyeol Kim; Kyuseok Shim;

doi: 10.5626/jok.2016.43.2.229

Using a Greedy Algorithm for the Improvement of a MapReduce, Theta join, M-Bucket-I Heuristic

- Summary
- Metrics

Abstract

Theta join is one of the essential and important types of queries in database systems. As the amount of data needs to be processed increases, processing theta joins with a single machine becomes impractical. Therefore, theta join algorithms using distributed computing frameworks have been studied widely. Although one of the state-of-the-art theta-join algorithms uses M-Bucket-I heuristic, it is hard to use since running time of M-Bucket-I heuristic, which computes a mapping from a record to a reducer (i.e., reducer mapping), is O(n) where n is the size of input data. In this paper, we propose MBI-I algorithm which reduces the running time of M-Bucket-I heuristic to and gives the same result as M-Bucket-I heuristic does. We also conducted several experiments to show algorithm and confirmed that our algorithm can improve the performance of a theta join by 10%.

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average