
Theta join is one of the essential and important types of queries in database systems. As the amount of data needs to be processed increases, processing theta joins with a single machine becomes impractical. Therefore, theta join algorithms using distributed computing frameworks have been studied widely. Although one of the state-of-the-art theta-join algorithms uses M-Bucket-I heuristic, it is hard to use since running time of M-Bucket-I heuristic, which computes a mapping from a record to a reducer (i.e., reducer mapping), is O(n) where n is the size of input data. In this paper, we propose MBI-I algorithm which reduces the running time of M-Bucket-I heuristic to and gives the same result as M-Bucket-I heuristic does. We also conducted several experiments to show algorithm and confirmed that our algorithm can improve the performance of a theta join by 10%.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
