Cross-MapReduce: Data transfer reduction in geo-distributed MapReduce

Article, published 01 Feb 2021, in English. Publisher: Elsevier BV. Journal: Future Generation Computer Systems, volume 115, pages 188-200 (ISSN: 0167-739X).

Authors: Saeed Mirpour Marzuni; Abdorreza Savadi; Adel Nadjaran Toosi; Mahmoud Naghibzadeh

doi: 10.1016/j.future.2020.09.009


Abstract

The MapReduce model is widely used to store and process big data in a distributed manner. MapReduce was originally developed for a single, tightly coupled cluster of computers. Approaches such as Hierarchical and Geo-Hadoop are designed to address geo-distributed MapReduce processing. However, these methods still suffer from high inter-cluster data transfer over the Internet, which is prohibitive for processing today's globally distributed big data. Based on the observation that there is no need to transfer all intermediate results to a single global reducer, we propose Cross-MapReduce, a framework for geo-distributed MapReduce processing. Before any massive data transfer takes place, the proposed method finds a set of global reducers that minimizes the volume of transferred data. We propose a graph called the Global Reduction Graph (GRG) to determine the number and locations of the global reducers. We conducted extensive experimental evaluations on a real testbed to demonstrate the effectiveness of Cross-MapReduce. The experimental results show that Cross-MapReduce significantly outperforms the Hierarchical and Geo-Hadoop approaches and reduces the amount of data transferred over the Internet by 40%.
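The abstract's core observation, that intermediate results need not all travel to one global reducer, can be illustrated with a small Python sketch. This is not the paper's Global Reduction Graph (GRG) algorithm, which the abstract does not detail. It is a simplified, hypothetical model that assumes the intermediate data volume of each key partition in each cluster is known in advance, that inter-cluster cost is simply the number of units moved, and that reducer outputs need no further transfer. The cluster names, the example volumes, and the functions `single_reducer_transfer`, `best_single_reducer`, and `per_partition_placement` are all illustrative assumptions.

```python
from typing import Dict, Tuple

# Hypothetical model: volumes[partition][cluster] = units of intermediate
# data for that key partition currently stored in that cluster.
# This is an illustrative simplification, not the paper's GRG algorithm.
Volumes = Dict[str, Dict[str, int]]

# Made-up example data; cluster names are placeholders.
EXAMPLE_VOLUMES: Volumes = {
    "k1": {"us": 80, "eu": 10, "asia": 10},
    "k2": {"us": 5, "eu": 70, "asia": 25},
    "k3": {"us": 15, "eu": 10, "asia": 60},
}


def single_reducer_transfer(volumes: Volumes, site: str) -> int:
    """Units moved between clusters if every partition is reduced at `site`."""
    return sum(
        amount
        for per_site in volumes.values()
        for cluster, amount in per_site.items()
        if cluster != site
    )


def best_single_reducer(volumes: Volumes) -> Tuple[str, int]:
    """Baseline: pick the one global reducer site with the least transfer."""
    sites = sorted({c for per_site in volumes.values() for c in per_site})
    best = min(sites, key=lambda s: single_reducer_transfer(volumes, s))
    return best, single_reducer_transfer(volumes, best)


def per_partition_placement(volumes: Volumes) -> Tuple[Dict[str, str], int]:
    """Greedy sketch: reduce each partition where most of its data already is."""
    placement: Dict[str, str] = {}
    moved = 0
    for partition, per_site in volumes.items():
        # Site holding the largest share of this partition's data.
        site = max(sorted(per_site), key=lambda c: per_site[c])
        placement[partition] = site
        # Only the data held elsewhere has to cross cluster boundaries.
        moved += sum(per_site.values()) - per_site[site]
    return placement, moved


if __name__ == "__main__":
    print(best_single_reducer(EXAMPLE_VOLUMES))      # ('us', 185)
    print(per_partition_placement(EXAMPLE_VOLUMES))  # ({'k1': 'us', 'k2': 'eu', 'k3': 'asia'}, 75)
```

With these made-up volumes, sending everything to the best single site moves 185 units, while reducing each partition where most of its data already resides moves 75. A real geo-distributed system would also have to weigh link bandwidths, the cost of shipping reducer outputs, and how many reducer sites to use, which is the kind of decision the GRG is described as making.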

Related Organizations

Monash University, Clayton campus
Australia
Ferdowsi University of Mashhad
Iran (Islamic Republic of)

Impact by BIP!

selected citations: 13
popularity: Top 10%
influence: Top 10%
impulse: Top 10%


Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
