Distributed data management using MapReduce

Article, 01 Jan 2014, Singapore, English
Publisher: Association for Computing Machinery (ACM)
Journal: ACM Computing Surveys, volume 46, pages 1-42 (ISSN: 0360-0300, eISSN: 1557-7341)

Authors: Feng Li; Beng Chin Ooi; M. Tamer Özsu; Sai Wu

doi: 10.1145/2503009

Abstract

MapReduce is a framework for processing and managing large-scale datasets in a distributed cluster. It has been used for applications such as generating search indexes, document clustering, access log analysis, and various other forms of data analytics. MapReduce adopts a flexible computation model with a simple interface consisting of map and reduce functions, whose implementations can be customized by application developers. Since its introduction, a substantial amount of research effort has been directed toward making it more usable and efficient for supporting database-centric operations. In this article, we aim to provide a comprehensive review of a wide range of proposals and systems that focus fundamentally on supporting distributed data management and processing using the MapReduce framework.
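To illustrate the map/reduce interface described in the abstract, the following is a minimal single-process sketch in Python applied to access log analysis (counting requests per URL). It is not Hadoop's API or the survey's code: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, the log format (URL as the second whitespace-separated field) is an assumption, and the distributed shuffle is simulated by grouping intermediate pairs in a dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


# Hypothetical user-defined map function: emits (url, 1) for each log line.
# Assumes the requested URL is the second whitespace-separated field.
def map_fn(line: str) -> Iterable[Tuple[str, int]]:
    fields = line.split()
    if len(fields) >= 2:
        yield fields[1], 1


# Hypothetical user-defined reduce function: sums all counts for one URL.
def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> List[Tuple[str, int]]:
    # Map phase plus a simulated shuffle: group intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: call the reducer once per key, in sorted key order.
    return [reducer(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    log = [
        "10.0.0.1 /index.html 200",
        "10.0.0.2 /about.html 200",
        "10.0.0.3 /index.html 304",
    ]
    print(run_mapreduce(log, map_fn, reduce_fn))
    # prints [('/about.html', 1), ('/index.html', 2)]
```

Only `map_fn` and `reduce_fn` are application-specific; in a real MapReduce deployment the framework, not the developer, handles partitioning, shuffling, and fault tolerance across the cluster.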

Country

Singapore

Related Organizations

Zhejiang Ocean University, China (People's Republic of)
National University of Singapore, Singapore
University of Waterloo, Canada
National University of Singapore Libraries, Singapore

Keywords

Hadoop, Scalability, MapReduce

Impact (by BIP!)

- Selected citations: 79 (citations derived from selected sources; an alternative to the "Influence" indicator)
- Popularity: Top 10% (the current impact/attention of the article in the research community, based on the underlying citation network)
- Influence: Top 1% (the overall/total impact of the article in the research community, based on the underlying citation network, diachronically)
- Impulse: Top 1% (the initial momentum of the article directly after its publication, based on the underlying citation network)