Downloads provided by UsageCounts
This paper presents MapReduce as a distributed data processing model, using the open source Hadoop framework to process huge volumes of data. The rapidly growing volume of data in the modern world, especially multimedia data, creates new requirements for processing and storage. As an open source distributed computational framework, Hadoop makes it possible to process large numbers of images across an arbitrarily large set of computing nodes by providing the necessary infrastructure. Our dataset consists of a very large number of small image files, from which duplicate files must be removed. Most binary formats, particularly those that are compressed or encrypted, cannot be split and must be read as a single linear stream of data. Using such files as input to a MapReduce job means that a single mapper processes the entire file, which can cause a large performance hit. The paper proposes a splittable format such as SequenceFile and uses the MD5 algorithm to improve the performance of image processing.
MapReduce, distributed data processing, Hadoop, sequence file
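The duplicate-removal idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that duplicates are detected by comparing MD5 digests of the raw image bytes, and it uses an in-memory list of (file name, bytes) pairs as a stand-in for the key/value records of a SequenceFile. The function names `map_phase` and `reduce_phase` and the sample records are hypothetical.

```python
import hashlib
from collections import defaultdict


def map_phase(records):
    """Emit (md5_hex, name) for each (name, image_bytes) record.

    Assumption: each record stands in for one SequenceFile entry,
    with the file name as key and the raw image bytes as value.
    """
    for name, data in records:
        yield hashlib.md5(data).hexdigest(), name


def reduce_phase(pairs):
    """Group names by digest; keep the first name seen per digest.

    Assumption: files with equal MD5 digests are treated as duplicates.
    """
    groups = defaultdict(list)
    for digest, name in pairs:
        groups[digest].append(name)
    unique = sorted(names[0] for names in groups.values())
    duplicates = sorted(n for names in groups.values() for n in names[1:])
    return unique, duplicates


# Hypothetical sample data: a.jpg and c.jpg have identical contents.
records = [
    ("a.jpg", b"image-bytes-1"),
    ("b.jpg", b"image-bytes-2"),
    ("c.jpg", b"image-bytes-1"),
]
unique, duplicates = reduce_phase(map_phase(records))
```

In a real Hadoop job the grouping by digest would be done by the shuffle between mappers and reducers; here a dictionary plays that role so the sketch runs on a single machine.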
| Indicator | Description | Value |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 2 |
| downloads | | 5 |

Views provided by UsageCounts