Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao IRIS UNIMORE - Archi...arrow_drop_down
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
https://doi.org/10.1109/hpcs.2...
Article . 2017 . Peer-reviewed
Data sources: Crossref
DBLP
Conference object . 2025
Data sources: DBLP
versions View all 3 versions
addClaim

Cleaning MapReduce Workflows

Authors: Matteo Interlandi; Julien Lacroix; Omar Boucelma; Francesco Guerra 0001;

Cleaning MapReduce Workflows

Abstract

Integrity constraints (ICs) such as Functional Dependencies (FDs) or Inclusion Dependencies (INDs) are commonly used in databases to check if input relations obey to certain pre-defined quality metrics. While Data-Intensive Scalable Computing (DISC) platforms such as MapReduce commonly accept as input (semi-structured) data not in relational format, still data is often transformed in key/value pairs when data is required to be re-partitioned; a process commonly referred to as shuffle. In this work, we present a Provenance-Aware model for assessing the quality of shuffled data: more precisely, we capture and model provenance using the PROV-DM W3C recommendation and we extend it with rules expressed a la Datalog to assess data quality dimensions by means of ICs metrics over DISC systems. In this way, data (and algorithmic) errors can be promptly and automatically detected without having to go through a lengthy process of output debugging.

Country
Italy
Keywords

Computer Science Applications1707 Computer Vision and Pattern Recognition; Information Systems and Management; Modeling and Simulation; Computer Networks and Communications; Computer Science (miscellaneous)

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
    OpenAIRE UsageCounts
    Usage byUsageCounts
    visibility views 135
  • 135
    views
    Powered byOpenAIRE UsageCounts
Powered by OpenAIRE graph
Found an issue? Give us feedback
visibility
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
views
OpenAIRE UsageCountsViews provided by UsageCounts
0
Average
Average
Average
135
Upload OA version
Are you the author of this publication? Upload your Open Access version to Zenodo!
It’s fast and easy, just two clicks!