Approximated Summarization of Data Provenance

Article, Conference object; 17 Oct 2015; France
Publisher: ACM
Journal: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
Funded by: EC | MODAS

Authors: Ainy, Eleanor; Bourhis, Pierre; Davidson, Susan; Deutch, Daniel; Milo, Tova

doi: 10.1145/2806416.2806429

Abstract

Many modern applications involve collecting large amounts of data from multiple sources, and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes it difficult to understand how the resulting information was derived. Data provenance has proven helpful in this respect; however, maintaining and presenting the full and exact provenance information may be infeasible due to its size and complexity. We therefore introduce the notion of approximated summarized provenance, which provides a compact representation of the provenance at the possible cost of information loss. Building on this notion, we present a novel provenance summarization algorithm that, guided by the semantics of the underlying data and the intended use of the provenance, outputs a summary of the input provenance. Experiments measure the conciseness and accuracy of the resulting provenance summaries, as well as the improvement in provenance usage time.
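The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea under assumptions of our own; it is not the paper's algorithm. It assumes provenance is a semiring-style polynomial stored as a map from monomials (sorted tuples of annotations) to coefficients; a summary maps annotations to coarser group annotations and merges monomials that become identical; the "intended use" is modeled as evaluating the polynomial under a numeric valuation; conciseness is the number of monomials; and accuracy is the relative error of that evaluation. The semantic `label` constraint, the mean-valued groups, and every function name (`apply_mapping`, `evaluate`, `error`, `summarize`) are hypothetical choices for illustration.

```python
from itertools import combinations
from math import prod


def apply_mapping(poly, mapping):
    """Rename annotations via `mapping`; monomials that coincide are merged."""
    out = {}
    for mono, coeff in poly.items():
        new = tuple(sorted(mapping.get(a, a) for a in mono))
        out[new] = out.get(new, 0) + coeff
    return out


def evaluate(poly, val):
    """Evaluate the polynomial under a numeric valuation (assumed intended use)."""
    return sum(c * prod(val[a] for a in mono) for mono, c in poly.items())


def summary_valuation(val, mapping):
    """Hypothetical choice: a group's value is the mean of its members' values."""
    groups = {}
    for a, x in val.items():
        groups.setdefault(mapping.get(a, a), []).append(x)
    return {g: sum(xs) / len(xs) for g, xs in groups.items()}


def error(poly, val, mapping):
    """Relative error of the summary's evaluation against the exact provenance."""
    exact = evaluate(poly, val)
    approx = evaluate(apply_mapping(poly, mapping), summary_valuation(val, mapping))
    return abs(exact - approx) / abs(exact) if exact else abs(approx)


def summarize(poly, val, label, budget):
    """Greedily merge groups with the same semantic label until the summary
    has at most `budget` monomials, or no compatible merge remains."""
    mapping = {a: a for a in val}            # annotation -> current group name
    group_label = {a: label[a] for a in val}
    while len(apply_mapping(poly, mapping)) > budget:
        best = None
        for g1, g2 in combinations(sorted(set(mapping.values())), 2):
            if group_label[g1] != group_label[g2]:
                continue                     # only merge semantically compatible groups
            merged = f"{g1}+{g2}"
            trial = {a: (merged if g in (g1, g2) else g) for a, g in mapping.items()}
            # Prefer the lowest error, then the smallest resulting summary.
            key = (error(poly, val, trial), len(apply_mapping(poly, trial)))
            if best is None or key < best[0]:
                best = (key, trial, merged, group_label[g1])
        if best is None:
            break                            # budget cannot be met
        _, mapping, merged, lbl = best
        group_label[merged] = lbl
    return apply_mapping(poly, mapping), mapping


if __name__ == "__main__":
    # Toy provenance a1*b1 + a2*b1 + a1*b2 with hypothetical weights and labels.
    poly = {("a1", "b1"): 1, ("a2", "b1"): 1, ("a1", "b2"): 1}
    val = {"a1": 2.0, "a2": 2.0, "b1": 3.0, "b2": 5.0}
    label = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
    summary, mapping = summarize(poly, val, label, budget=2)
    print(summary, error(poly, val, mapping))
```

In this toy run, merging `a1` and `a2` shrinks the polynomial to two monomials with zero evaluation error, whereas merging `b1` and `b2` would also reach two monomials but distort the result, so the greedy step picks the former. The greedy, error-first criterion is only one plausible way to trade conciseness against accuracy.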

Country

France

Related Organizations

French Institute for Research in Computer Science and Automation (France)
Centre national de la recherche scientifique (France)
University of Lille (France)
Centre de Recherche en Informatique (France)
Tel Aviv University (Israel)


Keywords

[INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]

Impact by BIP!

- Selected citations: 28. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.