
This paper addresses the problem of data cleaning in the context of a user query. In particular, we develop a novel Query-Driven Approach (${\tt QDA}$) that systematically exploits the semantics of the predicates in ${\tt SQL}$-like selection queries to reduce the data cleaning overhead. The objective of ${\tt QDA}$ is to perform only the minimum number of cleaning steps necessary to answer a given ${\tt SQL}$-like selection query correctly. A comprehensive empirical evaluation shows that ${\tt QDA}$ significantly outperforms traditional entity resolution (ER) techniques, especially when the query is very selective.
