Scaling up Copy Detection

Preprint English OPEN
Li, Xian; Dong, Xin Luna; Lyons, Kenneth B.; Meng, Weiyi; Srivastava, Divesh
  • Subject: Computer Science - Databases

Recent research shows that copying is prevalent for Deep-Web data and considering copying can significantly improve truth finding from conflicting values. However, existing copy detection techniques do not scale for large sizes and numbers of data sources, so truth find...