Leveraging transitive relations for crowdsourced joins

Publication: Article, Preprint, Conference object
Date: 22 Jun 2013
Embargo end date: 01 Jan 2014
Publisher: ACM
Journal: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Funded by: NSF | Making Sense at Scale wit..., FCT | LA 6

Authors: Jiannan Wang; Guoliang Li; Tim Kraska; Michael J. Franklin; Jianhua Feng

doi: 10.1145/2463676.2465280, 10.48550/arxiv.1408.6916

arXiv: 1408.6916

Abstract

The development of crowdsourced query processing systems has recently attracted significant attention in the database community, and a variety of crowdsourced queries have been investigated. In this paper, we focus on the crowdsourced join query, which aims to utilize humans to find all pairs of matching objects from two collections. As a human-only solution is expensive, we adopt a hybrid human-machine approach which first uses machines to generate a candidate set of matching pairs, and then asks humans to label the pairs in the candidate set as either matching or non-matching. Given the candidate pairs, existing approaches publish all pairs to a crowdsourcing platform for verification. However, they neglect the fact that the pairs satisfy transitive relations. As an example, if $o_1$ matches with $o_2$, and $o_2$ matches with $o_3$, then we can deduce that $o_1$ matches with $o_3$ without needing to crowdsource $(o_1, o_3)$. To this end, we study how to leverage transitive relations for crowdsourced joins. We propose a hybrid transitive-relations and crowdsourcing labeling framework which aims to crowdsource the minimum number of pairs needed to label all the candidate pairs. We prove the optimal labeling order in an ideal setting and propose a heuristic labeling order for use in practice. We devise a parallel labeling algorithm to efficiently crowdsource the pairs following the order. We evaluate our approaches both in a simulated environment and on a real crowdsourcing platform. Experimental results show that our approaches with transitive relations save much more money and time than existing methods, with only a small loss in result quality.
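The deduction step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the class `ClusterGraph`, the function `label_pairs`, and the `ask_crowd` callback are hypothetical names introduced here, the sequential loop stands in for the parallel labeling algorithm, and the pairs are simply processed in the order given. The sketch also assumes a second, negative rule that the abstract does not spell out (if $o_1$ matches $o_2$ and $o_2$ does not match $o_3$, then $o_1$ does not match $o_3$), and it assumes the crowd answers are correct.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Pair = Tuple[Hashable, Hashable]


class ClusterGraph:
    """Union-find over objects, plus non-matching edges between clusters."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.non_match: Dict[Hashable, set] = {}  # cluster root -> non-matching roots

    def find(self, x: Hashable) -> Hashable:
        if x not in self.parent:
            self.parent[x] = x
            self.non_match[x] = set()
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def deduce(self, a: Hashable, b: Hashable) -> Optional[bool]:
        """Return True/False if the label follows from known labels, else None."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True   # connected through matching pairs
        if rb in self.non_match[ra]:
            return False  # the two clusters are known to differ (assumed negative rule)
        return None

    def record(self, a: Hashable, b: Hashable, is_match: bool) -> None:
        """Store a crowdsourced label for a pair whose label could not be deduced."""
        ra, rb = self.find(a), self.find(b)
        if is_match:
            self.parent[rb] = ra
            # Re-point the merged cluster's non-matching edges to the new root.
            for r in self.non_match.pop(rb):
                self.non_match[r].discard(rb)
                self.non_match[r].add(ra)
                self.non_match[ra].add(r)
        else:
            self.non_match[ra].add(rb)
            self.non_match[rb].add(ra)


def label_pairs(
    pairs: List[Pair], ask_crowd: Callable[[Hashable, Hashable], bool]
) -> Tuple[Dict[Pair, bool], List[Pair]]:
    """Label every candidate pair, crowdsourcing only those that cannot be deduced."""
    graph = ClusterGraph()
    labels: Dict[Pair, bool] = {}
    asked: List[Pair] = []
    for a, b in pairs:
        label = graph.deduce(a, b)
        if label is None:
            label = ask_crowd(a, b)  # hypothetical callback standing in for the crowd
            asked.append((a, b))
            graph.record(a, b, label)
        labels[(a, b)] = label
    return labels, asked


# Toy run: o1, o2, o3 denote the same entity; o4 is a different one.
entity = {"o1": 0, "o2": 0, "o3": 0, "o4": 1}
candidates = [("o1", "o2"), ("o2", "o3"), ("o1", "o3"), ("o3", "o4"), ("o1", "o4")]
labels, asked = label_pairs(candidates, lambda a, b: entity[a] == entity[b])
print(f"crowdsourced {len(asked)} of {len(candidates)} pairs: {asked}")
```

In this toy run only $(o_1, o_2)$, $(o_2, o_3)$ and $(o_3, o_4)$ are sent to the crowd; the labels of $(o_1, o_3)$ and $(o_1, o_4)$ follow by deduction. How many pairs are saved depends on the order in which the candidates are processed, which is why the labeling order matters.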

Related Organizations

Tsinghua University
Brown University
United States
University of California, Berkeley
United States
Brown University
Department of Computer Science
Spain


Keywords

FOS: Computer and information sciences, Computer Science - Databases, Databases (cs.DB)

Impact by BIP!

Selected citations: 140. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Funded by

NSF | Making Sense at Scale with Algorithms, Machines, and People, FCT | LA 6