
doi: 10.1109/tkde.2013.93
In crowdsourcing database, human operators are embedded into the database engine and collaborate with other conventional database operators to process the queries. Each human operator publishes small HITs (Human Intelligent Task) to the crowdsourcing platform, which consist of a set of database records and corresponding questions for human workers. The published records in HITs may contain sensitive attributes, probably causing privacy leakage so that malicious workers could link them with other public databases to reveal individual private information. Conventional privacy protection techniques, such as K-Anonymity, can be applied to partially solve the problem. In this paper, we first study the tradeoff between the privacy and accuracy for the human operator within data anonymization process. A probability model is proposed to estimate the lower bound and upper bound of the accuracy for general K-Anonymity approaches. We show that searching the optimal anonymity approach is NP-Hard and only heuristic approach is available. The second contribution of the paper is a general feedback-based K-Anonymity scheme. In our scheme, synthetic samples are published to the human workers, the results of which are used to guide the selection on anonymity strategies.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 25 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
