Name: Hiding co-occurring frequent itemsets
Creator: Abul, Osman
Keywords: H.2.8 [database applications]: Data mining, 0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

Publication: Article, Other literature type, Conference object
Date: 22 Mar 2009
Publisher: ACM
Journal: Proceedings of the 2009 EDBT/ICDT Workshops

Authors: Abul, Osman

doi: 10.1145/1698790.1698810

handle: 20.500.11851/5765

Abstract

Knowledge hiding, that is, hiding rules or patterns that can be inferred from published data and are deemed sensitive, has been studied extensively in the context of frequent itemset and association rule mining from transactional data. Research in this thread has focused mainly on developing sophisticated methods that cause less distortion in data quality. In this work, we extend frequent itemset hiding to the co-occurring frequent itemset hiding problem. Co-occurring frequent itemsets are itemsets that co-exist in the output of frequent itemset mining. What differs from classical frequent itemset hiding is the sensitivity definition: a set of itemsets is sensitive if all of its itemsets appear together in the frequent itemset mining results. In other words, co-occurrence is defined with reference to the mining results rather than the raw input dataset, and is thus a kind of meta-knowledge. Our notion of co-occurrence also differs from association rules: the itemsets in an association rule must be frequently present in the same set of transactions, whereas co-occurrence does not require joint occurrence in the same transactions. In this paper, we briefly review the frequent itemset and association rule hiding problems and define co-occurrence hiding along with its real-world motivations. We explore its fundamental properties and show that frequent itemset hiding is a special case of co-occurring frequent itemset hiding. As a solution, we propose a two-stage sanitization framework, essentially a reduction, in which an instance of frequent itemset hiding is constructed in the first stage and solved in the second. Since the task is shown to be NP-hard and the reduction is one-to-many, we propose heuristics only for the first stage, as the second stage is a well-established field. Finally, an experimental evaluation is carried out on a couple of datasets, and the results are presented.
Copyright 2009 ACM.
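
To make the sensitivity definition in the abstract concrete, the following is a minimal Python sketch on a toy dataset. It is not the paper's algorithm: the function names, the brute-force miner, the "pick the lowest-support itemset" rule for the first stage, and the naive item-removal routine standing in for the second stage are all assumptions chosen for illustration. It shows that two itemsets can co-occur in the mining output without sharing any transaction, and that hiding one of them is enough to break the co-occurrence.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force miner: map each frequent itemset to its support count."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            s = frozenset(cand)
            count = sum(1 for t in transactions if s <= t)
            if count >= min_support:
                result[s] = count
                found = True
        if not found:
            break  # anti-monotonicity: no larger itemset can be frequent
    return result

def co_occurs(sensitive_set, frequent):
    """True when every itemset of the sensitive set is in the mining output."""
    return all(s in frequent for s in sensitive_set)

def pick_victims(sensitive_sets, frequent):
    """Stage 1 (hypothetical heuristic): per co-occurring set, choose the
    lowest-support itemset to hide, unless one is already chosen."""
    victims = set()
    for group in sensitive_sets:
        if co_occurs(group, frequent) and not any(s in victims for s in group):
            victims.add(min(group, key=lambda s: (frequent[s], sorted(s))))
    return victims

def hide_itemset(transactions, victim, min_support):
    """Stage 2 (naive stand-in for a real frequent itemset hiding method):
    drop one item of `victim` from supporting transactions until infrequent."""
    item = min(victim)
    out = [set(t) for t in transactions]
    support = sum(1 for t in out if victim <= t)
    for t in out:
        if support < min_support:
            break
        if victim <= t:
            t.discard(item)
            support -= 1
    return [frozenset(t) for t in out]

# Toy data: {a,b} and {c,d} are both frequent but never share a transaction.
transactions = [frozenset(t) for t in ("ab", "ab", "ac", "cd", "cd", "b")]
sensitive = [frozenset("ab"), frozenset("cd")]

before = frequent_itemsets(transactions, 2)
victims = pick_victims([sensitive], before)
sanitized = transactions
for v in victims:
    sanitized = hide_itemset(sanitized, v, 2)
after = frequent_itemsets(sanitized, 2)

print(co_occurs(sensitive, before), co_occurs(sensitive, after))  # True False
```

Because the sensitive set is hidden as soon as any one of its itemsets becomes infrequent, the first stage has several valid choices of what to hide, which is one way to read the abstract's remark that the reduction is one-to-many. With a sensitive set containing a single itemset, the sketch degenerates to ordinary frequent itemset hiding.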

Keywords

H.2.8 [database applications]: Data mining

Impact by BIP!

Selected citations: 7. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
