Mining high occupancy itemsets

Publication: Article, 01 Jan 2020, English

Publisher: Elsevier BV

Journal: Future Generation Computer Systems, volume 102, pages 222-229 (ISSN: 0167-739X)

Authors: Zhi-Hong Deng

doi: 10.1016/j.future.2019.07.039

Abstract

Frequent itemset mining has been studied extensively in data mining for more than two decades because of its numerous applications. However, the classic support-based mining framework used by most previous studies is not suitable for some real-world applications, such as travel landscape recommendation, where occupancy, in addition to support, plays a key role in evaluating the interestingness of an itemset. In this paper, we propose a new kind of task based on occupancy, namely high occupancy mining, by introducing occupancy into the support-based mining framework. An efficient algorithm, HEP (an abbreviation for High Efficient algorithm for mining high occupancy itemsets), is developed to discover all high occupancy itemsets. HEP uses a structure, named occupancy-list, to store the occupancy information about an itemset and employs an iterative level-wise approach to mine high occupancy itemsets via a pruning strategy based on an upper bound of occupancy. Substantial experiments on both synthetic and real datasets show that HEP is efficient for mining high occupancy itemsets and is at least one order of magnitude faster than the baseline algorithm.
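The abstract does not define occupancy or the upper bound, so the following is only a minimal sketch of the general idea under stated assumptions, not the HEP algorithm or its occupancy-list structure. It assumes that the occupancy of an itemset is the sum of |itemset| / |T| over the transactions T that contain it, and that the threshold `min_occupancy` is an absolute value. The pruning bound is a deliberately loose one chosen for illustration: each supporting transaction contributes at most 1 to the occupancy of any superset, so an itemset's support bounds the occupancy of all its supersets. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def occupancy(itemset, transactions):
    """Assumed definition: sum of |itemset| / |T| over supporting transactions T."""
    return sum(len(itemset) / len(t) for t in transactions if itemset <= t)


def mine_high_occupancy(transactions, min_occupancy):
    """Level-wise search returning {itemset: occupancy} for occupancy >= min_occupancy."""
    transactions = [frozenset(t) for t in transactions]
    result = {}
    # Level 1: one candidate per distinct item.
    level = [frozenset([i]) for i in sorted(set().union(*transactions))]
    while level:
        survivors = []
        for cand in level:
            # Loose upper bound (an assumption of this sketch, not the paper's bound):
            # no superset of cand can have occupancy above support(cand).
            if support(cand, transactions) < min_occupancy:
                continue  # prune cand and, implicitly, all of its supersets
            survivors.append(cand)
            occ = occupancy(cand, transactions)
            if occ >= min_occupancy:
                result[cand] = occ
        # Join surviving k-itemsets that differ in one item into (k+1)-candidates.
        size = len(level[0]) + 1
        next_level = {a | b for a, b in combinations(survivors, 2) if len(a | b) == size}
        level = sorted(next_level, key=sorted)
    return result


# Toy data, invented for illustration only.
toy_db = [{"a", "b"}, {"a", "b", "c"}, {"a", "c", "d", "e"}, {"b"}]
found = mine_high_occupancy(toy_db, min_occupancy=1.5)
```

On this toy data, `{a}` is as frequent as `{b}` (support 3) but appears in longer transactions, so only `{b}` and `{a, b}` reach the occupancy threshold. That is the kind of distinction a purely support-based framework cannot make.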

Related Organizations

Peking University
China (People's Republic of)

Impact by BIP!

- selected citations: 25. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
