High-Utility Itemset Mining with Effective Pruning Strategies

Publication: Article, 11 Nov 2019, English. Publisher: Association for Computing Machinery (ACM). Journal: ACM Transactions on Knowledge Discovery from Data, volume 13, pages 1-22 (ISSN: 1556-4681, eISSN: 1556-472X)

Authors: Jimmy Ming-Tai Wu; Jerry Chun-Wei Lin; Ashish Tamrakar

doi: 10.1145/3363571


Abstract

High-utility itemset mining is a popular data mining problem that considers utility factors, such as the quantity and unit profit of items, in addition to the frequency measure from the transactional database. It helps to find the most valuable and profitable products/items, which are difficult to track using frequent itemsets alone. An item may be rare in the transactional database yet have a high profit value and therefore tremendous importance. While many existing algorithms for finding high-utility itemsets (HUIs) generate comparatively large candidate sets, our main focus is on significantly reducing the computation time by introducing new pruning strategies. The designed pruning strategies reduce the visitation of unnecessary nodes in the search space, which reduces the time required by the algorithm. In this article, two new stricter upper bounds are designed to reduce the computation time by refraining from visiting unnecessary nodes of an itemset. Thus, the search space of the potential HUIs can be greatly reduced, and the execution time of the mining procedure can be improved. The proposed strategies can also significantly shrink the transaction database generated at each node. Experimental results showed that the designed algorithm with the two pruning strategies outperforms the state-of-the-art algorithms for mining the required HUIs in terms of runtime and number of revised candidates. The designed algorithm also outperforms the state-of-the-art approach in memory usage. Moreover, a multi-thread concept is discussed to further handle the problem of big datasets.
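
To make the notions of utility and upper-bound pruning concrete, the following is a minimal Python sketch. It does not implement the two new upper bounds of the article, which the abstract does not define; it assumes the classic transaction-weighted utilization (TWU) bound instead, together with a naive candidate enumeration. The toy transactions, the unit profits, and the function names (`transaction_utility`, `utility`, `twu`, `mine_huis`) are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 1},
    {"c": 2, "d": 2},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 10}


def transaction_utility(tx):
    """Total utility of one transaction: sum of quantity * unit profit."""
    return sum(qty * unit_profit[item] for item, qty in tx.items())


def utility(itemset, txs):
    """Utility of an itemset: its utility summed over transactions containing it."""
    total = 0
    for tx in txs:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * unit_profit[item] for item in itemset)
    return total


def twu(itemset, txs):
    """Classic TWU upper bound (an assumption here, not the article's bounds):
    the sum of the utilities of all transactions that contain the itemset."""
    return sum(
        transaction_utility(tx)
        for tx in txs
        if all(item in tx for item in itemset)
    )


def mine_huis(txs, min_util):
    """Naive enumeration that skips any candidate whose TWU is below min_util.

    TWU never underestimates the utility of an itemset or of its supersets,
    so a candidate with TWU < min_util cannot be a high-utility itemset.
    """
    items = sorted({item for tx in txs for item in tx})
    promising = [i for i in items if twu((i,), txs) >= min_util]
    result = {}
    for size in range(1, len(promising) + 1):
        for cand in combinations(promising, size):
            if twu(cand, txs) < min_util:
                continue  # pruned by the upper bound
            u = utility(cand, txs)
            if u >= min_util:
                result[cand] = u
    return result


print(mine_huis(transactions, min_util=20))
```

With a threshold of 20, item `d` appears in only two of the four toy transactions, no more often than `a` or `b`, yet it is the only single item that qualifies as high-utility, which mirrors the abstract's point that frequency alone can miss profitable items. A tighter upper bound than TWU would prune more candidates before their exact utility is computed, which is the direction the article pursues.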

Related Organizations

Shandong University of Science and Technology
China (People's Republic of)
University of Nevada, Las Vegas
United States
Western Norway University of Applied Sciences
Norway
Bergen University College
Norway

Impact by BIP!

selected citations: 95. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.