
doi: 10.1145/3363571
High-utility itemset mining is a popular data mining problem that considers utility factors, such as quantity and unit profit of items besides frequency measure from the transactional database. It helps to find the most valuable and profitable products/items that are difficult to track by using only the frequent itemsets. An item might have a high-profit value which is rare in the transactional database and has a tremendous importance. While there are many existing algorithms to find high-utility itemsets (HUIs) that generate comparatively large candidate sets, our main focus is on significantly reducing the computation time with the introduction of new pruning strategies. The designed pruning strategies help to reduce the visitation of unnecessary nodes in the search space, which reduces the time required by the algorithm. In this article, two new stricter upper bounds are designed to reduce the computation time by refraining from visiting unnecessary nodes of an itemset. Thus, the search space of the potential HUIs can be greatly reduced, and the mining procedure of the execution time can be improved. The proposed strategies can also significantly minimize the transaction database generated on each node. Experimental results showed that the designed algorithm with two pruning strategies outperform the state-of-the-art algorithms for mining the required HUIs in terms of runtime and number of revised candidates. The memory usage of the designed algorithm also outperforms the state-of-the-art approach. Moreover, a multi-thread concept is also discussed to further handle the problem of big datasets.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 92 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 1% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 1% |
