
arXiv: 2212.10452
The discovery of utility-driven patterns is a valuable and difficult research topic. It can extract significant and interesting information from specific and varied databases, increasing the value of the services provided. In practice, the utility measure is often used to reflect the importance, profit, or risk of an object or pattern. Although utility is a flexible criterion for patterns in a database, it is also a somewhat limited one because it overlooks utility sharing. As a result, the derived patterns capture only partial and local knowledge in the database. Utility occupancy addresses the problem of patterns that have high utility but low occupancy. However, existing studies focus on itemsets, which cannot reveal the temporal relationship of object occurrences. Therefore, this article first defines the concept of utility occupancy for sequence data and formulates the problem of High-Utility Occupancy Sequential Pattern Mining (HUOSPM). HUOSPM comprehensively evaluates three dimensions: frequency, utility, and occupancy. An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed. Furthermore, two data structures for storing pattern-related information, the Utility-Occupancy-List-Chain (UOL-Chain) and the Utility-Occupancy-Table (UO-Table), are designed, and six upper bounds are proposed to improve efficiency. Extensive experiments are conducted to evaluate the efficiency and effectiveness of the novel algorithm. A specific case study is provided, and the effects of different upper bounds and pruning strategies are analyzed. The comprehensive results suggest that the HUOSPM task is useful and that the proposed algorithm is efficient.
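To make the three dimensions named in the abstract concrete, the following is a minimal Python sketch that scores one candidate pattern by frequency (support), utility, and occupancy. It is not the SUMU algorithm and uses none of its data structures or upper bounds. The definitions are assumptions borrowed from the itemset setting, since the abstract does not state them: each sequence is a list of single `(item, utility)` events, the utility of a pattern in a sequence is the maximum over its occurrences, occupancy in a sequence is that utility divided by the total utility of the sequence, and the pattern's occupancy is the average over supporting sequences. The names `best_match_utility` and `evaluate` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: a sequence is a time-ordered list of (item, utility) events.
Sequence = List[Tuple[str, float]]


def best_match_utility(pattern: List[str], seq: Sequence) -> Optional[float]:
    """Return the maximum total utility over all occurrences of `pattern`
    as a subsequence of `seq`, or None if the pattern does not occur."""
    neg_inf = float("-inf")
    # best[k] = highest utility of matching the first k pattern items so far.
    best = [0.0] + [neg_inf] * len(pattern)
    for item, util in seq:
        # Walk k downward so one event extends at most one prefix length.
        for k in range(len(pattern), 0, -1):
            if pattern[k - 1] == item and best[k - 1] != neg_inf:
                best[k] = max(best[k], best[k - 1] + util)
    return None if best[-1] == neg_inf else best[-1]


def evaluate(pattern: List[str], database: List[Sequence]) -> Tuple[int, float, float]:
    """Return (support, utility, occupancy) of `pattern` under the
    assumed definitions described in the lead-in."""
    occupancies = []
    utility = 0.0
    for seq in database:
        u = best_match_utility(pattern, seq)
        if u is None:
            continue  # sequence does not support the pattern
        total = sum(util for _, util in seq)
        utility += u
        occupancies.append(u / total)
    support = len(occupancies)
    occupancy = sum(occupancies) / support if support else 0.0
    return support, utility, occupancy
```

Under these assumptions, a pattern would be reported only if all three values meet user-chosen thresholds; a pattern with high utility but a small share of each supporting sequence's total utility would be filtered out by the occupancy threshold.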
FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Databases, Computer Science - Artificial Intelligence, Databases (cs.DB)
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources. An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
