FLEXIS: FLEXible Frequent Subgraph Mining using Maximal Independent Sets

Keywords: Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Performance, Computer Science - Databases, Databases (cs.DB)

Akshit Sharma; Sam Reinehr; Dinesh Mehta; Bo Wu

arXiv.org e-Print Archive

Preprint . 2024

Data sources: arXiv.org e-Print Archive

https://doi.org/10.1145/371189...

Article . 2025 . Peer-reviewed

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2024

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

DBLP

Article

Data sources: DBLP

Type: Article, Preprint
Date: 03 Aug 2025
Embargo end date: 01 Jan 2024
Publisher: ACM
Journal: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2

doi: 10.1145/3711896.3736966 , 10.48550/arxiv.2404.01585

arXiv: 2404.01585

Abstract

Frequent Subgraph Mining (FSM) is the process of identifying subgraph patterns whose frequency exceeds a predefined threshold. Although FSM is widely applicable in fields such as bioinformatics, chemical analysis, and social network anomaly detection, it remains time-consuming and complex. This complexity stems from the need to identify potentially frequent subgraphs and then determine whether they exceed the threshold. Current approaches to identifying these patterns often rely on edge or vertex extension methods. However, these strategies can introduce redundancies and increase latency. To address these challenges, this paper introduces a novel approach that generates candidate k-vertex patterns by combining two frequent (k-1)-vertex patterns. This method optimizes the breadth-first search, allowing earlier search termination based on vertex count and support value. Another challenge in FSM is validating a candidate pattern against a specific threshold. Existing metrics, such as Maximum Independent Set (MIS) and Minimum Node Image (MNI), either demand significant computational time or risk overestimating pattern counts. Our approach aligns with MIS and identifies independent subgraphs. Through the "Maximal Independent Set" metric, this paper offers an efficient solution that minimizes latency and gives users control over pattern overlap. In extensive experiments, the proposed method achieves an average speedup of 10.58x over GraMi and an average speedup of 3x over T-FSM.
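The abstract contrasts MNI, which can overestimate support, with an independent-set view of support. The Python sketch below illustrates that difference. It assumes each embedding (occurrence) of a pattern is given as a tuple whose i-th entry is the data-graph vertex matched to pattern vertex i. The parameters `max_overlap` (how many data vertices two counted embeddings may share) and `stop_at` (early exit once a threshold is reached) are hypothetical illustrations of "control over pattern overlap" and support-based termination, not the FLEXIS interface. The greedy pass yields a maximal independent set, which is not necessarily a maximum one.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

# Assumption: embedding[i] is the data-graph vertex matched to pattern vertex i.
Embedding = Tuple[int, ...]


def mni_support(embeddings: Sequence[Embedding]) -> int:
    """Minimum Node Image: the minimum, over pattern positions, of the
    number of distinct data vertices that appear at that position."""
    if not embeddings:
        return 0
    images = [set() for _ in range(len(embeddings[0]))]
    for emb in embeddings:
        for i, v in enumerate(emb):
            images[i].add(v)
    return min(len(img) for img in images)


def greedy_mis_support(
    embeddings: Sequence[Embedding],
    max_overlap: int = 0,            # hypothetical: 0 means vertex-disjoint
    stop_at: Optional[int] = None,   # hypothetical: stop once support reaches this
) -> int:
    """Count embeddings in a greedily built *maximal* independent set.

    Two embeddings conflict when they share more than `max_overlap` data
    vertices. Each embedding is kept only if it conflicts with none already
    kept, so every rejected embedding conflicts with a kept one (maximality).
    """
    chosen: List[FrozenSet[int]] = []
    for emb in embeddings:
        vs = frozenset(emb)
        if all(len(vs & c) <= max_overlap for c in chosen):
            chosen.append(vs)
            if stop_at is not None and len(chosen) >= stop_at:
                break  # threshold reached; no need to scan further
    return len(chosen)


if __name__ == "__main__":
    # An A-A edge pattern in a triangle of A-labelled vertices 0, 1, 2,
    # listing both orientations of every edge as separate embeddings.
    triangle = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0)]
    print(mni_support(triangle))                              # 3
    print(greedy_mis_support(triangle))                       # 1
    print(greedy_mis_support(triangle, max_overlap=1))        # 3
    print(greedy_mis_support(triangle, max_overlap=1, stop_at=2))  # 2
```

On the triangle, MNI reports 3 even though only one vertex-disjoint edge fits, which is the kind of overestimate the abstract mentions. Allowing one shared vertex raises the greedy count to 3. A greedy pass is linear in the number of embeddings times the kept set, whereas an exact maximum independent set is NP-hard to compute; the result of the greedy pass depends on the order in which embeddings are visited.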

Related Organizations

Colorado School of Mines
United States

Impact by BIP!

- Selected citations: 1 (derived from selected sources; an alternative to the "Influence" indicator)
- Popularity: Average (the current impact or attention of the article, based on the citation network)
- Influence: Average (the overall impact of the article, based on the citation network over time)
- Impulse: Average (the initial momentum of the article directly after publication)
