Parallel Frequent Item Set Mining with Selective Item Replication

Publication: Article, Other literature type

Date: 01 Oct 2011

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Journal: IEEE Transactions on Parallel and Distributed Systems, volume 22, pages 1632-1640 (ISSN: 1045-9219)

Funded by: TUBITAK | Bellek Sınırlaması Altınd...

Authors: Ozkural, Eray; Aykanat, Cevdet; Uçar, Bora

doi: 10.1109/tpds.2011.32

handle: 11693/12129 , 11693/21884

Abstract

We introduce a transaction database distribution scheme that divides the frequent item set mining task in a top-down fashion. Our method operates on a graph where vertices correspond to frequent items and edges correspond to frequent item sets of size two. We show that partitioning this graph by a vertex separator is sufficient to decide a distribution of the items such that the subdatabases determined by the item distribution can be mined independently. This distribution entails an amount of data replication, which may be reduced by assigning appropriate weights to vertices. The data distribution scheme is used in the design of two new parallel frequent item set mining algorithms. Both algorithms replicate the items that correspond to the separator. NoClique replicates the work induced by the separator, and NoClique2 computes the same work collectively. Computational load balancing and minimization of redundant or collective work may be achieved by assigning appropriate load estimates to vertices. The experiments show favorable speedups on a system with a small-to-medium number of processors for synthetic and real-world databases.
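The idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' NoClique or NoClique2 implementation: the function names, the five-transaction database, the support threshold, and the hand-picked separator are all assumptions made for illustration, and a brute-force miner stands in for a real mining algorithm. It builds the graph of frequent items and frequent 2-item sets, checks that a chosen item set separates the remaining items, and confirms that mining the two projected subdatabases independently recovers every frequent item set of the full database.

```python
from collections import Counter
from itertools import combinations


def frequent_items_and_pairs(transactions, min_support):
    """Return the vertices (frequent items) and edges (frequent 2-item sets)."""
    item_counts = Counter(item for t in transactions for item in set(t))
    items = {i for i, c in item_counts.items() if c >= min_support}
    pair_counts = Counter()
    for t in transactions:
        # Sorting makes each pair appear in one canonical order.
        pair_counts.update(combinations(sorted(set(t) & items), 2))
    edges = {p for p, c in pair_counts.items() if c >= min_support}
    return items, edges


def separates(edges, part_a, part_b):
    """True if no edge joins part_a and part_b directly."""
    return not any(
        (u in part_a and v in part_b) or (u in part_b and v in part_a)
        for u, v in edges
    )


def project(transactions, keep):
    """Restrict every transaction to the items in `keep`."""
    return [set(t) & keep for t in transactions]


def mine_brute_force(transactions, min_support):
    """Enumerate all frequent item sets; exponential, for tiny demos only."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            counts.update(combinations(items, k))
    return {frozenset(s) for s, c in counts.items() if c >= min_support}


# Hypothetical toy database and threshold (absolute support count).
database = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["c", "d", "e"],
    ["c", "d", "e"],
    ["a", "c", "d"],
]
MIN_SUPPORT = 2

items, edges = frequent_items_and_pairs(database, MIN_SUPPORT)

# Hand-picked partition: item "c" is the separator and is replicated
# into both subdatabases.
part_a, part_b, separator = {"a", "b"}, {"d", "e"}, {"c"}

mined_all = mine_brute_force(database, MIN_SUPPORT)
mined_a = mine_brute_force(project(database, part_a | separator), MIN_SUPPORT)
mined_b = mine_brute_force(project(database, part_b | separator), MIN_SUPPORT)

print("separator is valid:", separates(edges, part_a, part_b))
print("independent mining matches:", mined_a | mined_b == mined_all)
print("item sets found by both parts:", mined_a & mined_b)
```

The decomposition works here because every frequent item set forms a clique in the graph, so it cannot contain items from both sides of the separator. The item sets found by both parts are those made up of separator items only, which corresponds to the redundant work that the abstract says NoClique replicates and NoClique2 computes collectively.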

Related Organizations

French Institute for Research in Computer Science and Automation, France
Claude Bernard University Lyon 1, France
Ecole Normale Supérieure de Lyon, France
Bilkent University, Turkey
Laboratoire de l'Informatique du Parallélisme, France

Keywords

Frequent item set mining, Frequent itemsets, Data mining, Parallel data mining, Data replication, Selective data replication, Graph partitioning, Graph partitioning by vertex separator, Graph theory, Separators, Mining methods and algorithms, Algorithms, 511, 006

Impact by BIP!

- Selected citations: 20. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.