An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets

Publication: Part of book or chapter of book, Conference object, Article. Date: 01 Jan 2003. Publisher: Springer Berlin Heidelberg

Authors: Salvatore Orlando; P. Palmerini; R. Perego; F. Silvestri

doi: 10.1007/3-540-36569-9_28

handle: 20.500.14243/114016 , 10278/35012 , 11573/1572776



Abstract

Due to the huge increase in the number and dimension of available databases, efficient solutions for counting frequent sets are nowadays very important within the Data Mining community. Several sequential and parallel algorithms have been proposed, which in many cases exhibit excellent scalability. In this paper we present ParDCI, a distributed and multithreaded algorithm for counting the occurrences of frequent sets within transactional databases. ParDCI is a parallel version of DCI (Direct Count & Intersect), a multi-strategy algorithm which is able to adapt its behavior not only to the features of the specific computing platform (e.g. available memory), but also to the features of the dataset being processed (e.g. sparse or dense datasets). ParDCI enhances previous proposals by exploiting the highly optimized counting and intersection techniques of DCI, and by relying on a multi-level parallelization approach which explicitly targets clusters of SMPs, an emerging computing platform. We focused our work on the efficient exploitation of the underlying architecture. Intra-Node multithreading effectively exploits the memory hierarchies of each SMP node, while Inter-Node parallelism exploits smart partitioning techniques aimed at reducing communication overheads. In-depth experimental evaluations demonstrate that ParDCI reaches nearly optimal performance under a variety of conditions.
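The abstract names two generic ways of computing the support of an itemset: counting occurrences directly while scanning the transactions, and intersecting per-item lists of transaction identifiers (tidlists). The Python sketch below illustrates only those two generic techniques on a toy dataset. It is not the authors' DCI or ParDCI implementation, it includes no parallelism, and all function names, the toy transactions, and the `min_support` threshold are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_pairs_directly(transactions, min_support):
    """Counting-based step (illustrative): scan every transaction and
    count each 2-itemset it contains, then keep the frequent ones."""
    counts = Counter()
    for transaction in transactions:
        # Sort so that each pair has one canonical representation.
        for pair in combinations(sorted(set(transaction)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}


def build_tidlists(transactions):
    """Vertical layout (illustrative): map each item to the set of
    ids of the transactions that contain it."""
    tidlists = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def support_by_intersection(itemset, tidlists):
    """Intersection-based step (illustrative): the support of a
    non-empty itemset is the size of the intersection of its tidlists."""
    tids = set.intersection(*(tidlists.get(item, set()) for item in itemset))
    return len(tids)


# Hypothetical toy dataset: four transactions over items a..d.
transactions = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "b", "c", "d"],
    ["b", "d"],
]

frequent_pairs = count_pairs_directly(transactions, min_support=2)
tidlists = build_tidlists(transactions)
support_abc = support_by_intersection(("a", "b", "c"), tidlists)

print(frequent_pairs)
print(support_abc)
```

In this sketch the direct count needs one pass over the data per itemset length, while the intersection step works entirely from the in-memory tidlists, which hints at why a multi-strategy algorithm might choose between them depending on available memory and dataset density.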

Related Organizations

National Research Council, Sri Lanka
Sapienza University of Rome, Italy
National Research Council, Italy
Institute of Information Science and Technologies "A. Faedo", Italy
University of Pisa, Italy


Keywords

ParDCI, Data Mining

Related research

Adaptive and resource-aware mining of frequent sets (2003)

Impact by BIP!

selected citations: 5 (derived from selected sources; an alternative to the "Influence" indicator)
popularity: Average (the "current" impact or attention of the article in the research community, based on the underlying citation network)
influence: Top 10% (the overall impact of the article in the research community, based on the underlying citation network, diachronically)
impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)