Scalable fuzzy neighborhood DBSCAN

Publication: Article, Conference object
Date: 01 Jul 2010
Publisher: IEEE
Journal: International Conference on Fuzzy Systems

Authors: Jonathon K. Parker; Lawrence O. Hall; Abraham Kandel

doi: 10.1109/fuzzy.2010.5584527


Abstract

The majority of data available in most disciplines is unlabeled and unclassified. The amount of data is often massive, so scalable processing methods are required. One way to give structure to unlabeled data is to group it by clustering. Density-based methods discover the number of clusters themselves, and the clusters they find can be irregular in shape. In this paper we examine a version of DBSCAN modified to use fuzzy membership functions (FN-DBSCAN). FN-DBSCAN was implemented using the WEKA data mining framework, and a scalable technique (SFN-DBSCAN) was simulated using the same framework. Experimental results show that SFN-DBSCAN can be over three times as fast as FN-DBSCAN for small to medium-sized data sets. The resulting cluster assignments match those of FN-DBSCAN at an average rate of 90%. SFN-DBSCAN's speed increases in proportion to the number of subsets, but the agreement between FN-DBSCAN and SFN-DBSCAN cluster assignments degrades as the number of subsets increases.
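
The abstract does not spell out how the fuzzy neighborhood enters the algorithm, so the following is only a minimal sketch of one common reading, not the authors' WEKA implementation. It assumes a linear membership function that falls from 1 at distance 0 to 0 at distance eps, and it treats a point as a core point when the summed membership of its neighbors reaches a threshold. The names linear_membership, fn_dbscan, eps, and min_card are all hypothetical, and the subset-based SFN-DBSCAN step is not shown because the abstract gives no detail about it.

```python
import math


def linear_membership(dist, eps):
    """Assumed membership: 1 at distance 0, falling linearly to 0 at eps."""
    return max(0.0, 1.0 - dist / eps)


def fn_dbscan(points, eps, min_card):
    """Sketch of a fuzzy-neighborhood DBSCAN.

    Returns one label per point; -1 marks noise.
    """
    n = len(points)
    neighbors = []  # indices with non-zero membership, per point
    card = []       # fuzzy cardinality: summed membership, per point
    for i in range(n):
        nbrs, total = [], 0.0
        for j in range(n):
            mu = linear_membership(math.dist(points[i], points[j]), eps)
            if mu > 0.0:
                nbrs.append(j)
                total += mu
        neighbors.append(nbrs)
        card.append(total)

    labels = [-1] * n
    cluster = 0
    for i in range(n):
        # Only an unlabeled core point can seed a new cluster.
        if labels[i] != -1 or card[i] < min_card:
            continue
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Core points keep expanding; border points do not.
                    if card[q] >= min_card:
                        frontier.append(q)
        cluster += 1
    return labels


points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
print(fn_dbscan(points, eps=1.0, min_card=2.0))
```

In this toy input the two tight groups each form a cluster, and the isolated point at (10, 10) is left as noise because its fuzzy cardinality is only its own membership of 1. Replacing the crisp neighbor count of classic DBSCAN with a summed membership means that close neighbors contribute more to density than neighbors near the edge of the radius.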

Related Organizations

Florida Southern College
United States
University of South Florida
United States

Impact by BIP!

	selected citations	8
	popularity	Average
	influence	Top 10%
	impulse	Average