
The majority of data available in most disciplines is unlabeled and unclassified, and its volume is often massive, so scalable processing methods are required. One way to give structure to unlabeled data is to group it by clustering. Density-based methods discover the number of clusters from the data rather than requiring it in advance, and the clusters they find can have irregular shapes. In this paper we examine a version of DBSCAN modified to use fuzzy membership functions (FN-DBSCAN). FN-DBSCAN was implemented in the WEKA data mining framework, and a scalable technique (SFN-DBSCAN) was simulated using the same framework. Experimental results show that SFN-DBSCAN can be over three times as fast as FN-DBSCAN on small to medium-sized data sets, and its cluster assignments match those of FN-DBSCAN at an average rate of 90%. The speed of SFN-DBSCAN increases in proportion to the number of subsets, but the agreement between FN-DBSCAN and SFN-DBSCAN cluster assignments degrades as the number of subsets increases.
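FN-DBSCAN replaces DBSCAN's crisp neighbor count with a fuzzy one: each neighbor contributes its membership degree rather than a count of 1. The following is a minimal Python sketch of that idea, not the paper's WEKA implementation. It assumes a linear membership function μ_x(y) = 1 − d(x, y)/d_max, with d_max taken as the largest pairwise distance, a membership threshold `eps1` that defines the fuzzy neighborhood, and a minimum fuzzy cardinality `eps2` for core points. These parameter names and the membership function are illustrative assumptions. The scalable SFN-DBSCAN variant is not shown.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def linear_membership(d: float, d_max: float) -> float:
    """Assumed linear fuzzy membership: 1 at distance 0, falling to 0 at d_max."""
    if d_max == 0:
        return 1.0
    return max(0.0, 1.0 - d / d_max)


def fn_dbscan(points: Sequence[Sequence[float]], eps1: float, eps2: float) -> List[int]:
    """Sketch of fuzzy-neighborhood DBSCAN. Returns a cluster id per point, or -1 for noise."""
    n = len(points)
    # Precompute all pairwise Euclidean distances (O(n^2); fine for a sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Assumption: normalise distances by the largest pairwise distance.
    d_max = max((max(row) for row in dist), default=0.0)

    def neighborhood(i: int):
        # Fuzzy neighborhood of i: every point whose membership w.r.t. i is >= eps1.
        # Its fuzzy cardinality is the sum of those memberships (including i itself).
        members, cardinality = [], 0.0
        for j in range(n):
            mu = linear_membership(dist[i][j], d_max)
            if mu >= eps1:
                members.append(j)
                cardinality += mu
        return members, cardinality

    labels: List[Optional[int]] = [None] * n  # None means "not yet visited"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        members, card = neighborhood(i)
        if card < eps2:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster
        queue = [j for j in members if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_members, j_card = neighborhood(j)
            if j_card >= eps2:  # only core points extend the cluster
                queue.extend(j_members)
        cluster += 1
    return labels  # every entry is now an int


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (5, 0)]
    print(fn_dbscan(pts, eps1=0.8, eps2=3.0))
```

With `eps1 = 0.8` and `eps2 = 3.0`, the two tight groups of four points become clusters 0 and 1, and the isolated point at (5, 0) is labeled noise (-1). Because the sketch precomputes an O(n²) distance matrix for brevity, it suits only small inputs.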
