
handle: 10214/21884
Training a classifier requires a supply of example problems and the correct classification (label) for each. In some practical situations examples are plentiful, but obtaining labels for them is costly. Several algorithms exist for learning a classifier when only a small number of the examples are "labelled" at the outset and the remainder are "unlabelled." This thesis presents continued work on the Guelph Cluster Class algorithm developed by Dara, Stacey and Kremer. Specifically, it investigates how the algorithm performs on ten real-world data sets over a range of parameter settings, and whether cluster validity indices can guide the setting of those parameters. An examination of a simple clustering problem points to explanations for the algorithm's behaviour, and tests of a variant algorithm that capitalizes on these observations are presented. Finally, this thesis explores whether clustering information can guide the selection of examples which, if labelled, would be especially informative for classifier training.
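The general "cluster, then label" idea underlying this line of work can be sketched as follows: cluster all examples, labelled and unlabelled together, then assign each cluster the majority label among its few labelled members. The sketch below uses an invented toy data set and a minimal 1-D k-means; it illustrates the broad technique only and is not the Guelph Cluster Class algorithm itself. (A cluster validity index such as the silhouette could, as the thesis investigates, guide the choice of the number of clusters; it is omitted here for brevity.)

```python
# Hypothetical sketch of cluster-then-label semi-supervised
# classification.  Data, labels, and helper names are invented for
# illustration; this is NOT the Guelph Cluster Class algorithm.
from collections import Counter
import random

def kmeans_1d(points, k, iters=50, seed=0):
    """Tiny 1-D k-means, just enough to demonstrate the idea."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Recompute each centre as its cluster mean (keep old centre
        # if a cluster happens to be empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

def assign(p, centers):
    return min(range(len(centers)), key=lambda i: abs(p - centers[i]))

# Two well-separated groups; only two of the eight points carry labels.
data = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3, 5.4]
labelled = {0.1: "A", 5.4: "B"}          # scarce labels

centers = kmeans_1d(data, k=2)

# Each cluster takes the majority label of its labelled members.
cluster_label = {}
for i in range(2):
    votes = Counter(lab for p, lab in labelled.items()
                    if assign(p, centers) == i)
    cluster_label[i] = votes.most_common(1)[0][0]

# Every unlabelled point now inherits its cluster's label.
predictions = {p: cluster_label[assign(p, centers)] for p in data}
```

On this toy data the two labelled points fall in different clusters, so the six unlabelled points inherit the label of their group, which is exactly the leverage that clustering offers when labels are scarce.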
classification, classifier training, Guelph Cluster Class algorithm, clustering
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources | 0 |
| Popularity | "Current" impact/attention of the article in the research community, based on the underlying citation network | Average |
| Influence | Overall/total impact of the article in the research community, based on the underlying citation network (diachronically) | Average |
| Impulse | Initial momentum of the article directly after its publication, based on the underlying citation network | Average |
