
arXiv: 1307.3598
Traditionally, there are three species of classification: unsupervised, supervised, and semi-supervised. Supervised and semi-supervised classification differ in whether weight is given to unlabelled observations in the classification procedure. In unsupervised classification, or clustering, all observations are unlabelled and hence full weight is given to unlabelled observations. When some observations are unlabelled, it can be very difficult to choose the optimal level of supervision a priori, and the consequences of a sub-optimal choice can be non-trivial. A flexible fractionally-supervised approach to classification is introduced, where any level of supervision, ranging from unsupervised to supervised, can be attained. Our approach uses a weighted likelihood, wherein weights control the relative role that labelled and unlabelled data have in building a classifier. A comparison between our approach and the traditional species is presented using simulated and real data. Gaussian mixture models are used as a vehicle to illustrate our fractionally-supervised classification approach; however, it is broadly applicable and variations on the postulated model can be easily made.
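To make the idea of a weighted likelihood concrete, the following is a minimal sketch for a univariate Gaussian mixture. The abstract does not state the exact form of the weights, so the convention here is an assumption: a single weight `alpha` in [0, 1] multiplies the unlabelled log-likelihood term and `1 - alpha` multiplies the labelled term. The function and parameter names (`weighted_loglik`, `props`, `means`, `variances`) are hypothetical and chosen only for illustration.

```python
import math


def normal_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def weighted_loglik(labelled, unlabelled, props, means, variances, alpha):
    """Weighted log-likelihood of a univariate Gaussian mixture (illustrative).

    labelled   : list of (x, g) pairs, where g is the known component index
    unlabelled : list of x values with unknown component membership
    props      : mixing proportions, one per component
    alpha      : ASSUMED convention, weight in [0, 1] on the unlabelled term;
                 the labelled term receives weight 1 - alpha
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Labelled observations contribute the log-density of their known component.
    lab = sum(
        math.log(props[g]) + normal_logpdf(x, means[g], variances[g])
        for x, g in labelled
    )

    # Unlabelled observations contribute the log of the full mixture density.
    unl = sum(
        logsumexp([
            math.log(p) + normal_logpdf(x, m, v)
            for p, m, v in zip(props, means, variances)
        ])
        for x in unlabelled
    )

    return (1.0 - alpha) * lab + alpha * unl
```

Under this assumed convention, `alpha = 0` uses only the labelled observations, `alpha = 1` uses only the unlabelled ones, and `alpha = 0.5` is proportional to the usual semi-supervised log-likelihood that treats both groups equally. Intermediate values give fractional levels of supervision; maximising this objective (for example with an EM-type algorithm) is not shown here.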
FOS: Computer and information sciences, weighted likelihood, Classification and discrimination; cluster analysis (statistical aspects), model-based clustering, Machine Learning (stat.ML), discriminant analysis, Statistics - Applications, Statistics - Computation, Methodology (stat.ME), Statistics - Machine Learning, Applications (stat.AP), model-based classification, fractionally-supervised classification, Statistics - Methodology, Computation (stat.CO), finite mixture models
| selected citations | 20 |
| popularity | Top 10% |
| influence | Top 10% |
| impulse | Top 10% |
