
handle: 11375/28188
Clustering, also known as unsupervised classification, is a foundational machine learning technique used to find underlying group structure in data. There are many well-established model-based techniques for clustering either categorical or continuous data. However, there is a relative paucity of work on mixed-type data, especially mixed-type data in which the continuous variables exhibit skewness and heavy tails. In this thesis, methodologies and models are presented for analyzing asymmetric and mixed-type data. The first is a mixture model for asymmetric mixed-type data. The second models contaminated mixed-type data and identifies potential outliers. Lastly, model averaging techniques are developed for skewed data based on Occam’s window and parsimonious mixture models. The expectation-maximization algorithm is used to estimate the model parameters. Both real and simulated data are used for illustration.
Doctor of Science (PhD)
Thesis
