
handle: 10446/194006, 10807/203462
Clustering methods have typically been applied to continuous data. However, in many modern applications the data consist of multiple categorical variables with no natural ordering. Heuristic approaches tackle the clustering of such data by introducing suitable distances. In this work, we develop a model-based approach for clustering categorical data measured on a nominal scale. Our approach is based on a mixture of distributions defined via the Hamming distance between categorical vectors. Maximum likelihood inference is carried out through an expectation-maximization algorithm. A simulation study is carried out to illustrate the proposed approach.
Clustering techniques are normally applied to continuous variables. However, in many applied settings the data are categorical with no natural ordering. Within the heuristic framework, such data are clustered by means of suitable metrics. In this work, we propose a probabilistic approach to clustering nominal categorical data. Our approach is based on a mixture of distributions derived from the notion of Hamming distance. We propose an EM algorithm for maximum likelihood estimation of the model parameters. The approach is validated on simulated datasets.
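The abstract does not spell out the form of the component distribution, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each component has a centre vector and a nonnegative weight per variable, with probability proportional to exp(-weight) for every coordinate that differs from the centre. Under that assumption the normalising constant factorises over variables, the M-step centre is the responsibility-weighted mode, and the weight has a closed form. All names (`hamming_distance`, `log_pmf`, `em_hamming_mixture`, `init_centers`, `lams`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def hamming_distance(x, c):
    """Number of coordinates in which the two categorical vectors differ."""
    return sum(a != b for a, b in zip(x, c))


def log_pmf(x, center, lam, levels):
    """Log-probability of x under the ASSUMED Hamming-type distribution.

    levels[j] is the number of categories of variable j (assumed >= 2).
    Each variable contributes -lam[j] if it differs from the centre,
    minus the log of its normalising constant 1 + (m_j - 1) * exp(-lam[j]).
    """
    total = 0.0
    for xj, cj, lj, mj in zip(x, center, lam, levels):
        total += -lj * (xj != cj) - math.log1p((mj - 1) * math.exp(-lj))
    return total


def em_hamming_mixture(data, levels, k, init_centers=None,
                       n_iter=50, seed=0, eps=1e-6):
    """EM sketch for a k-component mixture of the distributions above."""
    rng = random.Random(seed)
    n, p = len(data), len(levels)
    if init_centers is None:
        init_centers = rng.sample(list(data), k)  # random rows as start
    centers = [list(c) for c in init_centers]
    lams = [[1.0] * p for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities, computed stably in log space.
        resp = []
        for x in data:
            logs = [math.log(weights[g]) + log_pmf(x, centers[g], lams[g], levels)
                    for g in range(k)]
            mx = max(logs)
            probs = [math.exp(v - mx) for v in logs]
            s = sum(probs)
            resp.append([v / s for v in probs])
        # M-step: mixing weights, weighted modes, closed-form lambdas.
        for g in range(k):
            n_g = sum(r[g] for r in resp)
            weights[g] = max(n_g / n, eps)
            for j in range(p):
                mass = defaultdict(float)
                for x, r in zip(data, resp):
                    mass[x[j]] += r[g]
                mode = max(mass, key=mass.get)
                centers[g][j] = mode
                # q: weighted share of points that differ from the mode.
                q = 1.0 - mass[mode] / max(n_g, eps)
                q = min(max(q, eps), 1.0 - eps)
                lams[g][j] = math.log((levels[j] - 1) * (1.0 - q) / q)
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights, centers, lams, resp
```

Rows of `data` are expected to be tuples or lists of category labels. A hard clustering can be read off by assigning each row to the component with the largest responsibility. Because EM is sensitive to its starting point, passing explicit `init_centers` or running several seeds is advisable.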
Expectation-Maximization algorithm; Hamming distribution; mixture modeling; nominal data
