Visualisation of heterogeneous data with simultaneous feature saliency using Generalised Generative Topographic Mapping

Part of book or chapter of book English OPEN
Mumtaz, Shahzad ; Randrianandrasana, Michel F. ; Bassi, Gurjinder ; Nabney, Ian T.
  • Publisher: Universität Bielefeld

Most machine-learning algorithms are designed for datasets whose features are all of a single type, and very little attention has been given to datasets with mixed-type features. We recently proposed a model that handles mixed types within a probabilistic latent variable formalism. This model, called generalised generative topographic mapping (GGTM), describes the data with type-specific distributions that are conditionally independent given the latent space. Visualisations of high-dimensional datasets are often observed to be poor in the presence of noisy features. In this paper we therefore extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of parameter learning with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated on both synthetic and real datasets.
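To make the two ideas in the abstract concrete, the following is a minimal sketch of an E-step over a grid of latent points. It is not the authors' implementation. It assumes only Gaussian and Bernoulli feature types, a uniform prior over latent points, and a two-component saliency mixture in which each feature is either relevant (its density depends on the latent point) or irrelevant (it follows a shared background density). That saliency form is a common one in the feature-saliency literature; the abstract does not confirm that GGTMFS uses exactly this form. The function names `per_feature_log_density` and `responsibilities`, and all parameter names, are hypothetical.

```python
import numpy as np

def per_feature_log_density(X_cont, X_bin, means, sigma2, probs):
    """Return (N, K, D) log-densities, one slice per feature.

    X_cont: (N, Dc) continuous data; X_bin: (N, Db) binary data in {0, 1}.
    means:  (K, Dc) Gaussian centres, one row per latent point (assumed).
    sigma2: shared Gaussian noise variance (assumed scalar).
    probs:  (K, Db) Bernoulli parameters strictly inside (0, 1).
    """
    # Gaussian features: univariate log-density per feature and latent point.
    diff2 = (X_cont[:, None, :] - means[None, :, :]) ** 2
    log_gauss = -0.5 * diff2 / sigma2 - 0.5 * np.log(2 * np.pi * sigma2)
    # Bernoulli features: x*log(p) + (1 - x)*log(1 - p).
    xb = X_bin[:, None, :]
    log_bern = xb * np.log(probs)[None] + (1 - xb) * np.log1p(-probs)[None]
    return np.concatenate([log_gauss, log_bern], axis=2)

def responsibilities(log_p, log_q, rho):
    """Posterior over K latent points for each row, with saliency rho.

    log_p: (N, K, D) per-feature log-density given each latent point.
    log_q: (N, D) per-feature log-density under a shared background model.
    rho:   (D,) assumed saliency of each feature, strictly inside (0, 1).
    """
    # Per feature: rho * p(x_d | k) + (1 - rho) * q(x_d), in log space.
    mixed = np.logaddexp(np.log(rho) + log_p,
                         np.log1p(-rho) + log_q[:, None, :])
    # Conditional independence given the latent point: sum over features.
    log_lik = mixed.sum(axis=2)
    log_lik -= log_lik.max(axis=1, keepdims=True)  # stabilise before exp
    R = np.exp(log_lik)
    return R / R.sum(axis=1, keepdims=True)
```

In this sketch, a saliency near 1 lets a feature influence where a data point lands in the latent space, while a saliency near 0 makes the feature contribute the same factor for every latent point, so it drops out of the responsibilities. An M-step that re-estimates `rho` and the distribution parameters is deliberately omitted.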