Visualized mixed-type data analysis via dimensionality reduction

Article, published 26 Sep 2018
Publisher: SAGE Publications
Journal: Intelligent Data Analysis, volume 22, pages 981-1007 (ISSN: 1088-467X; eISSN: 1571-4128)

Authors: Chung-Chian Hsu, Jhen-Wei Wu

doi: 10.3233/ida-173480


Abstract

Visualization is a useful technique in data analysis, especially in its initial stage, data exploration. Because high-dimensional data cannot be visualized directly, dimensionality reduction techniques are usually applied to reduce the data to a lower dimension, such as two, for visualization. Previous studies investigated dimensionality reduction in the context of numeric datasets. However, most real-world datasets are of mixed type, containing both numeric and categorical attributes. In this case, traditional approaches can neither handle the data directly nor produce appropriate results. To address this problem, we propose a procedure for visualized analysis of mixed-type data via dimensionality reduction. The dissimilarity between categorical values is learned from the dataset and then used to measure the distance between mixed-type data points. In addition, we propose an approach to identifying significant features and visualizing patterns from a projection map chosen according to quality measures. Experiments on real-world datasets were conducted to demonstrate the feasibility of the proposed method.
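The abstract does not say how categorical dissimilarity is learned or which projection method is used, so the following is only a minimal sketch of the general pipeline it describes (learn categorical dissimilarities, build a mixed-type distance matrix, project to two dimensions), written under stated assumptions. It assumes a co-occurrence scheme in which two values of one categorical attribute are dissimilar to the extent that the other categorical attributes are distributed differently alongside them (total variation distance, averaged over those attributes). It also assumes min-max scaled numeric gaps and classical (Torgerson) MDS as the reduction step. None of these choices is confirmed as the authors' method. All function names (`learn_cat_dissimilarity`, `numeric_ranges`, `mixed_distance`, `classical_mds`) and the toy data are hypothetical. Plain Python is used to keep it dependency-free.

```python
import math
import random
from collections import Counter, defaultdict


def learn_cat_dissimilarity(rows, cat_idx):
    """Return {attr: {(x, y): d}} for every categorical attribute index.

    Assumed scheme (illustrative, not the paper's): values x, y of attribute a
    differ to the extent that the other categorical attributes are distributed
    differently in rows with a == x versus a == y (total variation distance,
    averaged over those other attributes).
    """
    dissim = {}
    for a in cat_idx:
        others = [b for b in cat_idx if b != a]
        freq = Counter(r[a] for r in rows)
        # cond[b][x][v] = number of rows with a == x and b == v
        cond = {b: defaultdict(Counter) for b in others}
        for r in rows:
            for b in others:
                cond[b][r[a]][r[b]] += 1
        table = {}
        for x in freq:
            for y in freq:
                if not others:  # nothing to learn from: fall back to 0/1 mismatch
                    table[(x, y)] = 0.0 if x == y else 1.0
                    continue
                total = 0.0
                for b in others:
                    px, py = cond[b][x], cond[b][y]
                    support = set(px) | set(py)
                    total += 0.5 * sum(abs(px[v] / freq[x] - py[v] / freq[y]) for v in support)
                table[(x, y)] = total / len(others)
        dissim[a] = table
    return dissim


def numeric_ranges(rows, num_idx):
    """Min and max per numeric attribute, used for min-max scaling."""
    return {i: (min(r[i] for r in rows), max(r[i] for r in rows)) for i in num_idx}


def mixed_distance(p, q, num_idx, ranges, dissim):
    """Euclidean-style combination of scaled numeric gaps and learned categorical dissimilarities."""
    s = 0.0
    for i in num_idx:
        lo, hi = ranges[i]
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        s += ((p[i] - q[i]) / span) ** 2
    for a, table in dissim.items():
        s += table[(p[a], q[a])] ** 2
    return math.sqrt(s)


def classical_mds(D, k=2, iters=500, seed=0):
    """Classical MDS: double-center squared distances, then take the top-k eigenpairs."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    mean = [sum(row) / n for row in sq]  # row means equal column means for symmetric D
    grand = sum(mean) / n
    B = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Shift by a Gershgorin bound so power iteration finds the largest
        # algebraic (not largest-magnitude) eigenvalue of B.
        shift = max(sum(abs(x) for x in row) for row in B)
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues carry no Euclidean spread
        for i in range(n):
            coords[i][c] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    # Hypothetical toy data: (numeric value, colour, size)
    rows = [(1.0, "red", "small"), (2.0, "red", "small"),
            (9.0, "blue", "large"), (10.0, "blue", "large")]
    num_idx, cat_idx = [0], [1, 2]
    dissim = learn_cat_dissimilarity(rows, cat_idx)
    ranges = numeric_ranges(rows, num_idx)
    D = [[mixed_distance(p, q, num_idx, ranges, dissim) for q in rows] for p in rows]
    for row, (x, y) in zip(rows, classical_mds(D)):
        print(row, round(x, 3), round(y, 3))
```

Because the learned dissimilarities need not form a Euclidean metric, the double-centered matrix can have negative eigenvalues. The sketch therefore shifts the power iteration and clamps negative eigenvalues to zero rather than taking square roots of them. The feature-significance analysis and quality-based map selection mentioned in the abstract are not covered here.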

Impact (by BIP!)

- Selected citations: 1. These citations are derived from selected sources; this is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.