PFA-Nipals: An Unsupervised Principal Feature Selection Based on Nonlinear Estimation by Iterative Partial Least Squares

Publication type: Article, Other literature type
Published: 03 Oct 2023
Language: English
Publisher: MDPI AG
Journal: Mathematics, volume 11, page 4154 (eISSN: 2227-7390)

Authors: Emilio Castillo-Ibarra; Marco A. Alsina; Cesar A. Astudillo; Ignacio Fuenzalida-Henríquez

doi: 10.3390/math11194154


Abstract

Unsupervised feature selection (UFS) has received great interest in various areas of research that require dimensionality reduction, including machine learning, data mining, and statistical analysis. However, UFS algorithms are known to perform poorly on datasets with missing data, exhibiting a significant computational load and learning bias. In this work, we propose a novel and robust UFS method, designated PFA-Nipals, that works with missing data without the need for deletion or imputation. This is achieved by considering an iterative nonlinear estimation of principal components by partial least squares, while the relevant features are selected through minibatch K-means clustering. The proposed method is successfully applied to select the relevant features of a robust health dataset with missing data, outperforming other UFS methods in terms of computational load and learning bias. Furthermore, the proposed method is capable of finding a consistent set of relevant features without biasing the explained variability, even under increasing missing data. Finally, it is expected that the proposed method could be used in several areas, such as machine learning and big data with applications in different areas of the medical and engineering sciences.
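The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic detail, so the code below assumes the standard missing-data variant of NIPALS (each regression sums over observed entries only) and assumes that features are chosen by clustering the rows of the loading matrix and keeping the feature nearest each centroid. A plain Lloyd k-means stands in for minibatch K-means to keep the example dependency-free. The names `nipals_missing` and `select_features` and all parameter values are hypothetical.

```python
import numpy as np

def nipals_missing(X, n_components=2, max_iter=500, tol=1e-9):
    """NIPALS-style PCA that skips NaN entries (no deletion, no imputation)."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    M = observed.astype(float)                 # 1.0 where a value is observed
    # Center each column with its observed values; missing entries become 0
    R = np.where(observed, X - np.nanmean(X, axis=0), 0.0)
    n, p = X.shape
    T = np.zeros((n, n_components))            # scores
    P = np.zeros((p, n_components))            # loadings
    for k in range(n_components):
        # Start from the column with the largest observed sum of squares
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            # Loadings: regress each column on t over its observed rows
            denom_p = M.T @ (t ** 2)
            p_vec = (R.T @ t) / np.where(denom_p > 0, denom_p, 1.0)
            p_vec /= np.linalg.norm(p_vec)
            # Scores: regress each row on p_vec over its observed columns
            denom_t = M @ (p_vec ** 2)
            t_new = (R @ p_vec) / np.where(denom_t > 0, denom_t, 1.0)
            converged = np.linalg.norm(t_new - t) <= tol * max(np.linalg.norm(t_new), 1.0)
            t = t_new
            if converged:
                break
        T[:, k], P[:, k] = t, p_vec
        # Deflate the residual on observed entries only
        R = np.where(observed, R - np.outer(t, p_vec), 0.0)
    return T, P

def select_features(P, n_select, n_iter=50, seed=0):
    """Cluster loading rows (plain k-means, a stand-in for minibatch K-means)
    and return the index of the feature closest to each centroid."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(P.shape[0], size=n_select, replace=False)].copy()
    for _ in range(n_iter):
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_select):
            if np.any(labels == c):
                centers[c] = P[labels == c].mean(axis=0)
    d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sorted({int(j) for j in d.argmin(axis=0)})

# Synthetic data (illustrative only): 40 samples, 5 features, about 10% missing
rng = np.random.default_rng(0)
X_full = rng.normal(size=(40, 5))
X_miss = X_full.copy()
X_miss[rng.random(X_full.shape) < 0.1] = np.nan

T, P = nipals_missing(X_miss, n_components=2)
selected = select_features(P, n_select=2)
print("selected feature indices:", selected)
```

Because every sum in the score and loading updates runs over observed entries only, rows and columns with gaps still contribute whatever values they have, which is the sense in which no deletion or imputation is required in this sketch.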

Related Organizations

University of Talca
Chile
San Sebastián University
Chile

Keywords

Nipals, missing data, QA1-939, interpretability, unsupervised feature selection, Mathematics, clustering

Impact by BIP!

selected citations: 1
popularity: Average
influence: Average
impulse: Average


Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
