Handling Missing Values Based on Similarity Classifiers and Fuzzy Entropy Measures

Publication type: Article, Other literature type. Published: 28 Nov 2022. Language: English. Publisher: MDPI AG. Journal: Electronics, volume 11, page 3929 (eISSN: 2079-9292).

Authors: Faten Khalid Karim; Hela Elmannai; Abdelrahman Seleem; Safwat Hamad; Samih M. Mostafa;

doi: 10.3390/electronics11233929


Abstract

Handling missing values (MVs) and feature selection (FS) are vital preprocessing tasks for many pattern recognition, data mining, and machine learning (ML) applications involving classification and regression problems. MVs in data degrade the quality of decisions based on that data, so they must be treated as a critical problem during preprocessing. To this end, the authors propose a new algorithm that handles MVs using FS. Bayesian ridge regression (BRR), a widely used form of Bayesian regression, estimates a probabilistic model of the regression problem. The proposed algorithm is called cumulative Bayesian ridge with similarity and Luca’s fuzzy entropy measure (CBRSL). In CBRSL, fuzzy-entropy-based FS selects the candidate feature holding MVs, and the MVs within the selected feature are then predicted using the Bayesian ridge technique. CBRSL handles MVs in the remaining features in a cumulative order: each filled feature is incorporated into the BRR equation in order to predict the MVs of the next selected incomplete feature. An experimental analysis was conducted on four datasets holding MVs generated from three missingness mechanisms to compare CBRSL with state-of-the-art practical imputation methods. Performance was measured in terms of the R² score (coefficient of determination), RMSE (root mean square error), and MAE (mean absolute error). The experimental results indicate that accuracy and execution time vary with the amount of MVs, the size of the dataset, and the missingness mechanism. In addition, the results show that CBRSL can handle MVs generated from any of the missingness mechanisms with accuracy competitive with the compared methods.
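The quantities named in the abstract can be illustrated with a minimal Python sketch. It assumes that "Luca's fuzzy entropy" refers to the De Luca and Termini measure, H(A) = -Σ [μ ln μ + (1 - μ) ln(1 - μ)], taken over membership degrees μ in [0, 1]. How CBRSL derives those degrees from the similarity classifier is not stated in the abstract, so the degrees are simply passed in here. All function names are hypothetical and are not taken from the article.

```python
import math


def de_luca_termini_entropy(memberships):
    """Fuzzy entropy H = -sum(mu*ln(mu) + (1-mu)*ln(1-mu)) over degrees in [0, 1]."""
    total = 0.0
    for mu in memberships:
        if not 0.0 <= mu <= 1.0:
            raise ValueError("membership degrees must lie in [0, 1]")
        # By convention 0*ln(0) = 0, so crisp degrees (0 or 1) contribute nothing.
        for p in (mu, 1.0 - mu):
            if p > 0.0:
                total -= p * math.log(p)
    return total


def rmse(y_true, y_pred):
    """Root mean square error between true and imputed values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between true and imputed values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination; undefined when y_true is constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

The entropy is zero for fully crisp memberships and largest when every degree equals 0.5. The three error measures take the true values of the artificially removed entries and the imputed values as two equal-length sequences.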

Related Organizations

Ain Shams University, Faculty of Computer and Information Science (Egypt)
Princess Nourah bint Abdulrahman University (Saudi Arabia)
South Valley University (Egypt)

Keywords

missingness mechanisms; feature selection; Bayesian ridge regression; imputation; similarity classifier

Impact by BIP!

	selected citations: 2
	popularity: Average
	influence: Average
	impulse: Average
