Expert Systems
Article . 2025 . Peer-reviewed
License: Wiley Online Library User Agreement
Data sources: Crossref
DBLP
Article . 2025
Data sources: DBLP

Single and Ensemble Based Filters in Environmental Data

Authors: Yousra Cherif; Ali Idri;

Abstract

Researchers rely on species distribution models (SDMs) to relate species occurrence records to environmental data. These models offer insights into the ecological and evolutionary aspects of the species under study. Feature selection (FS) aims to retain useful, interrelated features and to remove unnecessary or redundant ones, which also makes the induced model easier to understand. Although feature selection plays a crucial role in SDMs, only a limited number of studies have addressed it, and those studies share several key shortcomings: multivariate techniques are not used, univariate and multivariate filters are not compared, and ensembles of univariate filters are not compared with ensembles of multivariate filters. Therefore, this study presents a rigorous empirical evaluation that assesses and compares six filter‐based univariate feature selection methods (each using two thresholds) with two multivariate techniques, using four classifiers: Extreme Gradient Boosting (XGB), Random Forest (RF), Decision Tree (DT), and Light Gradient‐Boosting Machine (LGBM). Furthermore, the study proposes a novel approach to ensemble construction: univariate filter ensembles that keep 40% of the features ranked by means of Borda Count and Reciprocal Rank, and multivariate filter ensembles built by fusion and by intersection. We evaluated and compared the performance of the univariate and multivariate techniques with that of their ensembles, and we likewise compared the best ensemble techniques across datasets. The empirical evaluation relies on 5‐fold cross‐validation, the Scott Knott (SK) test, and Borda Count, together with three performance metrics (accuracy, Kappa, and F1‐score).
Experiments showed that Consistency‐based subset selection in conjunction with RF outperformed all other univariate and multivariate FS techniques, with an accuracy of 91.63% across all datasets. However, Fisher score with RF was the best choice when the number of selected features is also taken into account. Moreover, the univariate and multivariate ensembles, in general, outperformed their single counterparts. When comparing the univariate and multivariate ensembles, the fusion‐based ensemble outperformed all other ensembles, achieving an accuracy of 91.77% with RF across datasets. Nevertheless, in terms of both performance and number of features, the ensemble constructed using Reciprocal Rank performed better than all other FS techniques regardless of the classifier used. It achieved an accuracy of 91.61% across datasets with RF.
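
The abstract names the ensemble rules but does not spell them out, so the following is only a minimal sketch of how such filter ensembles are commonly built, not the authors' implementation. It assumes that each univariate filter yields a full ranking of features (best first), that Reciprocal Rank scores a feature by the sum of 1/rank over all filters (one common formulation), and that "fusion" means the union of the subsets returned by the multivariate filters. The feature names and the helper names (`borda_count`, `reciprocal_rank`, `top_fraction`) are hypothetical.

```python
from typing import Dict, List, Set


def borda_count(rankings: List[List[str]]) -> Dict[str, float]:
    """Sum Borda points: in a list of n features, position p (0-based) earns n - p."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0.0) + (n - pos)
    return scores


def reciprocal_rank(rankings: List[List[str]]) -> Dict[str, float]:
    """Sum 1/rank (1-based) over all rankings; higher means better (assumed variant)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for pos, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0.0) + 1.0 / (pos + 1)
    return scores


def top_fraction(scores: Dict[str, float], fraction: float = 0.4) -> List[str]:
    """Keep the best `fraction` of features (at least one), ties broken by name."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    k = max(1, round(fraction * len(ordered)))
    return ordered[:k]


# Hypothetical rankings from three univariate filters (best feature first).
rankings = [
    ["bio1", "bio12", "elev", "bio4", "slope"],
    ["bio12", "bio1", "bio4", "elev", "slope"],
    ["bio1", "elev", "bio12", "slope", "bio4"],
]
borda_selected = top_fraction(borda_count(rankings))
rr_selected = top_fraction(reciprocal_rank(rankings))

# Hypothetical subsets returned by two multivariate filters.
subset_a: Set[str] = {"bio1", "elev", "bio12"}
subset_b: Set[str] = {"bio1", "bio4", "bio12"}
fusion = subset_a | subset_b        # assumed meaning of "fusion-based"
intersection = subset_a & subset_b  # features every filter agrees on
```

The selected feature lists would then be passed to a classifier such as RF and scored with cross-validation. A union keeps more features than an intersection by construction, which is one reason the abstract reports both accuracy and the number of features retained.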
