
In the big data era, datasets often contain a large number of redundant features, which makes dimensionality reduction crucial. To address this issue, this study proposes a double filter and double wrapper (DFDW) feature selection algorithm for high-dimensional data. In the double filter stage, the algorithm first evaluates all features from two perspectives using two filter algorithms: ReliefF and the Pearson correlation coefficient. It then selects the top k features from each ranking and takes their intersection to obtain a candidate feature subset F. Next, the standard Cauchy distribution is used to initialize the population. The algorithm then enters the double wrapper stage, where the Random Walk Whale Optimization Algorithm (RWWOA) and an improved Adaptive Differential Evolution (ADE) jointly search for the optimal feature subset. To keep a single algorithm from becoming trapped in a local optimum, an Algorithm Iteration Mechanism is proposed that selectively runs the two wrapper algorithms, allowing the search to escape local optima and explore a broader optimization space. Finally, the effectiveness of the algorithm is verified through three sets of comparative experiments. The experimental results show that the DFDW algorithm performs well in obtaining the optimal feature subsets on 10 high-dimensional datasets, with an average classification accuracy of more than 95.1% on 8 datasets, a dimensionality reduction rate of less than 0.64% on all datasets, and a lowest dimensionality reduction rate of 0.19%.
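The double filter stage and the Cauchy initialization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no formulas, so several pieces are assumptions: `relief_scores` is a simplified Relief (one nearest hit and one nearest miss per sample) standing in for ReliefF, the rule that a Cauchy draw greater than 0 selects a feature is a guess at how draws become feature masks, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def pearson_scores(X, y):
    """Absolute Pearson correlation between each feature column and the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant features get a score of 0
    return np.abs(Xc.T @ yc) / denom

def relief_scores(X, y):
    """Simplified Relief weights (assumption: a stand-in for ReliefF)."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span        # scale every feature to [0, 1]
    n = Xn.shape[0]
    w = np.zeros(Xn.shape[1])
    for i in range(n):
        diff = np.abs(Xn - Xn[i])          # per-feature distance to sample i
        dist = diff.sum(axis=1)            # L1 distance to sample i
        dist[i] = np.inf                   # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        w += diff[miss] - diff[hit]
    return w / n

def double_filter(X, y, k):
    """Candidate subset F: intersection of the top-k features of both filters."""
    top_r = set(np.argsort(relief_scores(X, y))[-k:])
    top_p = set(np.argsort(pearson_scores(X, y))[-k:])
    return sorted(int(j) for j in top_r & top_p)

def cauchy_init(pop_size, n_features, rng):
    """Binary population from standard Cauchy draws.

    Assumption: a draw > 0 means "feature selected"; the abstract does not
    say how the Cauchy samples are mapped to feature masks.
    """
    return rng.standard_cauchy(size=(pop_size, n_features)) > 0

# Hypothetical synthetic data: features 3 and 7 carry the class signal.
rng = np.random.default_rng(0)
y = np.tile([0, 1], 20)
X = rng.normal(size=(40, 20))
X[:, 3] += 5.0 * y
X[:, 7] -= 5.0 * y

selected = double_filter(X, y, k=5)
population = cauchy_init(pop_size=6, n_features=len(selected), rng=rng)
print(selected)
print(population.astype(int))
```

Taking the intersection rather than the union keeps only features that both filters rank highly, so F can contain fewer than k features. Each row of `population` is one candidate mask over F that a wrapper stage would then evaluate and refine.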
ReliefF, differential evolution, Feature selection, Pearson correlation coefficient, Electrical engineering. Electronics. Nuclear engineering, whale optimization algorithm, TK1-9971
