Powered by OpenAIRE graph

Similarity analysis of feature ranking techniques on imbalanced DNA microarray datasets

Authors: Randall Wald; Taghi M. Khoshgoftaar; Amri Napolitano; David J. Dittman

Abstract

DNA microarrays are a modern advancement in the analysis of genetic data. This technology allows a researcher to test samples for thousands of genes simultaneously. However, once the samples have been tested, the researcher must search through the collected data and identify the genes important to their problem. A possible solution is the data mining pre-processing technique called feature selection. Feature (gene) selection takes the original set of features (in the case of DNA microarrays, gene probes) and chooses an optimal subset on which to perform analysis. Ideally, the reduced subset contains only the most important features as determined by the feature selection technique (or set of feature selection techniques), which allows for further research on the discovered genes. However, when multiple feature selection techniques are used, the set of techniques must be diverse in order to reduce redundancy among the chosen features. Another benefit of increasing diversity is that any features chosen across a diverse set of feature selection techniques will have more importance than those chosen by a single technique or by a set of related ones. Therefore, it would be useful to know how similar the feature selection techniques are to each other. In this study we analyze eighteen feature selection techniques across nine imbalanced DNA microarray datasets, using four feature subset sizes. Our results show that, to minimize redundancy, one should not use Gini Index and Probability Ratio together, or the Kolmogorov-Smirnov statistic and Geometric Mean together, at any feature subset size, and that the members of the first of these pairs (along with the pair of ReliefF and ReliefF-W) are very dissimilar to all rankers outside their own cluster. We also found that Chi-Squared, Information Gain, and Symmetric Uncertainty form a cluster of similarity, as do Chi-Squared, Deviance, F-Measure, and Mutual Information.
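
The abstract does not say which similarity measure the study uses. As a rough illustration only, the sketch below compares rankers by the fraction of features their top-k subsets share, which is one simple way to quantify how redundant two feature ranking techniques are. The function names (`top_k`, `overlap`, `pairwise_similarity`), the ranker labels, and the scores are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def top_k(scores, k):
    """Return the set of the k highest-scoring feature names."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return set(ranked[:k])


def overlap(subset_a, subset_b):
    """Fraction of features shared by two equal-sized subsets (0.0 to 1.0)."""
    return len(subset_a & subset_b) / len(subset_a)


def pairwise_similarity(rankers, k):
    """Map each unordered pair of ranker names to their top-k overlap."""
    subsets = {name: top_k(scores, k) for name, scores in rankers.items()}
    return {
        (first, second): overlap(subsets[first], subsets[second])
        for first, second in combinations(sorted(subsets), 2)
    }


# Made-up scores for four gene probes under three hypothetical rankers.
rankers = {
    "ranker_a": {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.2},
    "ranker_b": {"g1": 0.7, "g2": 0.9, "g3": 0.3, "g4": 0.1},
    "ranker_c": {"g1": 0.1, "g2": 0.2, "g3": 0.9, "g4": 0.8},
}

similarity = pairwise_similarity(rankers, k=2)
for pair, value in similarity.items():
    print(pair, value)
```

Under this assumed measure, a pair with overlap near 1.0 selects almost the same genes and adds little when used together, while a pair near 0.0 contributes diverse features. Repeating the computation for several values of `k` would mirror the idea of comparing rankers at multiple feature subset sizes.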

Impact indicators (provided by BIP!)
  • Citations: 11 (overall impact of the article, based on the underlying citation network)
  • Popularity: Top 10% (current attention the article receives, based on the underlying citation network)
  • Influence: Average (overall impact of the article over time, based on the underlying citation network)
  • Impulse: Average (initial momentum of the article directly after publication, based on the underlying citation network)