Journal of Statistical Software
Article
License: cc-by
Data sources: Unpaywall

Feature Selection with the R Package MXM: Discovering Statistically Equivalent Feature Subsets

Authors: Lagani, Vincenzo; Athineou, Giorgos; Farcomeni, Alessio; Tsagris, Michail; Tsamardinos, Ioannis;

Abstract

The statistically equivalent signature (SES) algorithm is a method for feature selection inspired by the principles of constraint-based learning of Bayesian networks. Most currently available feature selection methods return only a single subset of features, supposedly the one with the highest predictive power. We argue that in several domains multiple subsets can achieve close to maximal predictive accuracy, and that arbitrarily providing only one has several drawbacks. The SES method attempts to identify multiple predictive feature subsets whose performances are statistically equivalent. In that respect the SES algorithm subsumes and extends previous feature selection algorithms, such as the max-min parents and children algorithm. The SES algorithm is implemented in a function of the same name included in the R package MXM, standing for mens ex machina, meaning 'mind from the machine' in Latin. The MXM implementation of SES handles several data analysis tasks, namely classification, regression and survival analysis. In this paper we present the SES algorithm and its implementation, and provide examples of using the SES function in R. Furthermore, we analyze three publicly available data sets to illustrate the equivalence of the signatures retrieved by SES and to contrast SES against the state-of-the-art feature selection method LASSO. Our results provide initial evidence that the two methods perform comparably well in terms of predictive accuracy and that multiple, equally predictive signatures are actually present in real-world data.
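
The notion of statistically equivalent features can be illustrated with a small toy sketch. It is written in Python with the standard library only and is not the MXM implementation: the function names (`pearson`, `partial_corr`, `fisher_p`, `equivalent_groups`), the partial-correlation Fisher z-test, and the deterministic 16-sample data set are all assumptions made for illustration. Two features are treated as interchangeable when each one makes the other uninformative about the target at the chosen significance level.

```python
import math
from itertools import product

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after conditioning on a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def fisher_p(r, n, k):
    """Two-sided p-value of the Fisher z-test with k conditioning variables."""
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def equivalent_groups(features, target, alpha=0.05):
    """Toy grouping: keep relevant features, then merge those that screen
    each other off from the target (illustrative, not the SES algorithm)."""
    n = len(target)
    relevant = [name for name, col in features.items()
                if fisher_p(pearson(col, target), n, 0) < alpha]
    groups = []
    for name in relevant:
        for group in groups:
            rep = group[0]
            p_new = fisher_p(partial_corr(features[name], target, features[rep]), n, 1)
            p_rep = fisher_p(partial_corr(features[rep], target, features[name]), n, 1)
            if p_new > alpha and p_rep > alpha:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

def sign(i, bit):
    """+1.0 or -1.0 depending on one bit of i; distinct bits give orthogonal patterns."""
    return 1.0 if (i >> bit) & 1 == 0 else -1.0

# Hypothetical data: 'b' is a near-copy of 'a', 'c' is unrelated to the target.
idx = range(16)
features = {
    "a": [sign(i, 0) for i in idx],
    "b": [sign(i, 0) + 0.1 * sign(i, 1) for i in idx],
    "c": [sign(i, 2) for i in idx],
}
target = [sign(i, 0) + 0.5 * sign(i, 3) for i in idx]

groups = equivalent_groups(features, target)
# One signature per way of picking a single member from every group.
signatures = list(product(*groups))
print(groups)
print(signatures)
```

In this toy data set, `a` and `b` end up in the same group, so either one alone forms a signature, while `c` is discarded as irrelevant. With only 16 samples the test cannot distinguish `a` from `b`, which is the sense in which the two are statistically, rather than exactly, equivalent.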

Country
Italy
Subjects by Vocabulary

Microsoft Academic Graph classification: Feature selection; Machine learning; Lasso (statistics); Equivalence (measure theory); Mathematics; Contrast (statistics); Bayesian network; Pattern recognition; Function (mathematics); Regression; Feature (computer vision); Artificial intelligence

Library of Congress Subject Headings: lcsh:Statistics lcsh:HA1-4737

Keywords

Statistics and Probability; feature selection; constraint-based algorithms; multiple predictive signatures; Statistics, Probability and Uncertainty; Settore SECS-S/01 - Statistica; Software

Funded by

EC | STATEGRA: User-driven Development of Statistical Methods for Experimental Planning, Data Gathering, and Integrative Analysis of Next Generation Sequencing, Proteomics and Metabolomics data
  • Funder: European Commission (EC)
  • Project Code: 306000
  • Funding stream: FP7 | SP1 | HEALTH

EC | CAUSALPATH: Next Generation Causal Analysis: Inspired by the Induction of Biological Pathways from Cytometry Data
  • Funder: European Commission (EC)
  • Project Code: 617393
  • Funding stream: FP7 | SP2 | ERC