publication . Preprint . 2017

Regression Phalanxes

Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.
Open Access English
  • Published: 03 Jul 2017
Abstract
Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensional data sets using hierarchical clustering and builds a prediction model for each phalanx for further ensembling. Through extensive simulation studies and several real-lif...
Subjects
free text keywords: Statistics - Machine Learning

Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2):185–193.

Breiman, L. (2001). Random forests. Machine learning, 45(1):5–32.

Burden, F. R. (1989). Molecular identification number for substructure searches. Journal of Chemical Information and Computer Sciences, 29(3):225–227.

Carhart, R. E., Smith, D. H., and Venkataraghavan, R. (1985). Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Computer Sciences, 25(2):64–73.

Esbensen, K., Midtgaard, T., and Schonkopf, S. (1996). Multivariate Analysis in Practice: A Training Package. Camo As.

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33(1):1.

Hughes-Oliver, J. M., Brooks, A. D., Welch, W. J., Khaledi, M. G., Hawkins, D., Young, S. S., Patil, K., Howell, G. W., Ng, R. T., and Chu, M. T. (2010). Chemmodlab: a web-based cheminformatics modeling laboratory. In silico biology, 11(1-2):61–81.

Lemberge, P., De Raedt, I., and Janssens, K. H. (2000). Quantitative analysis of 16-17th century archaeological glass vessels using pls regression of epxma and mu-xrf data. Journal of chemometrics, 14(5):751–764.

Liu, K., Feng, J., and Young, S. S. (2005). Powermv: a software environment for molecular viewing, descriptor generation, data analysis and hit evaluation. Journal of chemical information and modeling, 45(2):515–522.

Sargsyan, K., Safta, C., Najm, H. N., Debusschere, B. J., Ricciuto, D., and Thornton, P. (2014). Dimensionality reduction for complex models via bayesian compressive sensing. International Journal for Uncertainty Quantification, 4(1).

Scheetz, T. E., Kim, K.-Y. A., Swiderski, R. E., Philp, A. R., Braun, T. A., Knudtson, K. L., Dorrance, A. M., DiBona, G. F., Huang, J., Casavant, T. L., et al. (2006). Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proceedings of the National Academy of Sciences, 103(39):14429–14434.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288.

Tomal, J. H., Welch, W. J., Zamar, R. H., et al. (2015). Ensembling classification models based on phalanxes of variables with applications in drug discovery. The Annals of Applied Statistics, 9(1):69–93.
