
Abstract

We propose a procedure for selecting principal components in principal component regression. Instead of using a small subset of the major principal components, our method selects principal components with variable selection procedures. The procedure consists of two steps to improve estimation and prediction. First, we reduce the number of principal components using conventional principal component regression to yield a set of candidate principal components. Second, we select principal components from the candidate set using sparse regression techniques. The performance of our proposals is demonstrated numerically and compared with typical dimension reduction approaches (including principal component regression and partial least squares regression) using synthetic and real datasets.

Keywords: Biased estimation, dimension reduction, penalized regression, principal component regression, principal component selection.

1. Introduction

The regression model is a popular statistical model for data analysis. Under fairly general conditions, the ordinary least squares (OLS) estimator of the regression model has many desirable properties, including unbiasedness and minimum variance. Multicollinearity degrades the quality of the OLS estimator. It often arises in real-world applications when the explanatory variables are nearly linearly dependent or the sample size is smaller than the number of variables. Two types of biased estimation are widely used to address this situation. One is the regularization method, in which model parameters are chosen under a certain constraint. Popular regularization methods for regression are ridge regression, Lasso, and other penalized regression methods. The other is the dimension reduction method (such as principal component regression (PCR) and partial least squares regression (PLSR)), in which the space of explanatory variables is decomposed into orthogonal directions and only some of them are used for model building.
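The two-step procedure summarized in the abstract can be illustrated with a minimal sketch. Several parts of it are assumptions rather than details taken from the paper: the function name `two_step_pc_selection`, the parameters `n_candidates` and `lam`, and the use of the Lasso as the sparse regression technique in the second step. Because principal component scores are mutually orthogonal, the Lasso fit on them reduces to soft-thresholding, which keeps the example short. The sketch also assumes `n_candidates` does not exceed the rank of the centered data.

```python
import numpy as np


def two_step_pc_selection(X, y, n_candidates, lam):
    """Illustrative two-step principal component selection.

    A sketch under stated assumptions, not the authors' exact method.
    """
    # Center predictors and response so that no intercept enters the fit.
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean

    # Step 1 (as in conventional PCR): keep the leading components as candidates.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_candidates].T   # loadings, shape (p, n_candidates)
    Z = Xc @ V                # scores; columns are mutually orthogonal

    # Step 2 (assumed here to be the Lasso): minimize
    #   0.5 * ||yc - Z @ gamma||^2 + lam * ||gamma||_1.
    # With orthogonal columns, each coefficient is a soft-thresholded
    # inner product divided by the squared norm of its score column.
    zty = Z.T @ yc
    d2 = np.sum(Z ** 2, axis=0)
    gamma = np.sign(zty) * np.maximum(np.abs(zty) - lam, 0.0) / d2

    selected = np.flatnonzero(gamma)   # indices of retained components
    beta = V @ gamma                   # coefficients on the original variables
    intercept = y_mean - x_mean @ beta
    return selected, beta, intercept
```

Calling `two_step_pc_selection(X, y, n_candidates=4, lam=50.0)` returns the indices of the retained components together with coefficients and an intercept on the original variable scale, so predictions are `X @ beta + intercept`. Unlike conventional PCR, the retained components need not be the leading ones: a low-variance component that is strongly related to the response can be kept while a high-variance component is dropped.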
Both approaches produce biased estimators, but these often have a smaller variance than OLS estimators. This property (incurring bias while reducing variance) improves estimation and prediction, which is why biased estimation methods are popular. More detailed accounts can be found in standard textbooks (Bishop, 2006; Hastie
