
doi: 10.1002/cjs.11579
The wide availability of computers enables us to accumulate huge amounts of data, so effective tools for extracting information from such data have become critical. Principal component analysis (PCA) is a useful and traditional tool for dimensionality reduction of massive high‐dimensional datasets. Recently, sparse principal component (PC) loading estimation based on L1‐type regularization has drawn a large amount of attention. Although sparse PCA eases interpretation and performs dimension reduction without disturbance from noisy features, existing studies on sparse PCA have been based on an arbitrarily chosen number of PCs without any statistical justification. We propose a novel method, called automatic sparse PCA, which performs PC selection and sparse PC loading estimation simultaneously. For PC selection, we first develop a sparse singular value decomposition (sparse SVD), and then incorporate sparsity into PC loading estimation. The proposed method thus carries out dimension reduction and PC loading estimation at the same time, again without disturbance from noisy features. Monte Carlo experiments show that the proposed automatic sparse PCA achieves superior performance in sparse structure identification and in data reconstruction based on low‐dimensional projection. The proposed method is also applied to a number of real datasets, where it proves effective in terms of both estimation accuracy and interpretability of the PCA results.
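To make the idea of L1‐type regularization of PC loadings concrete, the following is a minimal sketch of a generic rank‐one sparse SVD computed by alternating soft‐thresholding updates. It is not the automatic sparse PCA procedure proposed in the article, and it does not perform PC selection; the function names `soft_threshold` and `sparse_rank_one`, the penalty value `lam`, and the toy data matrix are all assumptions introduced for illustration.

```python
import math


def soft_threshold(z, lam):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def sparse_rank_one(X, lam, n_iter=100):
    """Illustrative L1-penalised rank-one approximation of X (hypothetical helper).

    Alternates u <- Xv / ||Xv|| and v <- soft_threshold(X^T u, lam),
    then returns u and the unit-normalised sparse loading vector v.
    """
    n, p = len(X), len(X[0])
    u = [0.0] * n
    v = [1.0 / math.sqrt(p)] * p  # dense starting loading (an assumption)
    for _ in range(n_iter):
        # Left vector: project the rows of X onto the current loading.
        u = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]
        norm_u = math.sqrt(sum(a * a for a in u))
        if norm_u == 0.0:
            break  # loading has been shrunk to zero; nothing left to update
        u = [a / norm_u for a in u]
        # Right vector: shrink each coordinate of X^T u towards zero.
        v = [
            soft_threshold(sum(X[i][j] * u[i] for i in range(n)), lam)
            for j in range(p)
        ]
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_v > 0.0:
        v = [b / norm_v for b in v]
    return u, v


# Toy column-centred data: two informative features and one noisy feature.
X_toy = [
    [2.0, 2.0, 0.1],
    [-2.0, -2.0, -0.1],
    [1.0, 1.0, -0.1],
    [-1.0, -1.0, 0.1],
]
u_hat, v_hat = sparse_rank_one(X_toy, lam=0.5)
```

In this toy setting the penalty shrinks the loading of the third (noisy) feature exactly to zero while the two informative features keep equal weight, which is the kind of interpretability gain that sparse loadings are meant to provide. Choosing `lam`, and deciding how many components to retain, are left open here.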
Computational methods for sparse matrices, \(L_1\)-type regularization, principal component analysis, sparsity, singular value decomposition, Factor analysis and principal components; correspondence analysis, dimensionality reduction
