
doi: 10.1002/asmb.2335
Abstract

In a wide variety of fields, such as epidemiology, process monitoring, chemometrics, marketing, and the social sciences, many research questions call for regression analysis of large data sets. Although standard regression suffices in some cases, modeling is sometimes more challenging for several reasons: (i) several response variables must be explained; (ii) the explanatory variables are numerous and organized into meaningful, usually ill-conditioned, multidimensional matrices; (iii) the observations come from different subpopulations; and (iv) new observations must be predicted. Although some existing methods partially meet these challenges, none of them covers all of these aspects. To fill this gap, a new method, called regularized clusterwise multiblock regression (CW.rMBREG), is proposed. CW.rMBREG combines clustering with a component-based (multiblock) model and optimizes a well-defined criterion. It simultaneously provides a partition of the observations into clusters and the regression coefficients associated with each cluster. In addition, we investigate a key feature that is generally neglected in clusterwise regression, i.e., the prediction of new observations. The usefulness of CW.rMBREG is illustrated by both a simulation study and a real example in the field of indoor air quality. The results show that CW.rMBREG improves the quality of the prediction and facilitates the interpretation of complex ill-conditioned data. The proposed method is available to users through the R package mbclusterwise.
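The abstract does not spell out the CW.rMBREG algorithm, so the sketch below only illustrates the general idea of clusterwise regression with regularization: alternate between fitting one penalized regression per cluster and reassigning each observation to the cluster whose model fits it best. It is not the authors' method and not the API of the mbclusterwise package. As simplifying assumptions, it uses a single response variable and a plain ridge penalty in place of the multiblock component model; the names `ridge_fit`, `clusterwise_ridge`, and `lam` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalized intercept; returns (intercept, coefs)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    # Solve (Xc'Xc + lam * I) beta = Xc'yc; lam > 0 keeps the system well conditioned.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta


def clusterwise_ridge(X, y, n_clusters=2, lam=1.0, n_iter=50, seed=0):
    """Generic clusterwise ridge regression (illustrative, not CW.rMBREG)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=X.shape[0])  # random initial partition
    models = [None] * n_clusters
    for _ in range(n_iter):
        # Step 1: fit one ridge model per cluster.
        for k in range(n_clusters):
            idx = labels == k
            if idx.sum() >= 2:
                models[k] = ridge_fit(X[idx], y[idx], lam)
            elif models[k] is None:
                # Nearly empty cluster on the first pass: fall back to a global fit.
                models[k] = ridge_fit(X, y, lam)
        # Step 2: reassign each observation to the model with the smallest squared residual.
        resid = np.column_stack([(y - (b0 + X @ b)) ** 2 for b0, b in models])
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
    return labels, models
```

Like other alternating schemes of this kind, the result depends on the initial partition, so in practice one would typically run it from several random starts and keep the solution with the lowest total squared residual. How new observations are assigned to a cluster for prediction is not described in the abstract and is left out of this sketch.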
Ridge regression; shrinkage estimators (Lasso), Classification and discrimination; cluster analysis (statistical aspects), dimension reduction, multiblock classification, Inference from stochastic processes and prediction, clusterwise regression, multiblock component method, multicollinearity, Applications of statistics to environmental and related topics, [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST]
