Downloads provided by UsageCounts
arXiv: 1912.06407
handle: 10016/38473, 2117/383386
Abstract
Framed in the literature on Interpretable Machine Learning, we propose a new procedure to assign a measure of relevance to each explanatory variable in a complex predictive model. We assume that we have a training set to fit the model and a test set to check its out-of-sample performance. We propose to measure the individual relevance of each variable by comparing the predictions of the model in the test set with those obtained when the variable of interest is substituted (in the test set) by its ghost variable, defined as the prediction of this variable from the rest of the explanatory variables. In linear models it is shown that, on the one hand, the proposed measure gives results similar to leave-one-covariate-out (loco) at a lower computational cost and outperforms random permutations, and on the other hand, it is strongly related to the usual F-statistic measuring the significance of a variable. In nonlinear predictive models (such as neural networks or random forests) the proposed measure shows the relevance of the variables in an efficient way, as shown by a simulation study comparing ghost variables with alternative methods (including loco and random permutations, as well as knockoff variables and estimated conditional distributions). Finally, we study the joint relevance of the variables by defining the relevance matrix as the covariance matrix of the vectors of effects on predictions when using every ghost variable. Our proposal is illustrated with simulated examples and the analysis of a large real data set.
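The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `fit_linear` and `ghost_relevance` are hypothetical, both the predictive model and the ghost-variable predictors are taken to be ordinary least-squares fits, the ghost variables are fitted on the test set, and the relevance of each variable is assumed to be normalised by the test-set mean squared prediction error. The simulated data are invented purely to exercise the code.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def ghost_relevance(predict, X_test, y_test):
    """Relevance of each column of X_test via ghost variables (sketch).

    Assumptions (not taken from the source): ghost variables are linear
    predictions fitted on the test set, and relevance is the mean squared
    change in prediction divided by the test-set MSE.
    """
    n, p = X_test.shape
    base = predict(X_test)                      # predictions with the original data
    mse = np.mean((y_test - base) ** 2)         # out-of-sample prediction error
    effects = np.empty((n, p))
    for j in range(p):
        others = np.delete(X_test, j, axis=1)
        # Ghost variable: prediction of column j from the remaining columns.
        ghost_j = fit_linear(others, X_test[:, j])(others)
        X_ghost = X_test.copy()
        X_ghost[:, j] = ghost_j
        # Effect on predictions of replacing column j by its ghost.
        effects[:, j] = base - predict(X_ghost)
    relevance = np.mean(effects ** 2, axis=0) / mse
    # Joint relevance: covariance matrix of the vectors of effects.
    relevance_matrix = np.cov(effects, rowvar=False)
    return relevance, relevance_matrix

# Simulated example: x0 and x1 are correlated and relevant, x2 is irrelevant.
rng = np.random.default_rng(0)

def simulate(n):
    x0 = rng.normal(size=n)
    x1 = 0.5 * x0 + np.sqrt(0.75) * rng.normal(size=n)
    x2 = rng.normal(size=n)
    X = np.column_stack([x0, x1, x2])
    y = 3.0 * x0 + 1.0 * x1 + rng.normal(size=n)
    return X, y

X_train, y_train = simulate(500)
X_test, y_test = simulate(500)

model = fit_linear(X_train, y_train)            # fitted on the training set only
rel, V = ghost_relevance(model, X_test, y_test)
print("relevance by ghost variables:", rel)
print("relevance matrix:\n", V)
```

Any fitted model exposing a prediction function could be passed as `predict`; the least-squares ghost predictor could likewise be swapped for a more flexible regressor when the dependence among explanatory variables is nonlinear.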
Estadística matemàtica, FOS: Computer and information sciences, Estimated conditional distributions, Computer Science - Machine Learning, Matemáticas, Out-of-sample prediction, Machine Learning (stat.ML), Estadística, Computational aspects of data analysis and big data, Economía, Machine Learning (cs.LG), Random permutations, Methodology (stat.ME), Statistical aspects of big data and data science, Classificació AMS::68 Computer science::68T Artificial intelligence, Statistics - Machine Learning, Nonparametric regression and quantile regression, Partial correlation matrix, Statistics - Methodology, Interpretable machine learning, Explainable artificial intelligence, Leave-one-covariate-out, Knockoffs, Mathematical statistics, Àrees temàtiques de la UPC::Matemàtiques i estadística::Anàlisi matemàtica, Classificació AMS::62 Statistics::62G Nonparametric inference
| indicator | description | value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 3 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 80 |
| downloads | | 105 |

Views provided by UsageCounts