ParallelGSReg/GlobalSearchRegression.jl: v1.0.5

GlobalSearchRegression is both the world's fastest all-subset-regression command (a widespread tool for automatic model/feature selection) and a first step toward a coherent framework for merging Machine Learning and Econometric algorithms. Written in Julia, it is a High Performance Computing version of the Stata gsreg command (get the original code here). On a multicore personal computer (we use a Threadripper 1950X build for benchmarks), it runs up to 100 times faster than the original Stata code and up to 10 times faster than well-known R alternatives (pdredge).

However, GlobalSearchRegression's main focus is not only on execution time but also on progressively combining Machine Learning algorithms with Econometric diagnosis tools in a friendly Graphical User Interface (GUI) that simplifies embarrassingly parallel quantitative research. In a Machine Learning environment (e.g. problems focusing on predictive analysis / forecasting accuracy) there is a growing universe of "training/test" algorithms (many of them performing very well in Julia) for comparing alternative results and finding a suitable model.

Problems focusing on causal inference, in contrast, require five important econometric features: 1) Parsimony (to avoid very large atheoretical models); 2) Interpretability (for causal inference, rejecting "intuition-loss" transformations and/or complex combinations); 3) Across-models sensitivity analysis (uncertainty is the only certainty; parameter distributions are preferred to unique "best-model" results); 4) Robustness to time series and panel data information (which prevents the use of raw bootstrapping or random subsample selection for training and test sets); and 5) Advanced residual properties (e.g. going beyond the i.i.d. assumption and checking additional panel structure properties for each model being evaluated, which forces a departure from many traditional machine learning algorithms).

For all these reasons, researchers increasingly prefer advanced all-subset-regression approaches, choosing among alternative models by means of in-sample and/or out-of-sample criteria, model averaging results, Bayesian priors for theoretical bounds on covariate coefficients, and different residual constraints. While still infeasible for large problems (choosing among hundreds of covariates), hardware and software innovations allow researchers to apply this approach in many different scientific projects, choosing among one billion models in a few hours on standard personal computers.
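To make the all-subset idea above concrete, here is a minimal sketch in plain Python. It is not the GlobalSearchRegression API and makes no claim about how the package is implemented: the function names (`ols_rss`, `all_subset_regression`), the toy data, and the choice of AIC as the in-sample criterion are all assumptions made for illustration. It fits an OLS model with an intercept for every non-empty subset of the candidate covariates and ranks the models by AIC.

```python
from itertools import combinations, product
import math

def ols_rss(y, columns):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`.

    Solves the normal equations by Gauss-Jordan elimination.
    Assumes no perfect collinearity among the regressors.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        # Partial pivoting, then clear column c in every other row
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2
               for r in range(n))

def all_subset_regression(y, covariates):
    """Fit every non-empty subset of `covariates`; return (AIC, subset) pairs, best first."""
    n = len(y)
    names = sorted(covariates)
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            rss = ols_rss(y, [covariates[name] for name in subset])
            n_params = size + 1  # slopes plus intercept
            aic = n * math.log(rss / n) + 2 * n_params
            results.append((aic, subset))
    return sorted(results)

# Hypothetical toy data: a 2^3 factorial design in which y depends on
# x1 and x2 only; the remaining variation is unrelated to x3.
design = list(product([-1.0, 1.0], repeat=3))
covariates = {
    "x1": [d[0] for d in design],
    "x2": [d[1] for d in design],
    "x3": [d[2] for d in design],
}
y = [3 * a + b + 0.5 * a * b * c for a, b, c in design]

ranking = all_subset_regression(y, covariates)
print(len(ranking))   # 7 models, i.e. 2**3 - 1
print(ranking[0][1])  # ('x1', 'x2')
```

With k candidate covariates this enumeration visits 2^k - 1 models, so about 30 covariates already produce roughly one billion regressions (2^30 = 1,073,741,824). Each fit is independent of the others, which is why the workload is embarrassingly parallel and why hundreds of covariates remain out of reach.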

Julia's HPC command for automatic feature/model selection using all-subset-regression approaches

Related Organizations

National Scientific and Technical Research Council, Argentina
National University of La Plata, Argentina

Keywords

Parallel computing, FOS: Economics and business, Machine Learning, Julia, Econometrics, All-subset regression, Fat-Data
