
We provide a principled way for investigators to analyze randomized experiments when the number of covariates is large. Investigators often use multivariate linear regression to analyze randomized experiments instead of simply reporting the difference of means between treatment and control groups. Their aim is to reduce the variance of the estimated treatment effect by adjusting for covariates. If the number of covariates is large relative to the number of observations, regression may perform poorly because of overfitting. In such cases, the least absolute shrinkage and selection operator (Lasso) may be helpful. We study the resulting Lasso-based treatment effect estimator under the Neyman–Rubin model of randomized experiments. We present theoretical conditions that guarantee that the estimator is more efficient than the simple difference-of-means estimator, and we provide a conservative estimator of the asymptotic variance, which can yield tighter confidence intervals than the difference-of-means estimator. Simulation and data examples show that Lasso-based adjustment can be advantageous even when the number of covariates is less than the number of observations. Specifically, a variant that uses Lasso for selection and ordinary least squares (OLS) for estimation performs particularly well; it chooses the smoothing parameter based on the combined performance of Lasso and OLS.
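The contrast between the difference-of-means estimator and a Lasso-then-OLS adjustment can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a common form of regression adjustment in which each arm's mean outcome is corrected by the gap between that arm's covariate means and the full-sample covariate means. The function names (`lasso_cd`, `adjusted_arm_mean`, `estimate_ate`) and the fixed penalty `lam` are hypothetical; the abstract describes a data-driven choice of the smoothing parameter, which is omitted here.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso on centered data:
    minimize (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # running residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * b[j]            # drop coordinate j from the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]            # add the updated coordinate back
    return b

def adjusted_arm_mean(X_arm, y_arm, x_bar_all, lam):
    """Arm mean adjusted with covariates selected by Lasso and refit by OLS."""
    Xc = X_arm - X_arm.mean(axis=0)
    yc = y_arm - y_arm.mean()
    support = np.flatnonzero(lasso_cd(Xc, yc, lam))
    beta = np.zeros(X_arm.shape[1])
    if support.size:
        # OLS refit restricted to the selected covariates
        beta[support] = np.linalg.lstsq(Xc[:, support], yc, rcond=None)[0]
    return y_arm.mean() - (X_arm.mean(axis=0) - x_bar_all) @ beta

def estimate_ate(X, y, treated, lam=0.1):
    """Return (difference of means, Lasso+OLS adjusted estimate).
    `treated` is a boolean mask; `lam` is an assumed fixed penalty."""
    x_bar = X.mean(axis=0)
    diff_means = y[treated].mean() - y[~treated].mean()
    adjusted = (adjusted_arm_mean(X[treated], y[treated], x_bar, lam)
                - adjusted_arm_mean(X[~treated], y[~treated], x_bar, lam))
    return diff_means, adjusted
```

When the penalty is large enough that no covariate is selected, the adjusted estimate reduces to the difference of means, so the adjustment can be read as a correction layered on top of the simple estimator.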
Ridge regression; shrinkage estimators (Lasso), Neyman–Rubin model, Statistics, Statistics as Topic, average treatment effect, Mathematics - Statistics Theory, Statistics Theory (math.ST), Mathematical Sciences, randomized experiment, Optimal statistical designs, Data analysis (statistics), Treatment Outcome, high-dimensional statistics, FOS: Mathematics, Lasso, Neyman-Rubin model, Asymptotic properties of parametric estimators, Randomized Controlled Trials as Topic
| Indicator | Value |
| --- | --- |
| Selected citations | 129 |
| Popularity | Top 1% |
| Influence | Top 10% |
| Impulse | Top 10% |
