Variable selection with LASSO regression for complex survey data

Keywords: Statistics, replicate weights, complex survey data, cross-validation, LASSO regression, variable selection

Amaia Iparragirre; Thomas Lumley; Irantzu Barrio; Inmaculada Arostegui

Data sources

- Recolector de Ciencia Abierta, RECOLECTA: Article, 2023. License: CC BY. Full text: https://onlinelibrary.wiley.com/doi/full/10.1002/sta4.578
- Stat (via Crossref): Article, 2023, peer-reviewed. License: CC BY.
- ARCHIVO DIGITAL PARA LA DOCENCIA Y LA INVESTIGACION: Article, 2023.
- BCAM's Institutional Repository Data: Article, 2023. License: CC BY NC SA.
- zbMATH Open: Article, 2023.
- Recolector de Ciencia Abierta, RECOLECTA: Article, 2023. License: CC BY NC SA. Full text: https://onlinelibrary.wiley.com/doi/abs/10.1002/sta4.578

Article, 01 Jan 2023 (Spain, English). Publisher: Wiley. Journal: Stat, volume 12 (ISSN: 2049-1573, eISSN: 2049-1573).

doi: 10.1002/sta4.578

handle: 20.500.11824/1669 , 10810/61467

Abstract

Variable selection is an important step in building good prediction models. LASSO regression is one of the most commonly used methods for this purpose, and cross‐validation is the most widely applied technique for choosing its tuning parameter. In the complex survey framework, validation techniques are closely related to “replicate weights”. However, to our knowledge, replicate weights have never been used in a LASSO regression context, and applying LASSO regression to complex survey data can be challenging. The goal of this paper is twofold. First, we analyze the performance of replicate weights methods for selecting the tuning parameter when fitting LASSO regression models to complex survey data. Second, we propose new replicate weights methods for the same purpose; in particular, a new design‐based cross‐validation method that combines traditional cross‐validation with replicate weights. Through an extensive simulation study, we analyze the performance of all these methods and compare it with that of traditional cross‐validation for selecting the LASSO tuning parameter. The results suggest a considerable improvement when the proposed design‐based cross‐validation is used instead of traditional cross‐validation.
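
The general idea of choosing the LASSO tuning parameter by a design-based cross-validation can be sketched as follows. This is a minimal illustration under our own assumptions, not necessarily the paper's exact construction: whole PSUs (primary sampling units) are assigned to K folds separately within each stratum; training weights for each fold are built by a JKn-style rescaling (held-out PSUs get weight 0 and the remaining PSUs of stratum h are inflated by n_h/(n_h - m_h), where m_h PSUs of the n_h in that stratum are held out); the held-out error is the squared error weighted by the original sampling weights; and the LASSO uses squared-error loss, fitted by coordinate descent. The functions `weighted_lasso`, `design_folds`, `fold_replicate_weights` and `design_cv_lambda` and the simulated design are hypothetical.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used by coordinate descent."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def weighted_lasso(X, y, w, lam, n_iter=500, tol=1e-8):
    """Minimise (1/2) sum_i w_i (y_i - b0 - x_i @ beta)^2 + lam * ||beta||_1,
    with w normalised to sum to 1, by cyclic coordinate descent."""
    w = w / w.sum()
    x_mean, y_mean = w @ X, w @ y          # weighted means
    Xc, yc = X - x_mean, y - y_mean        # centring absorbs the intercept
    z = w @ Xc**2                          # weighted second moment of each column
    beta = np.zeros(X.shape[1])
    resid = yc.copy()                      # residual for the current beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(X.shape[1]):
            if z[j] == 0.0:
                continue
            old = beta[j]
            rho = w @ (Xc[:, j] * (resid + Xc[:, j] * old))
            beta[j] = soft_threshold(rho, lam) / z[j]
            resid -= Xc[:, j] * (beta[j] - old)
            max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return y_mean - x_mean @ beta, beta


def design_folds(strata, psu, k, rng):
    """Assign whole PSUs to k folds, separately within each stratum (assumption)."""
    fold_of_psu = {}
    for h in np.unique(strata):
        psus = np.unique(psu[strata == h])
        rng.shuffle(psus)
        for i, c in enumerate(psus):
            fold_of_psu[(h, c)] = i % k
    return np.array([fold_of_psu[(h, c)] for h, c in zip(strata, psu)])


def fold_replicate_weights(w, strata, psu, fold, f):
    """JKn-style training weights for fold f (assumption): zero out held-out
    PSUs and rescale the remaining PSUs of each stratum by n_h / (n_h - m_h)."""
    rep = w.astype(float).copy()
    for h in np.unique(strata):
        in_h = strata == h
        held = in_h & (fold == f)
        n_h = len(np.unique(psu[in_h]))
        m_h = len(np.unique(psu[held]))
        rep[held] = 0.0
        if m_h < n_h:
            rep[in_h & ~held] *= n_h / (n_h - m_h)
    return rep


def design_cv_lambda(X, y, w, strata, psu, lambdas, k=5, seed=0):
    """Return the lambda with the smallest design-weighted held-out MSE."""
    lambdas = np.asarray(lambdas, dtype=float)
    fold = design_folds(strata, psu, k, np.random.default_rng(seed))
    errors = np.zeros(len(lambdas))
    for f in range(k):
        test = fold == f
        if not test.any():
            continue
        rep_w = fold_replicate_weights(w, strata, psu, fold, f)
        for i, lam in enumerate(lambdas):
            b0, beta = weighted_lasso(X, y, rep_w, lam)
            pred = b0 + X[test] @ beta
            # held-out loss weighted by the ORIGINAL sampling weights
            errors[i] += w[test] @ (y[test] - pred) ** 2
    errors /= w.sum()
    return lambdas[int(np.argmin(errors))], errors


# Usage on simulated data: 4 strata x 6 PSUs x 10 units, 8 candidate predictors
rng = np.random.default_rng(1)
strata = np.repeat(np.arange(4), 60)
psu = np.repeat(np.arange(24), 10)
X = rng.normal(size=(240, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=240)
w = rng.uniform(1.0, 3.0, size=240)
lambdas = np.array([0.001, 0.01, 0.05, 0.1, 0.3, 1.0])
best_lam, cv_err = design_cv_lambda(X, y, w, strata, psu, lambdas, k=5)
```

In this sketch, holding out whole PSUs within every stratum (rather than individual units) is what distinguishes the design-based split from traditional K-fold cross-validation, and weighting the held-out loss by the original sampling weights is meant to target population-level prediction error. A logistic loss or other replicate weights schemes (bootstrap, BRR) could be slotted into the same structure.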

Country

Spain

Related Organizations

- UNIVERSIDAD DEL PAIS VASCO / EUSKAL HERRIKO UNIBERTSITATEA (Spain)
- University of Auckland (New Zealand)
- Basque Center for Applied Mathematics (Spain)
- University of the Basque Country (Spain)

Impact (BIP!)

- Selected citations (citations derived from selected sources): 18
- Popularity (current impact/attention in the research community, based on the citation network): Top 10%
- Influence (overall/total impact, based on the citation network, diachronically): Top 10%
- Impulse (initial momentum directly after publication): Top 10%
