
Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques

Publication: Article, Other literature type · 13 Apr 2023 · Spain
Publisher: Public Library of Science (PLoS)
Journal: PLOS ONE, volume 18, page e0284150 (eissn: 1932-6203)

Authors: Miren Hayet-Otero; Fernando García-García; Dae-Jin Lee; Joaquín Martínez-Minaya; Pedro Pablo España Yandiola; Isabel Urrutia Landa; Mónica Nieves Ermecheo; +6 Authors

doi: 10.1371/journal.pone.0284150

pmid: 37053151

pmc: PMC10101453

handle: 20.500.11824/1543 , 20.500.14417/3143 , 10810/61151 , 10251/210810


Abstract

With the COVID-19 pandemic having caused unprecedented numbers of infections and deaths, large research efforts have been undertaken to increase our understanding of the disease and the factors which determine diverse clinical evolutions. Here we focused on a fully data-driven exploration of which factors (clinical or otherwise) were most informative for SARS-CoV-2 pneumonia severity prediction via machine learning (ML). In particular, feature selection (FS) techniques, designed to reduce the dimensionality of data, allowed us to characterize which of our variables were the most useful for ML prognosis. We conducted a multi-centre clinical study, enrolling n = 1548 patients hospitalized due to SARS-CoV-2 pneumonia, of whom 792, 238, and 518 experienced low, medium and high-severity evolutions, respectively. Up to 106 patient-specific clinical variables were collected at admission, although 14 of them had to be discarded for containing ≥60% missing values. Alongside 7 socioeconomic attributes and 32 exposures to air pollution (chronic and acute), these became d = 148 features after variable encoding. We addressed this ordinal classification problem both as an ML classification task and as a regression task. Two imputation techniques for missing data were explored, along with a total of 166 unique FS algorithm configurations: 46 filters, 100 wrappers and 20 embedded methods. Of these, 21 setups achieved satisfactory bootstrap stability (≥0.70) with reasonable computation times: 16 filters, 2 wrappers, and 3 embedded methods. The subsets of features selected by each technique showed modest Jaccard similarities with one another. However, they consistently pointed to the importance of certain explanatory variables, namely: the patient's C-reactive protein (CRP), pneumonia severity index (PSI), respiratory rate (RR) and oxygen levels (saturation SpO2, quotients SpO2/RR and arterial SatO2/FiO2), the neutrophil-to-lymphocyte ratio (NLR; to a certain extent, also neutrophil and lymphocyte counts separately), lactate dehydrogenase (LDH), and procalcitonin (PCT) levels in blood. A remarkable agreement was found a posteriori between our strategy and independent clinical research works investigating risk factors for COVID-19 severity. Hence, these findings stress the suitability of this type of fully data-driven approach for knowledge extraction, as a complement to clinical perspectives.
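The abstract compares the feature subsets chosen by different FS techniques through their Jaccard similarity. As a rough illustration only, the sketch below computes pairwise Jaccard similarities and the features common to all subsets. The three subsets, the technique labels, and the helper names `jaccard` and `mean_pairwise_jaccard` are hypothetical and are not taken from the study or its software; the mean pairwise Jaccard shown here is one simple agreement score and is not claimed to be the stability measure the authors used.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty subsets are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(subsets):
    """Average Jaccard similarity over all unordered pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets, as if returned by three different FS techniques.
selected = {
    "filter": {"CRP", "PSI", "RR", "SpO2"},
    "wrapper": {"CRP", "PSI", "NLR", "LDH"},
    "embedded": {"CRP", "RR", "PCT", "LDH"},
}

# Pairwise similarity between techniques.
for (name_a, a), (name_b, b) in combinations(selected.items(), 2):
    print(f"{name_a} vs {name_b}: {jaccard(a, b):.2f}")

# Features chosen by every technique, even when pairwise overlap is modest.
consensus = set.intersection(*selected.values())
print("consensus:", sorted(consensus))
```

In this toy case every pair of subsets shares only two of six distinct features, yet one feature appears in all three, which mirrors the pattern the abstract describes: modest overlap between subsets alongside consistent agreement on a few variables.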

Country

Spain

Related Organizations

Universitat Politècnica de València, Spain
Hospital Universitari i Politècnic La Fe, Spain
UNIVERSIDAD DEL PAIS VASCO/ EUSKAL HERRIKO UNIBERTSITATEA, Spain
Jeonbuk National University, Korea (Republic of)
University of the Basque Country, Spain


Keywords

ESTADISTICA E INVESTIGACION OPERATIVA, Interleukin 6, Pneumonia severity prediction, Feature selection (FS), Particulate matter 2.5, Nitrogen dioxide, C reactive protein, Horowitz index, Q, R, Lactate dehydrogenase, Machine learning (ML), Prognosis, Troponin, Hospitalization, Neutrophil lymphocyte ratio, Medicine, Pneumonia Severity Index, Cohort analysis, Procalcitonin, Human, Research Article, Science, Oxygen saturation, Air pollution, COVID-19 pandemic, Breathing rate, SARS-CoV-2 pneumonia, Aspartate aminotransferase, Ozone, Particulate matter 10, Machine learning, Adults, Training, Humans, Creatine kinase, Disease severity, Pandemics, Retrospective Studies, Ferritin, Pandemic, SARS-CoV-2, Quality control, COVID-19, Bilirubin, ADULTS, Pneumonia, Oxygen, Clinical variables, Socioeconomics, MARKER, Alanine aminotransferase, Brain natriuretic peptide


FeatSel-COVID-19-PLOS-ONE software on GitHub
IsRelatedTo

Impact by BIP!

- citations: 8. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.