Powered by OpenAIRE graph
DBLP
Doctoral thesis . 2022
Data sources: DBLP

Statistical and machine learning methods for identifying clusters of variables: with applications in omics, ecology and psychology.

Authors: Marion, Rebecca

Abstract

In many fields, researchers are confronted by datasets whose variables demonstrate grouping patterns. For example, in transcriptomics data, where the variables are gene expression levels, certain groups of genes are involved in the same biological processes, so their expression levels are highly correlated. For complex diseases, such as cancer or heart disease, entire groups of genes are expected to contribute to the development or progression of disease. Thus, identifying these variable groups, or "clusters," can be instrumental in uncovering the mechanisms of disease and developing targeted treatments. However, in practice, these variable clusters are not known in advance and must be learned from the data.

Clustering is a data analysis technique used to assign a set of objects to groups, or clusters, where similar objects are assigned to the same cluster and dissimilar objects to different clusters. While most work in the literature has focused on the problem of clustering observations (e.g. patients) given a set of variables (e.g. genes), this thesis proposes several statistical and machine learning methods for the problem of variable clustering. The objective of the thesis is to propose methods that can improve data analysis in contexts where the ultimate objective is to predict one or more targets (e.g. disease class) and identify clusters of predictor variables (e.g. genes, metabolites) that are most predictive of the target(s).

We explore three problems related to this theme, drawing on applications from the fields of metabolomics, genomics, ecology and psychology. First, we propose AdaCLV, a variable clustering method for pre-processing high-dimensional metabolomics data such that important clusters of variables can be identified with greater precision. Second, we investigate the added value of integrating the target variable (e.g. disease class) into the variable clustering process. We introduce Weighted SOS-NMF, a method that improves variable clustering and variable selection performance by supervising the clustering of variables with the target before a predictive model is fitted. Finally, we examine the case of supervised variable clustering for data with multiple, orthogonal targets. Inspired by a common research problem in ecology and psychology, we propose BIOT, a method for transforming the dimensions of the target matrix so that they can be accurately predicted by small clusters of predictor variables.

(SC - Sciences) -- UCL, 2021
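To make the distinction between clustering observations and clustering variables concrete, the following is a minimal sketch of one generic, unsupervised approach: variables (columns) are linked whenever the absolute Pearson correlation between them reaches a threshold, and the connected groups are reported as clusters. This is not AdaCLV, Weighted SOS-NMF or BIOT; the function names, the toy data and the 0.8 threshold are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def cluster_variables(data, threshold=0.8):
    """Group the columns of `data` (rows = observations) into clusters.

    Two variables are linked when |correlation| >= threshold; clusters are
    the connected components of the resulting graph (illustrative only).
    """
    n_vars = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_vars)]
    parent = list(range(n_vars))  # union-find forest over variable indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for j in range(n_vars):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())


# Hypothetical toy data: 5 observations (rows) of 4 variables (columns).
# Variables 0 and 1 move together, as do variables 2 and 3.
data = [
    [1, 2.1, 5, 10.2],
    [2, 3.9, 1, 2.1],
    [3, 6.2, 4, 7.9],
    [4, 8.0, 2, 4.0],
    [5, 9.9, 3, 6.1],
]

print(cluster_variables(data))
```

Note that the objects being grouped here are the columns of the data matrix, not its rows. A supervised variant, as described in the abstract, would additionally use the target variable to guide which variables are grouped; that step is not attempted in this sketch.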

Country
Belgium
Keywords

Nonnegative matrix factorization, Variable selection, Matrix factorization, Prediction, Sparsity, Regression, Clustering, Variable clustering

  • Impact by BIP!
    selected citations: 0
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
Green
Related to Research communities
Cancer Research