ZENODO
Conference object . 2024
License: CC BY
Data sources: Datacite

Exploration of dimension reduction techniques for clustering spatial transcriptomics data

Authors: Jia, Ruize; Hung, Ling-Hong; Yeung, Ka Yee;

Abstract

Spatial transcriptomics (ST) provides a spatially resolved, high-dimensional assessment of gene transcription. Spatial domain identification (SDI) is a critical task in ST, as it enables a deeper understanding of tissue microenvironments and biological functions. SDI typically involves a clustering step to infer spatial domains. Existing methods use statistical or deep learning models to incorporate spatial information into clustering. Among statistical methods, Giotto implements a Hidden Markov Random Field model to detect spatial domains with consistent gene expression patterns, while BayesSpace uses a Bayesian model that encourages neighboring spots to be grouped together [1]. Among deep learning methods, GraphST uses graph convolutional networks and self-supervised contrastive learning to reconstruct the gene expression matrix with spatial information [2], and SEDR adopts a variational graph autoencoder to produce embeddings that represent gene expression profiles with spatial information. However, existing methods face two major limitations in the clustering process. First, they often rely on a hardcoded number of clusters and/or model type, whereas in practice ground truth annotations, such as the number of spatial domains, are generally not available. Second, principal component analysis (PCA) is commonly used for dimension reduction of the gene expression matrix, but PCA captures the directions of greatest variance, which may not align with the features needed for clustering and can therefore hinder accurate domain identification. To tackle these limitations, we applied model-based clustering with various dimension reduction techniques. We benchmarked different clustering and dimension reduction methods on the dorsolateral prefrontal cortex reference dataset, which consists of 12 samples. Specifically, we experimented with mclust and substituted PCA with alternative dimensionality reduction techniques.
Most importantly, we used the Bayesian Information Criterion (BIC) to select the best model and determine the optimal number of clusters. Clustering was performed on both the spatial embeddings and the spatially enhanced gene expression matrix, and the results were compared to external knowledge using the Adjusted Rand Index (ARI). Our preliminary results suggest that selecting spatially variable genes (SVGs) may be a more effective dimension reduction approach than PCA. We explored several SVG selection methods, including Giotto KMeans, Giotto Rank, and SPARK-X, to reduce the dimensions of GraphST's reconstructed gene expression matrix. On sample 151673, the best BIC-selected model using Giotto KMeans achieved an ARI of 0.6198, compared with 0.5993 for GraphST's default hardcoded model, which clusters on PCA embeddings.
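
The two evaluation steps named in the abstract, picking a model by BIC and scoring the resulting clustering by ARI, can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which uses the R package mclust. The function names, the tuple layout of the candidates, and the toy log-likelihoods and labels are all hypothetical and chosen only for illustration. The BIC here follows the mclust sign convention (2 log L minus k log n, where larger is better), which is the opposite sign of the form used by some other libraries.

```python
from collections import Counter
from math import comb, log


def mclust_style_bic(loglik, n_params, n_obs):
    """BIC in the mclust convention: 2*logL - k*log(n); larger is better."""
    return 2.0 * loglik - n_params * log(n_obs)


def select_by_bic(candidates, n_obs):
    """Return the (label, loglik, n_params) tuple with the highest BIC."""
    return max(candidates, key=lambda c: mclust_style_bic(c[1], c[2], n_obs))


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same spots."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # zero or one spot: the labelings trivially agree
    # Count pairs of spots that share a cluster in both, or in each, labeling.
    joint = Counter(zip(labels_true, labels_pred))
    pairs_both = sum(comb(c, 2) for c in joint.values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / total_pairs
    max_index = 0.5 * (pairs_true + pairs_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use one cluster
    return (pairs_both - expected) / (max_index - expected)


# Toy values only (assumed, not taken from the study): three fitted models
# described by (label, log-likelihood, number of free parameters).
toy_candidates = [("G=2", -520.0, 5), ("G=3", -480.0, 8), ("G=4", -478.0, 11)]
best_label = select_by_bic(toy_candidates, n_obs=100)[0]
print(best_label)

# ARI ignores label names, so a relabeled but identical partition scores 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

In this sketch the number of clusters is never fixed in advance: every candidate model is scored and the highest BIC wins, while the ARI is computed only afterwards against external annotations.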

Keywords

Spatial Domain Identification, Spatial Transcriptomics, Dimension Reduction, Mclust, Model Based Clustering
