Unsupervised Spatial Segmentation of the Tumor Microenvironment via Multi-Scale Z-Axis Stratification and Molecular Density Fields: A Label-Free Grid-Level Framework with MRF Smoothing and Interface Gradient Analysis

Overview

Using public 10x Xenium spatial transcriptomic data from breast cancer, we treat the cell-stacking phenomenon along the Z-axis, traditionally regarded as a technical artifact, as a genuine physical signal. We develop a grid-based, multi-scale texture analysis framework built on Z-axis stratification statistics. The pipeline proceeds as follows.

After quality control, transcripts are binned into spatial grids. A robust baseline correction (RANSAC + Huber regression, with automatic selection between linear and quadratic models via AIC and validation by Moran's I on the residuals) removes global geometric trends from the Z coordinates. At multiple Gaussian kernel scales (σ = 15, 30, 45 μm), three continuous physical fields are computed for each grid cell: transcript density (ρ), overall Z-dispersion (z_std_all), and the imbalance-enhanced upper–lower Z-dispersion difference (z_std_diff_enhanced). Edge correction and confidence weighting are applied throughout.

Using only these geometry-derived features, with no pathological region annotations, cell-type labels, or gene expression input, we perform unsupervised classification with a diagonal-covariance Gaussian mixture model (GMM) followed by Potts-model Markov random field (MRF) spatial smoothing. The number of clusters (K) is selected by combining bootstrap stability with the integrated completed likelihood (ICL), and the smoothing strength (λ) is chosen by a stability–boundary-ratio objective over a σ × λ sensitivity grid. A leakage guard formally verifies that no biological or expression features enter the classification stage.

Post-classification biological validation is conducted entirely downstream. Grid-level count matrices and CPM values are constructed; per-cluster markers are ranked with a vectorized one-vs-rest Wilcoxon test, and pairwise differential gene expression (DGE) is tested with the Mann–Whitney U test and Benjamini–Hochberg (BH) correction, followed by pathway enrichment (MSigDB Hallmark, GO BP, KEGG) via gseapy. Marker-group scores (log1p of the mean CPM over each curated gene panel), together with Cohen's d effect sizes, quantify functional differences between clusters.
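
One way to compute edge-corrected, multi-scale fields is normalized convolution: smooth the per-cell sufficient statistics (count, Σz, Σz²) with a Gaussian kernel and divide by the smoothed tissue footprint. The sketch below assumes a square grid with `bin_um` spacing, treats any cell containing at least one transcript as tissue, and uses the kernel-weighted transcript count as a confidence weight; all three choices, and the name `smoothed_fields`, are assumptions. It covers ρ and z_std_all only, because the imbalance enhancement behind z_std_diff_enhanced is not defined in the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smoothed_fields(x, y, z, bin_um=10.0, sigma_um=30.0, min_weight=1.0):
    """Edge-corrected Gaussian density and Z-dispersion fields on a square grid."""
    ix = np.floor((x - x.min()) / bin_um).astype(int)
    iy = np.floor((y - y.min()) / bin_um).astype(int)
    shape = (ix.max() + 1, iy.max() + 1)

    # Per-cell sufficient statistics: transcript count, sum of z, sum of z^2.
    n, s1, s2 = (np.zeros(shape) for _ in range(3))
    np.add.at(n, (ix, iy), 1.0)
    np.add.at(s1, (ix, iy), z)
    np.add.at(s2, (ix, iy), z * z)

    def blur(a):
        # Zero padding outside the grid; sigma converted from um to grid cells.
        return gaussian_filter(a, sigma_um / bin_um, mode="constant", cval=0.0)

    n_s, s1_s, s2_s = blur(n), blur(s1), blur(s2)
    # Edge correction: divide by the smoothed footprint of occupied cells so
    # cells near the tissue border are not biased toward low density.
    footprint = blur((n > 0).astype(float))

    with np.errstate(invalid="ignore", divide="ignore"):
        rho = np.where(footprint > 1e-6, n_s / footprint, np.nan) / bin_um**2
        mean_z = s1_s / n_s
        var_z = np.clip(s2_s / n_s - mean_z**2, 0.0, None)
    z_std = np.where(n_s >= min_weight, np.sqrt(var_z), np.nan)
    # n_s (kernel-weighted transcript count) serves as a confidence weight.
    return {"rho": rho, "z_std_all": z_std, "weight": n_s}
```

Calling the function once per σ (15, 30, 45 μm) would yield a multi-scale feature stack; ρ is returned in transcripts per μm².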
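
The GMM + Potts-MRF classification can be sketched as follows, assuming the unary cost is the negative log posterior from a scikit-learn `GaussianMixture` with `covariance_type="diag"` and that the MRF energy is minimized by iterated conditional modes (ICM) on a 4-neighbourhood. ICM is a swapped-in optimizer chosen for illustration, since the text does not name one, and `potts_icm` and `gmm_potts_segment` are hypothetical names.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def potts_icm(unary, lam, n_iter=20):
    """Iterated conditional modes for a Potts MRF on a 2-D grid (4-neighbourhood).

    unary[i, j, k] is the cost of label k at cell (i, j); every neighbour with a
    different label adds `lam`. Updating one checkerboard colour at a time means
    no two simultaneously updated cells are neighbours, so energy never increases.
    """
    H, W, K = unary.shape
    labels = unary.argmin(axis=2)
    parity = np.add.outer(np.arange(H), np.arange(W)) % 2
    for _ in range(n_iter):
        changed = False
        for colour in (0, 1):
            onehot = np.eye(K)[labels]  # (H, W, K)
            agree = np.zeros_like(onehot)  # neighbours currently carrying label k
            agree[1:] += onehot[:-1]
            agree[:-1] += onehot[1:]
            agree[:, 1:] += onehot[:, :-1]
            agree[:, :-1] += onehot[:, 1:]
            n_nb = agree.sum(axis=2, keepdims=True)
            proposal = (unary + lam * (n_nb - agree)).argmin(axis=2)
            update = parity == colour
            changed = changed or bool(np.any(proposal[update] != labels[update]))
            labels = np.where(update, proposal, labels)
        if not changed:
            break
    return labels


def gmm_potts_segment(features, k, lam, seed=0):
    """Diagonal-covariance GMM on per-cell features, then Potts smoothing of labels."""
    H, W, D = features.shape
    X = features.reshape(-1, D)
    gmm = GaussianMixture(n_components=k, covariance_type="diag", random_state=seed)
    proba = gmm.fit(X).predict_proba(X)
    unary = -np.log(np.clip(proba, 1e-12, None)).reshape(H, W, k)
    return potts_icm(unary, lam)
```

With `lam=0` the result is the plain GMM argmax; increasing `lam` trades data fit for fewer label changes between neighbouring cells, which is the trade-off a σ × λ sensitivity grid would scan.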
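
The downstream DGE and marker-scoring steps reduce to a few array operations. This sketch assumes a dense (grid cells × genes) count matrix, runs SciPy's `mannwhitneyu` per gene with a hand-written Benjamini–Hochberg adjustment, and computes a panel score as log1p of the mean CPM of the panel genes, following the overview's definition. The function names are hypothetical, and the vectorized one-vs-rest Wilcoxon ranking and the gseapy enrichment are not shown.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def cpm(counts):
    """Counts per million for a (grid cells x genes) count matrix."""
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(totals, 1) * 1e6


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (monotone step-up, capped at 1)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def pairwise_dge(expr, labels, a, b):
    """Per-gene two-sided Mann-Whitney U test between clusters a and b, plus BH q-values."""
    xa, xb = expr[labels == a], expr[labels == b]
    p = np.array([mannwhitneyu(xa[:, g], xb[:, g], alternative="two-sided").pvalue
                  for g in range(expr.shape[1])])
    p = np.nan_to_num(p, nan=1.0)  # genes with no variation in either group
    return p, benjamini_hochberg(p)


def panel_score(expr_cpm, gene_idx):
    """Marker-group score: log1p of the mean CPM over a panel's genes."""
    return np.log1p(expr_cpm[:, gene_idx].mean(axis=1))


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
                     / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled
```

BH is written out explicitly here to avoid depending on a particular SciPy or statsmodels version.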

Spatial interface analysis computes signed distances to the cluster boundary, constructs interface gradient heatmaps (z-scored feature profiles binned by distance), and derives interface strength and sharpness metrics (contrast Cohen's d, near-boundary slope, maximum gradient, AUC separation). A radius sensitivity sweep with partial Spearman correlations (controlling for transcript density) confirms that the Z-dispersion–density and Z-dispersion–heterogeneity associations are robust across neighborhood scales. A DGE analysis restricted to the curated gene panel, with its own pathway enrichment, provides focused validation on biologically curated gene sets.
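
The signed-distance interface profile can be sketched with SciPy's Euclidean distance transform: distances are positive inside a cluster mask and negative outside, and a z-scored feature is averaged within distance bins to form one row of an interface gradient heatmap. The pixel-centre distance convention, the bin edges, the ±30 μm slope window, and all function names are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def signed_distance(mask, bin_um=10.0):
    """Signed distance in um to the edge of `mask`: positive inside, negative outside."""
    mask = np.asarray(mask, dtype=bool)
    return (distance_transform_edt(mask) - distance_transform_edt(~mask)) * bin_um


def interface_profile(feature, sd, edges):
    """Mean z-scored feature in each signed-distance bin (one row of a gradient heatmap)."""
    f, d = feature.ravel(), sd.ravel()
    z = (f - np.nanmean(f)) / np.nanstd(f)
    idx = np.digitize(d, edges) - 1  # bin b covers [edges[b], edges[b + 1])
    profile = np.full(len(edges) - 1, np.nan)
    for b in range(len(edges) - 1):
        in_bin = idx == b
        if in_bin.any():
            profile[b] = np.nanmean(z[in_bin])
    return profile


def near_boundary_slope(profile, edges, max_um=30.0):
    """Least-squares slope (z-units per um) of the profile within +/- max_um of the edge."""
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = (np.abs(centers) <= max_um) & ~np.isnan(profile)
    return np.polyfit(centers[keep], profile[keep], 1)[0]


def max_gradient(profile, edges):
    """Largest absolute bin-to-bin change of the profile per um."""
    centers = 0.5 * (edges[:-1] + edges[1:])
    return np.nanmax(np.abs(np.diff(profile) / np.diff(centers)))
```

Stacking `interface_profile` outputs for several features gives the rows of a heatmap indexed by signed distance; a positive near-boundary slope means the feature rises toward the cluster interior.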
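
A partial Spearman correlation that controls for transcript density can be computed by rank-transforming all three variables, regressing the ranks of each variable of interest on the ranks of density, and correlating the residuals. This is one standard construction, assumed here for illustration; `partial_spearman` is a hypothetical name, and the neighbourhood aggregation that a radius sweep would repeat for each radius is not shown.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def partial_spearman(x, y, covariate):
    """Spearman correlation of x and y after regressing out a covariate, on ranks."""
    rx, ry, rc = (rankdata(v) for v in (x, y, covariate))
    design = np.column_stack([np.ones_like(rc), rc])
    # Residualize both rank vectors on the covariate's ranks (intercept + slope).
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    density = rng.normal(size=3000)
    # Two features driven only by density: strong raw correlation, none after control.
    a = density + 0.3 * rng.normal(size=3000)
    b = density + 0.3 * rng.normal(size=3000)
    raw, _ = spearmanr(a, b)
    print(round(raw, 2), round(partial_spearman(a, b, density), 2))
```

In a radius sweep, the same call would be repeated on features aggregated at each neighbourhood radius; an association that stays strong after density is removed is not explained by density alone.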

Keywords

Markov random field, spatial transcriptomics, interface analysis, cell stacking, pathway enrichment, KDE, Xenium, spatial autocorrelation, unsupervised clustering, label-free segmentation, Z-axis stratification, Gaussian mixture model, tumor microenvironment, Potts model, grid-based texture analysis, DGE, signed distance

Related to Research communities

Cancer Research