
Abstract

As high-throughput genomics assays become more efficient and cost-effective, their use has become standard in large-scale biomedical projects. These studies are often exploratory, in that relationships between samples are not explicitly defined a priori, but rather emerge from data-driven discovery and annotation of molecular subtypes, thereby informing hypotheses and independent evaluation. Here, we present K2Taxonomer, a novel unsupervised recursive partitioning algorithm and associated R package that use ensemble learning to identify robust subgroups in a "taxonomy-like" structure (https://github.com/montilab/K2Taxonomer). K2Taxonomer was devised to accommodate different data paradigms, and is suitable for the analysis of both bulk and single-cell transcriptomics data. For each of these data types, we demonstrate the power of K2Taxonomer to discover known relationships in both simulated and human tissue data. We conclude with a practical application to breast cancer tumor-infiltrating lymphocyte (TIL) single-cell profiles, in which we identified co-expression of translational machinery genes as a dominant transcriptional program shared by T cell subtypes and associated with better prognosis in breast cancer tissue bulk expression data.
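To make the phrase "recursive partitioning with ensemble learning" concrete, the following is a minimal Python sketch of the general idea, not the K2Taxonomer implementation (which is an R package). All names here (`two_way_split`, `consensus_split`, `taxonomy`, `n_boot`) are hypothetical, and the farthest-pair splitting rule and feature bootstrap are assumptions chosen for brevity: at each node, groups are split in two many times on resampled features, and the split kept is the one supported by how often groups land together.

```python
import random

def two_way_split(names, profiles, feature_idx):
    """Illustrative base splitter: seed with the farthest pair of groups,
    then assign every group to the nearer seed (labels 0 or 1)."""
    def dist(a, b):
        return sum((profiles[a][j] - profiles[b][j]) ** 2 for j in feature_idx)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    seed_a, seed_b = max(pairs, key=lambda p: dist(*p))
    return {n: 0 if dist(n, seed_a) <= dist(n, seed_b) else 1 for n in names}

def consensus_split(names, profiles, n_boot, rng):
    """Ensemble step: repeat the split on bootstrapped features and count
    how often each pair of groups falls on the same side."""
    n_features = len(profiles[names[0]])
    together = {(a, b): 0 for i, a in enumerate(names) for b in names[i + 1:]}
    for _ in range(n_boot):
        idx = [rng.randrange(n_features) for _ in range(n_features)]
        labels = two_way_split(names, profiles, idx)
        for a, b in together:
            together[(a, b)] += labels[a] == labels[b]

    def co(a, b):
        # Co-occurrence count for an unordered pair of groups.
        if a == b:
            return n_boot
        return together.get((a, b), together.get((b, a)))

    # The least co-occurring pair anchors the two sides of the consensus split.
    seed_a, seed_b = min(together, key=together.get)
    left = [n for n in names if n != seed_b and co(n, seed_a) >= co(n, seed_b)]
    right = [n for n in names if n not in left]
    return left, right

def taxonomy(names, profiles, n_boot=25, rng=None):
    """Recursively partition until each leaf holds one group.
    Leaves are group names; internal nodes are (left, right) tuples."""
    rng = rng or random.Random(0)
    if len(names) == 1:
        return names[0]
    left, right = consensus_split(names, profiles, n_boot, rng)
    return (taxonomy(left, profiles, n_boot, rng),
            taxonomy(right, profiles, n_boot, rng))

# Toy input (made up): four groups, two clearly separated pairs.
toy_profiles = {
    "A": [0.0, 0.0, 0.0, 0.0],
    "B": [0.1, 0.1, 0.1, 0.1],
    "C": [10.0, 10.0, 10.0, 10.0],
    "D": [10.1, 10.1, 10.1, 10.1],
}
toy_tree = taxonomy(list(toy_profiles), toy_profiles)
print(toy_tree)  # (('A', 'B'), ('C', 'D'))
```

The nested tuples play the role of the "taxonomy-like" structure: each level is a two-way partition, and the pairwise co-occurrence counts give a rough indication of how stable a given split is under resampling.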
Gene Expression Profiling; Computational Biology; Reproducibility of Results; Breast Neoplasms; Genomics; Prognosis; Survival Analysis; Gene Expression Regulation, Neoplastic; Lymphocytes, Tumor-Infiltrating; T-Lymphocyte Subsets; Methods Online; Cluster Analysis; Humans; Female; Single-Cell Analysis; Algorithms
