
We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1,389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development.
quantitative finance, source code, statistical risk model, Quantitative Biology - Quantitative Methods, Article, FOS: Economics and business, somatic mutation, Quantitative Biology - Genomics, K-means, genome, Quantitative Methods (q-bio.QM), Genomics (q-bio.GN), Statistical Finance (q-fin.ST), industry classification, nonnegative matrix factorization, Quantitative Finance - Statistical Finance, DNA, eRank, sample, matrix, machine learning, correlation, covariance, FOS: Biological sciences, cancer signatures, exome, clustering
