
We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are stable in-sample. They are also stable out-of-sample when applied to 1,389 published genome samples across 14 cancer types. In contrast, we find in-sample and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development.
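To make the contrast between a non-deterministic method and a "statistically deterministic" one concrete, the sketch below runs plain k-means (Lloyd's algorithm) from many random initializations and keeps the partition that occurs most often. This is only an illustration of the general idea of aggregating over random runs; it is not the *K-means procedure of the cited paper, and the helper names (`canonical`, `kmeans`, `most_frequent_partition`) and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def canonical(labels):
    """Relabel clusters in order of first appearance, so that two runs
    producing the same partition with permuted labels compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(label, len(mapping)) for label in labels)


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's k-means on a list of equal-length tuples.
    The result depends on the seed through the random initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return canonical(labels)


def most_frequent_partition(points, k, seeds):
    """Run k-means once per seed and return the most common partition
    (hypothetical aggregation rule, chosen here for simplicity)."""
    counts = Counter(kmeans(points, k, seed) for seed in seeds)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data (assumed): two well-separated groups of three points each.
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(most_frequent_partition(toy, k=2, seeds=range(20)))
```

On less cleanly separated data, individual runs can disagree with one another, which is the kind of seed dependence that motivates aggregating over many runs rather than trusting a single one.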
Genomics (q-bio.GN), Statistical Finance (q-fin.ST), clustering; K-means; nonnegative matrix factorization; somatic mutation; cancer signatures; genome; exome; DNA; eRank; correlation; covariance; machine learning; sample; matrix; source code; quantitative finance; statistical risk model; industry classification, Quantitative Finance - Statistical Finance, Quantitative Biology - Quantitative Methods, Article, FOS: Economics and business, FOS: Biological sciences, Quantitative Biology - Genomics, Quantitative Methods (q-bio.QM)
