
arXiv: 2006.09858
Many data analysis problems can be cast as distance geometry problems in \emph{space forms} -- Euclidean, spherical, or hyperbolic spaces. Absolute distance measurements are often unreliable or simply unavailable, and only proxies to absolute distances, in the form of similarities, are available. Hence we ask the following: Given only \emph{comparisons} of similarities amongst a set of entities, what can be said about the geometry of the underlying space form? To study this question, we introduce the notions of the \textit{ordinal capacity} of a target space form and the \emph{ordinal spread} of the similarity measurements. The latter is an indicator of complex patterns in the measurements, while the former quantifies the capacity of a space form to accommodate a set of measurements with a specific ordinal spread profile. We prove that the ordinal capacity of a space form is related to its dimension and the sign of its curvature. This leads to a lower bound on the Euclidean and spherical embedding dimension of what we term similarity graphs. More importantly, we show that the statistical behavior of the ordinal spread random variables defined on a similarity graph can be used to identify its underlying space form. We support our theoretical claims with experiments on weighted trees, single-cell RNA expression data, and spherical cartographic measurements.
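To make the setting concrete, the sketch below computes distances in the three space forms named above and then keeps only the *ordering* of pairwise similarities, which is the kind of comparison data the abstract assumes is observed. It is a minimal illustration under stated assumptions, not the authors' code, and it does not compute ordinal spread or ordinal capacity. The function names (`euclidean_dist`, `spherical_dist`, `hyperbolic_dist`, `ordinal_ranking`) are hypothetical. The sphere is taken to be the unit sphere (curvature +1), hyperbolic space is represented by the hyperboloid (Lorentz) model with curvature -1, and the similarity functions in the usage example are arbitrary decreasing functions of distance chosen for illustration.

```python
import itertools
import math


def euclidean_dist(x, y):
    """Distance in flat (zero-curvature) Euclidean space."""
    return math.dist(x, y)


def spherical_dist(x, y):
    """Great-circle distance on the unit sphere (curvature +1).
    Assumes x and y are unit vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error


def hyperbolic_dist(x, y):
    """Distance in the hyperboloid (Lorentz) model (curvature -1).
    Assumes -x[0]**2 + sum(v**2 for v in x[1:]) == -1 and x[0] > 0."""
    inner = -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))
    return math.acosh(max(1.0, -inner))  # clamp against rounding error


def ordinal_ranking(points, dist, similarity):
    """Rank all unordered pairs (i, j), i < j, from most to least similar.

    Only the ordering is returned; the absolute similarity values are
    discarded, mimicking a setting where only comparisons are observed.
    """
    pairs = list(itertools.combinations(range(len(points)), 2))
    sim = {p: similarity(dist(points[p[0]], points[p[1]])) for p in pairs}
    return sorted(pairs, key=lambda p: -sim[p])


# Usage: two different (hypothetical) similarity functions, both strictly
# decreasing in distance, yield the same ranking of pairs.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
r_exp = ordinal_ranking(pts, euclidean_dist, lambda d: math.exp(-d))
r_rat = ordinal_ranking(pts, euclidean_dist, lambda d: 1.0 / (1.0 + d ** 2))
print(r_exp == r_rat)  # True
```

Because any strictly decreasing map from distance to similarity preserves the ranking, the comparisons alone cannot recover absolute distances; what they can constrain is the geometry (dimension and curvature sign) of a space form able to realize them.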
Signal Processing (eess.SP), FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, FOS: Electrical engineering, electronic engineering, information engineering, Machine Learning (stat.ML), Electrical Engineering and Systems Science - Signal Processing, Machine Learning (cs.LG)
