
arXiv: 1903.03673
From a combinatorial point of view, we consider the Earth Mover's Distance (EMD) associated with a metric measure space. The specific case considered is deceptively simple: Let the finite set [n] = {1,...,n} be regarded as a metric space by restricting the usual Euclidean distance on the real numbers. The EMD is defined on ordered pairs of probability distributions on [n]. We provide an easy method to compute a generating function encoding the values of EMD in its coefficients, which is related to the Segre embedding from projective algebraic geometry. As an application we use the generating function to compute the expected value of EMD in this one-dimensional case. The EMD is then used in clustering analysis for a specific data set.
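For the one-dimensional case described above, the EMD between two distributions on [n] with ground distance |i - j| has a well-known closed form: the sum of absolute differences of the two cumulative distributions. The sketch below illustrates that standard identity only; it is not the generating-function method of the paper, and the function name `emd_1d` is a hypothetical choice made for this illustration.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two distributions on [n] = {1, ..., n}.

    Assumes the ground distance |i - j| and that p and q have equal
    total mass. Uses the standard one-dimensional identity
    EMD(p, q) = sum_k |P_k - Q_k|, where P and Q are the cumulative sums.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same set [n]")
    # Running sum of (p_i - q_i) equals P_k - Q_k at each position k.
    cumulative_gap = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(gap) for gap in cumulative_gap)
```

For instance, moving all mass from point 1 to point 3 of [3] costs 2 under this formula, which matches the distance |1 - 3| travelled by one unit of mass.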
To appear in the Journal of Algebraic Statistics
Probability measures on topological spaces, 05E40, 62H30, Classification and discrimination; cluster analysis (statistical aspects), Applications of graph theory, earth mover's distance, Segre embedding, spectral graph theory, Algebraic statistics, generating function, FOS: Mathematics, Mathematics - Combinatorics, Combinatorics (math.CO), Combinatorial aspects of commutative algebra, clustering
| selected citations | 6 |
| popularity | Top 10% |
| influence | Top 10% |
| impulse | Average |
