
arXiv: 2007.13869
Probabilistic models of data sets often exhibit salient geometric structure. Such a phenomenon is summed up in the manifold distribution hypothesis, and can be exploited in probabilistic learning. Here we present normal-bundle bootstrap (NBB), a method that generates new data that preserve the geometric structure of a given data set. Inspired by algorithms for manifold learning and concepts in differential geometry, our method decomposes the underlying probability measure into a marginalized measure on a learned data manifold and conditional measures on the normal spaces. The algorithm estimates the data manifold as a density ridge, and constructs new data by bootstrapping projection vectors and adding them to the ridge. We apply our method to the inference of density ridges and related statistics, and to data augmentation to reduce overfitting.
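The two steps named in the abstract (estimate a density ridge, then bootstrap projection vectors onto it) can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the ridge is estimated by subspace-constrained mean shift (SCMS) on a Gaussian kernel density estimate, and each ridge point borrows the projection vectors of its `k` nearest ridge neighbors. The function names, the bandwidth `h`, the neighbor count `k`, and the noisy-circle test data are all hypothetical, and the sketch does not align normal frames between neighboring points, so it should be read as a rough illustration rather than the authors' implementation.

```python
import numpy as np

def scms_step(x, data, h, d):
    """One subspace-constrained mean-shift step (Gaussian kernel, bandwidth h)."""
    diff = data - x                                     # offsets to all samples
    w = np.exp(-0.5 * np.sum(diff**2, axis=1) / h**2)   # kernel weights
    mean_shift = w @ diff / w.sum()                     # mean-shift vector
    # Hessian of the kernel density estimate, up to a positive factor.
    hess = (diff * w[:, None]).T @ diff / h**2 - w.sum() * np.eye(x.size)
    _, vecs = np.linalg.eigh(hess)                      # eigenvalues ascending
    normal = vecs[:, : x.size - d]                      # estimated normal space
    return x + normal @ (normal.T @ mean_shift)         # move along normals only

def estimate_ridge(data, h, d, n_iter=50):
    """Push every sample toward the d-dimensional density ridge."""
    ridge = data.copy()
    for _ in range(n_iter):
        ridge = np.array([scms_step(x, data, h, d) for x in ridge])
    return ridge

def normal_bundle_bootstrap(data, h, d=1, k=10, n_iter=50):
    """Assumed simplification: reuse projection vectors of k ridge neighbors."""
    ridge = estimate_ridge(data, h, d, n_iter)
    proj = data - ridge                                 # projection vectors
    dist = np.linalg.norm(ridge[:, None, :] - ridge[None, :, :], axis=2)
    nbrs = np.argsort(dist, axis=1)[:, :k]              # includes the point itself
    new = ridge[:, None, :] + proj[nbrs]                # shape (n, k, D)
    return ridge, new.reshape(-1, data.shape[1])

# Hypothetical test data: a unit circle with small isotropic noise.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
data = np.column_stack([np.cos(theta), np.sin(theta)])
data += 0.1 * rng.standard_normal(data.shape)

ridge, new_data = normal_bundle_bootstrap(data, h=0.3, d=1, k=10)
```

Because each ridge point is its own nearest neighbor, the original samples reappear among the generated points. A more faithful version would likely transport neighbor vectors into the local normal space before adding them.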
Mathematics - Differential Geometry, FOS: Computer and information sciences, Computer Science - Machine Learning, Learning and adaptive systems in artificial intelligence, probabilistic learning, Machine Learning (stat.ML), Dynamical Systems (math.DS), Computational methods for attractors of dynamical systems, Statistics - Computation, Machine Learning (cs.LG), resampling, Statistics - Machine Learning, data manifold, FOS: Mathematics, Nonparametric statistical resampling methods, Mathematics - Dynamical Systems, Computation (stat.CO), Higher-dimensional and -codimensional surfaces in Euclidean and related \(n\)-spaces, 37M22, 53-08, 53A07, 62F40, 62G09, Differential Geometry (math.DG), Statistics on manifolds, data augmentation
| selected citations | 3 |
| popularity | Average |
| influence | Average |
| impulse | Average |
