Publication: Article, Preprint. 02 May 2022. Embargo end date: 01 Jan 2020. English

Publisher: Society for Industrial & Applied Mathematics (SIAM)

Journal: SIAM Journal on Mathematics of Data Science, volume 4, pages 531-552, eissn: 2577-0187

Authors: O'Reilly, Eliza; Tran, Ngoc Mai

doi: 10.1137/20m1354490 , 10.48550/arxiv.2002.00797

arXiv: http://arxiv.org/abs/2002.00797

Stochastic Geometry to Generalize the Mondrian Process


Abstract

The stable under iterated tessellation (STIT) process is a stochastic process that produces a recursive partition of space with cut directions drawn independently from a distribution over the sphere. The case of random axis-aligned cuts is known as the Mondrian process. Random forests and Laplace kernel approximations built from the Mondrian process have led to efficient online learning methods and Bayesian optimization. In this work, we utilize tools from stochastic geometry to resolve some fundamental questions concerning STIT processes in machine learning. First, we show that a STIT process with cut directions drawn from a discrete distribution can be efficiently simulated by lifting to a higher dimensional axis-aligned Mondrian process. Second, we characterize all possible kernels that stationary STIT processes and their mixtures can approximate. We also give a uniform convergence rate for the approximation error of the STIT kernels to the targeted kernels, generalizing the work of [3] for the Mondrian case. Third, we obtain consistency results for STIT forests in density estimation and regression. Finally, we give a formula for the density estimator arising from an infinite STIT random forest. This allows for precise comparisons between the Mondrian forest, the Mondrian kernel and the Laplace kernel in density estimation. Our paper calls for further developments at the novel intersection of stochastic geometry and machine learning.
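As an illustration of the axis-aligned special case mentioned in the abstract, the following is a minimal sketch of the standard Mondrian process construction on a box: wait an exponential time with rate equal to the sum of the side lengths, choose a cut axis with probability proportional to side length, cut uniformly along that axis, and recurse on both halves until the lifetime runs out. The function names `sample_mondrian`, `same_cell` and `estimate_mondrian_kernel` are hypothetical and not taken from the paper, and this sketch does not implement the paper's lifting of general STIT processes. The last function estimates, by Monte Carlo, the probability that two points share a cell, which for a Mondrian process with lifetime λ is known to equal the Laplace kernel exp(-λ‖x − y‖₁).

```python
import math
import random


def sample_mondrian(box, lifetime, rng, t=0.0):
    """Return the leaf cells of a Mondrian process on `box` run up to `lifetime`.

    `box` is a list of (low, high) intervals, one per axis (hypothetical layout).
    """
    lengths = [hi - lo for lo, hi in box]
    # Time until the next cut is exponential with rate = sum of side lengths.
    t_cut = t + rng.expovariate(sum(lengths))
    if t_cut > lifetime:
        return [box]
    # Cut axis chosen with probability proportional to its side length.
    axis = rng.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[axis]
    cut = rng.uniform(lo, hi)  # cut location is uniform along the chosen side
    left = box[:axis] + [(lo, cut)] + box[axis + 1:]
    right = box[:axis] + [(cut, hi)] + box[axis + 1:]
    return (sample_mondrian(left, lifetime, rng, t_cut)
            + sample_mondrian(right, lifetime, rng, t_cut))


def same_cell(cells, x, y):
    """True if points x and y fall in the same leaf cell."""
    def inside(cell, p):
        return all(lo <= v <= hi for (lo, hi), v in zip(cell, p))
    return any(inside(c, x) and inside(c, y) for c in cells)


def estimate_mondrian_kernel(x, y, box, lifetime, n_samples, rng):
    """Monte Carlo estimate of P(x and y share a cell)."""
    hits = sum(same_cell(sample_mondrian(box, lifetime, rng), x, y)
               for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    unit_square = [(0.0, 1.0), (0.0, 1.0)]
    x, y = (0.2, 0.3), (0.5, 0.4)
    est = estimate_mondrian_kernel(x, y, unit_square, 1.0, 2000, rng)
    laplace = math.exp(-1.0 * sum(abs(a - b) for a, b in zip(x, y)))
    print(f"Monte Carlo estimate: {est:.3f}, Laplace kernel: {laplace:.3f}")
```

Averaging the co-occurrence indicator over independent partitions is the random-feature view of the Mondrian kernel; the estimate fluctuates with the seed and the number of samples, so it should only be expected to match the Laplace kernel value approximately.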

Related Organizations

California Institute of Technology
United States

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Probability (math.PR), FOS: Mathematics, 60D05, 62G07, Machine Learning (stat.ML), General Medicine, Mathematics - Probability, 510, Machine Learning (cs.LG)

Impact by BIP!

- citations: 4
- popularity: Top 10%
- influence: Average
- impulse: Average


