
arXiv: 2212.11481
The modeling of probability distributions, specifically generative modeling and density estimation, has become an immensely popular subject in recent years, owing to its outstanding performance on sophisticated data such as images and texts. Nevertheless, a theoretical understanding of its success remains incomplete. One mystery is the paradox between memorization and generalization: in theory, the model is trained to converge exactly to the empirical distribution of the finitely many training samples, whereas in practice the trained model can generate new samples and estimate the likelihood of unseen samples. Meanwhile, the overwhelming diversity of distribution learning models calls for a unified perspective on the subject. This paper provides a mathematical framework from which all the well-known models can be derived based on simple principles. To demonstrate its efficacy, we present a survey of our results on the approximation error, training error, and generalization error of these models, all of which can be established within this framework. In particular, the aforementioned paradox is resolved by proving that these models enjoy implicit regularization during training, so that the generalization error at early stopping avoids the curse of dimensionality. Furthermore, we provide some new results on landscape analysis and the mode collapse phenomenon.
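To make the memorization-versus-generalization paradox concrete, the following minimal sketch uses a kernel density estimator as a toy stand-in for a trained distribution model. Here the shrinking bandwidth plays the role of training time: as the bandwidth goes to zero the model approaches the empirical distribution (memorization), so the training log-likelihood keeps improving while the held-out log-likelihood peaks and then degrades, and stopping at the peak is an early-stopping analogue. This is only an illustrative analogy, not the paper's framework; the data, the `kde_loglik` helper, and the bandwidth schedule are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1-D samples from a two-component Gaussian mixture (unknown to the model).
def sample(n):
    z = rng.integers(0, 2, size=n)
    return np.where(z == 0, rng.normal(-2.0, 0.7, n), rng.normal(1.5, 0.5, n))

x_train, x_test = sample(50), sample(500)

def kde_loglik(query, data, h):
    """Mean log-likelihood of `query` under a Gaussian KDE built on `data` with bandwidth h."""
    d = (query[:, None] - data[None, :]) / h               # pairwise scaled differences
    log_k = -0.5 * d**2 - np.log(h * np.sqrt(2 * np.pi))   # log Gaussian kernel values
    m = log_k.max(axis=1, keepdims=True)                   # log-mean-exp over training points
    log_p = m.squeeze(1) + np.log(np.exp(log_k - m).mean(axis=1))
    return log_p.mean()

# Shrinking h mimics training toward the empirical distribution (a sum of Dirac masses).
bandwidths = np.geomspace(2.0, 1e-3, 60)
train_ll = [kde_loglik(x_train, x_train, h) for h in bandwidths]
test_ll = [kde_loglik(x_test, x_train, h) for h in bandwidths]

best = int(np.argmax(test_ll))
print(f"training log-likelihood keeps rising: {train_ll[0]:.2f} -> {train_ll[-1]:.2f}")
print(f"held-out log-likelihood peaks at h = {bandwidths[best]:.3f} (early stopping), "
      f"then degrades to {test_ll[-1]:.2f}")
```

Running the sketch shows the training score improving monotonically toward memorization while the held-out score first improves and then collapses, which is the toy counterpart of the implicit-regularization result described in the abstract.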
Comments: fixed typos
FOS: Computer and information sciences, Artificial intelligence, Computer Science - Machine Learning, generative modeling, memorization, generalization error, Machine Learning (stat.ML), Nonparametric inference, Machine Learning (cs.LG), Statistics - Machine Learning, density estimation, implicit regularization, Linear function spaces and their duals, 68T07, 62G05, 60-08
| Indicator | Value | Description |
|---|---|---|
| Selected citations | 1 | Citations derived from selected sources (cf. the "Influence" indicator, which reflects the overall impact based on the full citation network). |
| Popularity | Average | "Current" impact/attention (the "hype") of the article in the research community at large, based on the underlying citation network. |
| Influence | Average | Overall/total impact of the article in the research community at large, based on the underlying citation network (diachronically). |
| Impulse | Average | Initial momentum of the article directly after its publication, based on the underlying citation network. |
