
pmid: 37252994
pmc: PMC10266010
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory. We give here a complete solution in the special case of linear networks with output dimension one trained using zero-noise Bayesian inference with Gaussian weight priors and mean squared error as a negative log-likelihood. For any training dataset, network depth, and hidden layer widths, we find non-asymptotic expressions for the predictive posterior and Bayesian model evidence in terms of Meijer-G functions, a class of meromorphic special functions of a single complex variable. Through novel asymptotic expansions of these Meijer-G functions, a rich new picture of the joint role of depth, width, and dataset size emerges. We show that linear networks make provably optimal predictions at infinite depth: the posterior of infinitely deep linear networks with data-agnostic priors is the same as that of shallow networks with evidence-maximizing data-dependent priors. This yields a principled reason to prefer deeper networks when priors are forced to be data-agnostic. Moreover, we show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth, elucidating the salutary role of increased depth for model selection. Underpinning our results is a novel emergent notion of effective depth, given by the number of hidden layers times the number of data points divided by the network width; this determines the structure of the posterior in the large-data limit.
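The closing sentence defines effective depth in words; written out as a formula (with illustrative symbols that are not necessarily the paper's notation: $L$ hidden layers, $P$ data points, $n$ hidden-layer width), it reads:

$$
\lambda_{\mathrm{eff}} \;=\; \frac{L \, P}{n}.
$$

To make the model class concrete, here is a minimal sketch, assuming a uniform hidden width and i.i.d. standard-normal weight priors (all function and variable names are illustrative, not taken from the paper), of the setup the abstract describes: a deep linear network with scalar output, Gaussian weight priors, and squared error as the negative log-likelihood.

```python
import numpy as np

def sample_deep_linear_network(d_in, width, n_hidden, rng):
    """Draw one network from an illustrative Gaussian prior: every
    weight entry i.i.d. standard normal, scalar output."""
    dims = [d_in] + [width] * n_hidden + [1]
    return [rng.standard_normal((dims[i + 1], dims[i]))
            for i in range(len(dims) - 1)]

def predict(weights, X):
    """Compute W_{L+1} W_L ... W_1 x for each row x of X; the product
    of the weight matrices collapses to a single linear map."""
    out = X.T
    for W in weights:
        out = W @ out
    return out.ravel()

def squared_error_nll(weights, X, y):
    """Squared error treated as a negative log-likelihood (up to
    constants), as in the abstract's zero-noise Bayesian setup."""
    resid = predict(weights, X) - y
    return 0.5 * np.dot(resid, resid)

rng = np.random.default_rng(0)
P, d_in, width, L = 8, 3, 16, 4   # data points, input dim, width, hidden layers
X = rng.standard_normal((P, d_in))
y = rng.standard_normal(P)
w = sample_deep_linear_network(d_in, width, L, rng)
print("NLL of one prior draw:", squared_error_nll(w, X, y))
print("effective depth L*P/n:", L * P / width)  # quantity from the abstract
```

In the zero-noise limit the posterior concentrates on weight settings that interpolate the training data; rather than approximating this posterior by sampling prior draws such as the one above, the paper evaluates it exactly in terms of Meijer-G functions.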
subjects: Machine Learning (cs.LG); Machine Learning (stat.ML); Probability (math.PR); FOS: Computer and information sciences; FOS: Mathematics; FOS: Physical sciences
| Indicator | Description | Value |
| --- | --- | --- |
| citations | Total citations received, based on the underlying citation network (diachronically). | 16 |
| popularity | "Current" impact/attention of the article in the research community at large, based on the underlying citation network. | Top 10% |
| influence | Overall/total impact of the article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| impulse | Initial momentum of the article directly after its publication, based on the underlying citation network. | Top 10% |
