
We present, via the solution of nonlinear parabolic partial differential equations (PDEs), a continuous-time formulation of the stochastic optimization algorithms used for training deep neural networks. Using the continuous-time formulation of stochastic differential equations (SDEs), relaxation approaches such as the stochastic gradient descent (SGD) method are interpreted as solutions of nonlinear PDEs that arise from modeling physical problems. Through homogenization of SDEs, we reinterpret the modified SGD algorithm as the solution of the viscous Burgers' equation, which models highway traffic flow.
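As a hedged illustration (not drawn from the report itself), the informal correspondence behind such statements is usually sketched as follows: the discrete SGD iteration is approximated by an SDE in the small-step-size limit, and the spatial gradient of a viscously regularized Hamilton-Jacobi value function satisfies a viscous Burgers' equation. The symbols f, eta, Sigma, and nu below are generic placeholders, not notation taken from the report.

\begin{align}
  % Hedged sketch: SGD, its SDE limit, and the viscous Burgers' link.
  % f : training loss, \eta : step size, \Sigma : gradient-noise covariance,
  % \nu : viscosity parameter. All notation here is illustrative only.
  x_{k+1} &= x_k - \eta\,\nabla f(x_k) + \eta\,\xi_k,
    \qquad \xi_k \sim \mathcal{N}(0,\Sigma),
    && \text{(stochastic gradient step)} \\
  \mathrm{d}X_t &= -\nabla f(X_t)\,\mathrm{d}t
    + \sqrt{\eta\,\Sigma}\;\mathrm{d}W_t,
    && \text{(small-step SDE approximation)} \\
  \partial_t u &= -\tfrac{1}{2}\,\lvert \nabla u \rvert^{2}
    + \nu\,\Delta u, \qquad u(x,0) = f(x),
    && \text{(viscous Hamilton--Jacobi regularization)} \\
  \partial_t p &+ (p \cdot \nabla)\,p = \nu\,\Delta p,
    \qquad p = \nabla u .
    && \text{(viscous Burgers' equation)}
\end{align}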
Final report submitted in partial fulfillment of the African Master's in Machine Intelligence (AMMI) degree program at the African Institute for Mathematical Sciences (AIMS), Rwanda. This report reflects studies and research conducted by the first author during the program from 2019 to 2020. The official submission deadline for this report was 31 March 2021. For more information about the program, please visit AIMS-AMMI. Thanks to the program sponsors, Google and Meta Platforms (previously "Facebook").
Differential equations, Deep learning, Neural Networks, Computer, Applied mathematics, Partial differential equations, Mathematical analysis, Numerical analysis
