Decoupled Weight Decay Regularization

Preprint English OPEN
Loshchilov, Ilya; Hutter, Frank
  • Subject: Mathematics - Optimization and Control | Computer Science - Machine Learning | Computer Science - Neural and Evolutionary Computing

L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is \emph{not} the case for adaptive gradient algorithms, such as Adam. While common implementati...
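
The equivalence claimed for plain SGD, and its failure for an adaptive method, can be checked numerically on a single update step. The following is a minimal sketch rather than the authors' implementation: the quadratic loss, the curvature values in `CURV`, the starting weights, and all function names are assumptions made only for illustration. The Adam-style update is the standard first step with bias correction, and the decoupled variant subtracts the decay term `wd * w` directly from the weights instead of adding it to the gradient.

```python
import math

# Hypothetical toy loss f(w) = 0.5 * sum(c_i * w_i^2); the curvatures are assumed values.
CURV = [1.0, 10.0]


def grad(w):
    """Gradient of the toy loss (without any regularization term)."""
    return [c * wi for c, wi in zip(CURV, w)]


def sgd_l2(w, lr, l2):
    """SGD step on the loss plus (l2 / 2) * ||w||^2: the penalty enters the gradient."""
    return [wi - lr * (gi + l2 * wi) for wi, gi in zip(w, grad(w))]


def sgd_weight_decay(w, lr, wd):
    """SGD step with weight decay: shrink the weights by (1 - wd), then apply the gradient."""
    return [(1.0 - wd) * wi - lr * gi for wi, gi in zip(w, grad(w))]


def adam_first_step(g, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Per-coordinate update of the first Adam step, including bias correction."""
    steps = []
    for gi in g:
        m_hat = ((1.0 - beta1) * gi) / (1.0 - beta1)       # bias-corrected first moment
        v_hat = ((1.0 - beta2) * gi * gi) / (1.0 - beta2)  # bias-corrected second moment
        steps.append(lr * m_hat / (math.sqrt(v_hat) + eps))
    return steps


def adam_l2(w, lr, l2):
    """Adam step where the L2 penalty is added to the gradient before rescaling."""
    g = [gi + l2 * wi for wi, gi in zip(w, grad(w))]
    return [wi - si for wi, si in zip(w, adam_first_step(g, lr))]


def adam_decoupled(w, lr, wd):
    """Adam step where the decay term is applied to the weights, outside the rescaling."""
    steps = adam_first_step(grad(w), lr)
    return [wi - si - wd * wi for wi, si in zip(w, steps)]


w0, lr, wd = [1.0, -2.0], 0.1, 0.01
l2 = wd / lr  # rescale the L2 coefficient by the learning rate

sgd_a, sgd_b = sgd_l2(w0, lr, l2), sgd_weight_decay(w0, lr, wd)
adam_a, adam_b = adam_l2(w0, lr, l2), adam_decoupled(w0, lr, wd)

print("SGD  L2 vs weight decay:", sgd_a, sgd_b)
print("Adam L2 vs decoupled:   ", adam_a, adam_b)
```

With the L2 coefficient set to `wd / lr`, the two SGD updates agree up to floating-point rounding, while the two Adam-style updates do not: in the L2 variant the penalty is divided by the same per-coordinate scale as the gradient, so its effect on each weight is no longer proportional to the weight itself.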