https://openreview.net/forum?id=Bkg6RiCqY7
Decoupled Weight Decay Regularization | OpenReview
L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we...
weight decay, decoupled, regularization, openreview
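
The equivalence mentioned in the description can be checked numerically. The sketch below is a minimal illustration, not the paper's code: it assumes a toy quadratic loss, a plain (non-stochastic) gradient step, and hypothetical names (`grad_loss`, `sgd_l2_step`, `sgd_weight_decay_step`, `run`). It compares an SGD step on a loss with an added L$_2$ penalty of coefficient `l2_coeff` against an SGD step with multiplicative weight decay `decay`, and the two should coincide when `l2_coeff = decay / lr`.

```python
def grad_loss(w):
    # Gradient of a hypothetical toy loss f(w) = 0.5 * sum((w_i - 1)^2).
    return [wi - 1.0 for wi in w]


def sgd_l2_step(w, lr, l2_coeff):
    # L2 regularization: the penalty 0.5 * l2_coeff * ||w||^2 adds
    # l2_coeff * w to the gradient before the learning rate is applied.
    g = grad_loss(w)
    return [wi - lr * (gi + l2_coeff * wi) for wi, gi in zip(w, g)]


def sgd_weight_decay_step(w, lr, decay):
    # Weight decay: shrink the weights by (1 - decay), then take the
    # usual gradient step on the unregularized loss.
    g = grad_loss(w)
    return [(1.0 - decay) * wi - lr * gi for wi, gi in zip(w, g)]


def run(step, w0, n_steps, **kwargs):
    # Apply the given update rule n_steps times from the start point w0.
    w = list(w0)
    for _ in range(n_steps):
        w = step(w, **kwargs)
    return w


lr = 0.1                 # assumed learning rate
decay = 0.01             # assumed weight decay factor
l2_coeff = decay / lr    # L2 coefficient rescaled by the learning rate
w0 = [2.0, -3.0, 0.5]

w_l2 = run(sgd_l2_step, w0, 50, lr=lr, l2_coeff=l2_coeff)
w_wd = run(sgd_weight_decay_step, w0, 50, lr=lr, decay=decay)

# Largest per-coordinate gap between the two trajectories' end points.
max_diff = max(abs(a - b) for a, b in zip(w_l2, w_wd))
print(max_diff)
```

Under these assumptions the two update rules agree up to floating-point rounding. If the L$_2$ coefficient is not rescaled (for example `l2_coeff = decay`), the end points differ, which is why the rescaling by the learning rate matters in the statement above.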