Robuta

https://openreview.net/forum?id=Bkg6RiCqY7
Decoupled Weight Decay Regularization | OpenReview
L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we...
weight decay, decoupled, regularization, openreview
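
The equivalence stated in the snippet can be checked numerically. The following is a minimal sketch, not code from the linked paper: it assumes a toy least-squares loss, uses full-batch gradients for determinism, and all names (`grad_loss`, `sgd_l2`, `sgd_weight_decay`, `lr`, `decay`) are hypothetical. With learning rate `lr`, an L$_2$ penalty with coefficient `decay / lr` should yield the same iterates as shrinking the weights by a factor `(1 - decay)` at every step.

```python
import numpy as np

def grad_loss(w, X, y):
    """Gradient of the toy loss 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def sgd_l2(w, X, y, lr, l2_coeff, steps):
    """Gradient descent on loss + (l2_coeff / 2) * ||w||^2 (penalty folded into the gradient)."""
    for _ in range(steps):
        w = w - lr * (grad_loss(w, X, y) + l2_coeff * w)
    return w

def sgd_weight_decay(w, X, y, lr, decay, steps):
    """Gradient descent on the plain loss; weights are shrunk by (1 - decay) each step."""
    for _ in range(steps):
        w = (1.0 - decay) * w - lr * grad_loss(w, X, y)
    return w

# Assumed toy data, chosen only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = rng.normal(size=32)
w0 = rng.normal(size=4)

lr, decay = 0.1, 0.01
# Rescale the L2 coefficient by the learning rate: l2_coeff = decay / lr.
w_l2 = sgd_l2(w0, X, y, lr, l2_coeff=decay / lr, steps=50)
w_wd = sgd_weight_decay(w0, X, y, lr, decay, steps=50)

print(np.allclose(w_l2, w_wd))  # True
```

The two updates agree because `lr * (decay / lr) * w` equals `decay * w` at every step. This identity relies on the plain gradient step; the sketch does not cover optimizers that rescale the gradient per coordinate.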