https://openreview.net/forum?id=xhW2WyPhRP&referrer=%5Bthe%20profile%20of%20Tomer%20Galanti%5D(%2Fprofile%3Fid%3D~Tomer_Galanti1)
We investigate the inherent bias of Stochastic Gradient Descent (SGD) toward learning low-rank weight matrices during the training of deep neural networks. Our...
weight decay, SGD, secretly, minimize, rank