Robuta

https://openreview.net/forum?id=36hVB7DEB0
Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon where the test accuracy starts improving long after the model achieves...
modular arithmeticemergencenonneuralmodels