https://openreview.net/forum?id=36hVB7DEB0
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon where the test accuracy starts improving long after the model achieves...
modular arithmetic
emergence
non-neural models