Robuta

https://openreview.net/forum?id=VCJ8NfVrcO&referrer=%5Bthe%20profile%20of%20Sharan%20Vaswani%5D(%2Fprofile%3Fid%3D~Sharan_Vaswani1)
Natural policy gradient (NPG) is a common policy optimization algorithm and can be viewed as mirror ascent in the space of probabilities. Recently, Vaswani et...
fast convergence softmax policy mirror ascent
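
As an illustration of the "mirror ascent in the space of probabilities" view mentioned in the abstract, here is a minimal sketch for the simplest case: a single-state (bandit) problem with known rewards. This setting is an assumption made for illustration, not something the snippet states. With a KL-divergence proximity term, each step has the closed form pi_new(a) ∝ pi(a) · exp(eta · r(a)), which is the multiplicative-weights form usually associated with NPG for tabular softmax policies. The names `mirror_ascent_step`, `run`, `rewards`, and `eta` are hypothetical and are not taken from the linked paper.

```python
import math


def mirror_ascent_step(pi, rewards, eta):
    """One KL-regularized mirror ascent step on the probability simplex.

    Maximizes <p, rewards> - (1/eta) * KL(p || pi) over distributions p.
    The maximizer is p(a) proportional to pi(a) * exp(eta * rewards[a]).
    Assumes every entry of pi is strictly positive.
    """
    logits = [math.log(p) + eta * r for p, r in zip(pi, rewards)]
    m = max(logits)  # subtract the max before exponentiating, for stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def run(rewards, eta=1.0, steps=50):
    """Iterate the update from the uniform policy (illustrative setup only)."""
    pi = [1.0 / len(rewards)] * len(rewards)
    for _ in range(steps):
        pi = mirror_ascent_step(pi, rewards, eta)
    return pi


if __name__ == "__main__":
    # Probability mass should concentrate on the highest-reward action.
    print(run([1.0, 0.5, 0.2]))
```

The update is computed in log space so that repeated multiplication by `exp(eta * r)` does not overflow. In a full MDP, the rewards would be replaced by action-value or advantage estimates at each state, which this sketch does not cover.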