Robuta

https://openreview.net/forum?id=EvSD9nZ6WF&referrer=%5Bthe%20profile%20of%20Kris%20De%20Asis%5D(%2Fprofile%3Fid%3D~Kris_De_Asis1)
We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards...
temporal difference, reinforcement learning, fixed, horizon, methods
https://jmlr.org/beta/papers/v23/21-0947.html
temporal difference learning, policy evaluation, continuous time