ktae - Robuta Search

https://openreview.net/forum?id=yqQVRNdmKJ&referrer=%5Bthe%20profile%20of%20Jiajun%20Zhang%5D(%2Fprofile%3Fid%3D~Jiajun_Zhang1)

KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning

Recent advances have demonstrated that integrating reinforcement learning with rule-based rewards can significantly enhance the reasoning capabilities of large...
