Robuta

https://openreview.net/forum?id=yqQVRNdmKJ&referrer=%5Bthe%20profile%20of%20Jiajun%20Zhang%5D(%2Fprofile%3Fid%3D~Jiajun_Zhang1)
Recent advances have demonstrated that integrating reinforcement learning with rule-based rewards can significantly enhance the reasoning capabilities of large...
KTAE model-free algorithm key