Robuta

https://arxiv.org/abs/2506.22777
Abstract page for arXiv paper 2506.22777: Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning
reward hacking, teaching, models, verbalize, chain