Robuta

https://huggingface.co/papers/2406.11839
preference optimization, large language, paper, mdpo, conditional