https://developers.openai.com/api/docs/guides/direct-preference-optimization
Direct preference optimization | OpenAI API
https://iclr.cc/virtual/2026/poster/10010533
ICLR Poster Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs
https://dblp.org/rec/conf/icml/ZhuCW0ZJ25.html
dblp: TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization.
May 4, 2026 - Bibliographic details on TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization.
https://www.together.ai/blog/direct-preference-optimization
Direct Preference Optimization: A Technical Deep Dive
Together AI now supports DPO fine-tuning. Learn how Direct Preference Optimization aligns language models with human preferences — with code examples and...
https://arxiv.org/abs/2305.18290
[2305.18290] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Abstract page for arXiv paper 2305.18290: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
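The objective introduced in this paper can be sketched numerically for a single preference pair. Below is a minimal illustration (the function name `dpo_loss` and the default `beta=0.1` are illustrative choices, not from the paper's code release); the inputs are summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities
    of the chosen/rejected responses under the policy (pi_*) and the
    frozen reference model (ref_*); beta scales the implicit reward."""
    # Implicit reward margin: difference of the policy-vs-reference
    # log-ratios for the chosen and rejected responses.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy agrees with the reference (zero margin) the loss is exactly log 2; as the policy assigns relatively more probability to the chosen response, the margin grows and the loss falls toward zero.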
https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0001294
Participatory-informed preference optimization (PiPrO): A reinforcement learning simulation study |...
Author summary Artificial intelligence tools are increasingly adopted in medicine and public health, but they are often trained to reflect only one viewpoint....
https://arxiv.org/abs/2405.00675
[2405.00675] Self-Play Preference Optimization for Language Model Alignment
Abstract page for arXiv paper 2405.00675: Self-Play Preference Optimization for Language Model Alignment
https://proceedings.mlr.press/v267/zhu25c.html
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference OptimizationMingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, H...
https://huggingface.co/papers/2412.03187
Paper page - Weighted-Reward Preference Optimization for Implicit Model Fusion
Join the discussion on this paper page
https://slit-ai.github.io/FuseChat-3.0/
FuseChat-3.0: Preference Optimization for Implicit Model Fusion