https://openreview.net/forum?id=qvR6OttCkv
Automatic Radiology Report Generation (RRG) is an important topic for alleviating the substantial workload of radiologists. Existing RRG approaches rely on...
report generation, preference optimization, radiology, multi
https://openreview.net/forum?id=V4oTkK7cQz
When fine-tuning pre-trained Large Language Models (LLMs) to align with human values and intentions, maximizing the estimated reward can lead to superior...
risk aware, direct preference optimization, nested, measure
https://openreview.net/forum?id=OEMhW2YnKz
Gasoline blending scheduling is challenging, involving multiple conflicting objectives and a large decision space with many mixed integers. Due to these...
multiobjective optimization, gasoline blending, preference, prediction-based
https://openreview.net/forum?id=O2jukIZR50
Large Video Models (LVMs) built upon Large Language Models (LLMs) have shown promise in video understanding but often suffer from misalignment with human...
VistaDPO, hierarchical, spatial
https://openreview.net/forum?id=w0lhe9prqH
Recent advancements in human preference optimization, originally developed for Large Language Models (LLMs), have shown significant potential in improving...
preference optimization, diffusion models, dual caption
https://openreview.net/forum?id=vo9mmVYmGc
Recent advances in multimodal large language models (LLMs) have emerged through serial inference-time scaling, which involves generating longer reasoning...
self verification, multimodal LLMs, calibrated, advantage
https://openreview.net/forum?id=jkUp3lybXf
Preference optimization techniques, such as Direct Preference Optimization (DPO), are frequently employed to enhance the reasoning capabilities of large...
preference optimization, reasoning, pseudo feedback
https://www.autodesk.com/products/fusion-360/blog/quick-tip/
Have you ever wondered what settings we use to make the content you see here and other places? Watch this to see how we optimize Fusion 360's appearance.
quick tip, preference optimization, display, Fusion 360
https://openreview.net/forum?id=Q8Ee3yrwC6
Reinforcement Learning from Human Feedback (RLHF) has been highly successful in aligning large language models with human preferences. While prevalent methods...
preference optimization, two-player, multi-step
https://openreview.net/forum?id=zKoIRoDZM5
Antibody design, a crucial task with significant implications across various disciplines such as therapeutics and biology, presents considerable challenges due...
antigen-specific antibody design, direct energy-based preference optimization
https://arxiv.org/abs/2305.18290
Abstract page for arXiv paper 2305.18290: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
direct preference optimization, language model
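The DPO paper above derives a closed-form loss that trains the policy directly on preference pairs, with no separate reward model. A minimal sketch of that per-pair loss in plain Python (function and argument names are illustrative, not taken from the paper's codebase):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid of the implicit reward margin.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    # Implicit rewards are beta-scaled log-ratios against the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), computed stably as log1p(exp(-margin)).
    return math.log1p(math.exp(-margin))
```

When the policy matches the reference, the margin is zero and the loss is log 2; as the policy assigns relatively more probability to the chosen response, the loss falls toward zero. `beta` controls how far the implicit reward lets the policy drift from the reference.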