Robuta

https://deepai.org/publication/distributed-online-optimization-in-dynamic-environments-using-mirror-descent
09/09/16 - This work addresses decentralized online optimization in non-stationary environments. A network of agents aim to track the minimiz...
online optimization, mirror descent, distributed, dynamic, environments
https://openreview.net/forum?id=kZstGANG8D
Reinforcement learning from human feedback (RLHF) has demonstrated remarkable effectiveness in aligning large language models (LLMs) with human preferences....
online mirror descent, improving, llm, general, preference