Robuta

https://deepai.org/publication/distributed-online-optimization-in-dynamic-environments-using-mirror-descent
09/09/16 - This work addresses decentralized online optimization in non-stationary environments. A network of agents aim to track the minimiz...
online optimization, mirror descent, distributed, dynamic, environments
https://openreview.net/forum?id=kZstGANG8D
Reinforcement learning from human feedback (RLHF) has demonstrated remarkable effectiveness in aligning large language models (LLMs) with human preferences....
online mirror descent, improving, llm, general, preference
https://openreview.net/forum?id=twYT79Lrui
Attention mechanisms have revolutionized numerous domains of artificial intelligence, including natural language processing and computer vision, by enabling...
mirror descent, optimizing, attention, generalized, max