Robuta

https://research.google/pubs/regret-bounds-for-adversarial-contextual-bandits-with-general-function-approximation-and-delayed-feedback/
contextual bandits, function approximation, regret, bounds, adversarial
https://arxiv.org/abs/2210.10631
Abstract page for arXiv paper 2210.10631: Simulated Contextual Bandits for Personalization Tasks from Recommendation Datasets
contextual bandits, simulated, personalization, tasks, recommendation
https://arxiv.org/abs/2502.07193v1
Abstract page for arXiv paper 2502.07193v1: Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits
provably, efficient, rlhf, pipeline, unified
https://arxiv.org/abs/1712.00702
Abstract page for arXiv paper 1712.00702: Efficient Beam Alignment in Millimeter Wave Systems Using Contextual Bandits
millimeter wave, efficient, beam, alignment, systems
https://openreview.net/forum?id=LXftdR11io&referrer=%5Bthe%20profile%20of%20Yuta%20Saito%5D(%2Fprofile%3Fid%3D~Yuta_Saito1)
We study off-policy learning (OPL) of contextual bandit policies in large discrete action spaces where existing methods -- most of which rely crucially on...
contextual bandits, potec, policy, large, action
https://arxiv.org/abs/2111.12306
Abstract page for arXiv paper 2111.12306: Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
efficient, optimal, algorithms, contextual, dueling