https://arxiv.org/abs/2502.07193v1
Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits
https://openreview.net/forum?id=LXftdR11io
We study off-policy learning (OPL) of contextual bandit policies in large discrete action spaces, where existing methods -- most of which rely crucially on...