Robuta

https://richardli.xyz/tag/policy-optimization/
Policy Optimization | Yingru Li
Researcher in Reinforcement Learning and Decision Making
policy optimization

https://proceedings.neurips.cc/paper_files/paper/2023/hash/2c53bc01e30711a08f6ac86919193022-Abstract-Conference.html
Policy Optimization for Continuous Reinforcement Learning
policy optimization, continuous, reinforcement, learning

https://openreview.net/forum?id=BUMiizPcby6
Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for...
Policy Optimization (PO) algorithms have been proven particularly suited to handle the high-dimensionality of real-world continuous control tasks. In this...
policy optimization, optimal transport

https://openreview.net/forum?id=fB4V-2QvCEm
Population-size-Aware Policy Optimization for Mean-Field Games | OpenReview
In this work, we attempt to bridge the two fields of finite-agent and infinite-agent games, by studying how the optimal policies of agents evolve with the...
population size, policy optimization, field games, aware, mean

https://tldr.takara.ai/p/2509.18849
MAPO: Mixed Advantage Policy Optimization | Takara TLDR
Recent advances in reinforcement learning for foundation models, such as Group Relative Policy Optimization (GRPO), have significantly improved the performan...
policy optimization, mapo, mixed, advantage, takara

https://www.parasdahal.com/notes/grpo-group-relative-policy-optimization/
Group Relative Policy Optimization (GRPO)
policy optimization, group, relative, grpo

https://openreview.net/forum?id=xCRr9DrolJ
Score Regularized Policy Optimization through Diffusion Behavior | OpenReview
Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous...
policy optimization, score, diffusion, behavior, openreview

https://proceedings.neurips.cc/paper_files/paper/2022/hash/7d3298e48220b289318b533a848ea069-Abstract-Conference.html
Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for...
policy optimization, optimal transport

https://openreview.net/forum?id=cVyELMpMRS
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF | OpenReview
Large Language Models (LLMs) have achieved remarkable success at tasks like summarization that involve a single turn of interaction. However, they can still...
policy optimization

https://cemfustools.com/tag/policy-optimization/
Policy Optimization Archives | Cemfustools
policy optimization, archives

https://kr.mathworks.com/help/reinforcement-learning/ref/rl.agent.rltrpoagent.html
rlTRPOAgent - Trust region policy optimization (TRPO) reinforcement learning agent - MATLAB
Trust region policy optimization (TRPO) is an on-policy, policy gradient reinforcement learning method for environments with a discrete or continuous action...
policy optimization, reinforcement learning, trust, region, agent

https://arxiv.org/abs/2103.09756v1
[2103.09756v1] Near Optimal Policy Optimization via REPS
Abstract page for arXiv paper 2103.09756v1: Near Optimal Policy Optimization via REPS
policy optimization, near, optimal, via, reps

https://www.mathworks.com/help/reinforcement-learning/ref/rl.agent.rlmbpoagent.html
rlMBPOAgent - Model-based policy optimization (MBPO) reinforcement learning agent - MATLAB
A model-based policy optimization (MBPO) agent is a model-based, off-policy, reinforcement learning method for environment with a discrete or continuous action...
policy optimization, reinforcement learning, model, based, agent

https://proceedings.nips.cc/paper_files/paper/2025/hash/0939f13ffce3ff487509d902ddba4571-Abstract-Conference.html
DPAIL: Training Diffusion Policy for Adversarial Imitation Learning without Policy Optimization
imitation learning, training, diffusion, policy, adversarial