https://richardli.xyz/tag/policy-optimization/
Policy Optimization | Yingru Li
Researcher in Reinforcement Learning and Decision Making
policy optimization
https://proceedings.neurips.cc/paper_files/paper/2023/hash/2c53bc01e30711a08f6ac86919193022-Abstract-Conference.html
Policy Optimization for Continuous Reinforcement Learning
policy optimization, continuous, reinforcement, learning
https://openreview.net/forum?id=BUMiizPcby6
Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for...
Policy Optimization (PO) algorithms have been proven particularly suited to handle the high-dimensionality of real-world continuous control tasks. In this...
policy optimization, optimal transport
https://openreview.net/forum?id=fB4V-2QvCEm
Population-size-Aware Policy Optimization for Mean-Field Games | OpenReview
In this work, we attempt to bridge the two fields of finite-agent and infinite-agent games, by studying how the optimal policies of agents evolve with the...
population size, policy optimization, field games, aware, mean
https://tldr.takara.ai/p/2509.18849
MAPO: Mixed Advantage Policy Optimization | Takara TLDR
Recent advances in reinforcement learning for foundation models, such as Group Relative Policy Optimization (GRPO), have significantly improved the performan...
policy optimization, mapo, mixed, advantage, takara
https://www.parasdahal.com/notes/grpo-group-relative-policy-optimization/
Group Relative Policy Optimization (GRPO)
policy optimization, group, relative, grpo
https://openreview.net/forum?id=xCRr9DrolJ
Score Regularized Policy Optimization through Diffusion Behavior | OpenReview
Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous...
policy optimization, score, diffusion, behavior, openreview
https://proceedings.neurips.cc/paper_files/paper/2022/hash/7d3298e48220b289318b533a848ea069-Abstract-Conference.html
Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for...
policy optimization, optimal transport
https://openreview.net/forum?id=cVyELMpMRS
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF | OpenReview
Large Language Models (LLMs) have achieved remarkable success at tasks like summarization that involve a single turn of interaction. However, they can still...
policy optimization
https://cemfustools.com/tag/policy-optimization/
Policy Optimization Archives | Cemfustools
policy optimization, archives
https://kr.mathworks.com/help/reinforcement-learning/ref/rl.agent.rltrpoagent.html
rlTRPOAgent - Trust region policy optimization (TRPO) reinforcement learning agent - MATLAB
Trust region policy optimization (TRPO) is an on-policy, policy gradient reinforcement learning method for environments with a discrete or continuous action...
policy optimization, reinforcement learning, trust, region, agent
https://arxiv.org/abs/2103.09756v1
[2103.09756v1] Near Optimal Policy Optimization via REPS
Abstract page for arXiv paper 2103.09756v1: Near Optimal Policy Optimization via REPS
policy optimization, near, optimal, via, reps
https://www.mathworks.com/help/reinforcement-learning/ref/rl.agent.rlmbpoagent.html
rlMBPOAgent - Model-based policy optimization (MBPO) reinforcement learning agent - MATLAB
A model-based policy optimization (MBPO) agent is a model-based, off-policy reinforcement learning method for environments with a discrete or continuous action...
policy optimization, reinforcement learning, model, based, agent
https://proceedings.nips.cc/paper_files/paper/2025/hash/0939f13ffce3ff487509d902ddba4571-Abstract-Conference.html
DPAIL: Training Diffusion Policy for Adversarial Imitation Learning without Policy Optimization
imitation learning, training, diffusion, policy, adversarial