https://rlhfbook.com/c/05-reward-models
RLHF Book
The Reinforcement Learning from Human Feedback Book
https://rlhfbook.com/course
Course | RLHF Book by Nathan Lambert
Course lectures and talks on RLHF and post-training.
https://huggingface.co/datasets/Anthropic/hh-rlhf
Anthropic/hh-rlhf · Datasets at Hugging Face
https://arxiv.org/abs/2310.06452
[2310.06452] Understanding the Effects of RLHF on LLM Generalisation and Diversity
Abstract page for arXiv paper 2310.06452: Understanding the Effects of RLHF on LLM Generalisation and Diversity
https://docs.ray.io/en/latest/cluster/kubernetes/examples/verl-post-training.html
Reinforcement Learning with Human Feedback (RLHF) for LLMs with verl on KubeRay — Ray 2.55.1
https://rlhfbook.com/c/07-reasoning
RLHF Book
The Reinforcement Learning from Human Feedback Book
https://rlhfbook.com/c/08-direct-alignment
RLHF Book
The Reinforcement Learning from Human Feedback Book
https://rlhfbook.com/c/06-policy-gradients
RLHF Book
The Reinforcement Learning from Human Feedback Book
https://ivyzhang.me/rlhf
RLHF
https://arxiv.org/abs/2308.01320
[2308.01320] DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All...
Abstract page for arXiv paper 2308.01320: DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales