https://arxiv.org/abs/2412.16339
[2412.16339] Deliberative Alignment: Reasoning Enables Safer Language Models