https://www.anthropic.com/research/constitutional-classifiers
Constitutional Classifiers: Defending against universal jailbreaks \ Anthropic
A paper from Anthropic describing a new way to guard LLMs against jailbreaking
constitutional classifiers, defending, universal, jailbreaks, anthropic