https://arxiv.org/abs/2211.03540
[2211.03540] Measuring Progress on Scalable Oversight for Large Language Models
measuring progress, scalable oversight, large language models
https://arxiv.org/abs/2407.04622
[2407.04622] On scalable oversight with weak LLMs judging strong LLMs
scalable oversight, weak, llms, judging, strong
https://www.anthropic.com/research/automated-alignment-researchers
Automated Alignment Researchers: Using large language models to scale scalable oversight \ Anthropic
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
large language models, scalable oversight, automated, alignment, researchers
https://www.layerthelatestinalattice.com/topics/scalable-oversight
Scalable Oversight & Alignment Theory | Lattice
Theoretical foundations of alignment, scalable oversight mechanisms, debate protocols, and iterated amplification.
scalable oversight, alignment, theory, lattice
https://www.lesswrong.com/posts/SfhFh9Hfm6JYvzbby/the-scalable-formal-oversight-research-program
The Scalable Formal Oversight Research Program — LessWrong
Introduction Every serious person who thinks about AI safety observes the fundamental asymmetry between the ease of AI content generation and difficu…
research program, scalable, formal, oversight, lesswrong