Robuta

https://arxiv.org/abs/2211.03540
[2211.03540] Measuring Progress on Scalable Oversight for Large Language Models
Abstract page for arXiv paper 2211.03540: Measuring Progress on Scalable Oversight for Large Language Models

https://arxiv.org/abs/2407.04622
[2407.04622] On scalable oversight with weak LLMs judging strong LLMs
Abstract page for arXiv paper 2407.04622: On scalable oversight with weak LLMs judging strong LLMs

https://www.anthropic.com/research/automated-alignment-researchers
Automated Alignment Researchers: Using large language models to scale scalable oversight \ Anthropic
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

https://www.layerthelatestinalattice.com/topics/scalable-oversight
Scalable Oversight & Alignment Theory | Lattice
Theoretical foundations of alignment, scalable oversight mechanisms, debate protocols, and iterated amplification.

https://www.lesswrong.com/posts/SfhFh9Hfm6JYvzbby/the-scalable-formal-oversight-research-program
The Scalable Formal Oversight Research Program - LessWrong
Introduction: Every serious person who thinks about AI safety observes the fundamental asymmetry between the ease of AI content generation and difficu…