https://evaluations.metr.org/gpt-5-report/
Aug 6, 2025 - We evaluate whether GPT-5 poses significant catastrophic risks via AI self-improvement, rogue replication, or sabotage of AI labs. We conclude that this seems...
evaluation openai gpt autonomy
https://evaluations.metr.org/gpt-4o-report/
Aug 7, 2024 - We measured the performance of GPT-4o given a simple agent scaffolding on 77 tasks across 30 task families testing autonomous capabilities.
preliminary evaluation gpt autonomy