Robuta

https://allenai.org/asta/bench
AstaBench: Benchmarking AI Agents for Science
AstaBench offers rigorous benchmarks and leaderboards to evaluate AI agents on thousands of scientific tasks across multiple domains.
benchmarking ai, astabench, agents, science