Robuta

https://allenai.org/asta/bench
AstaBench: Benchmarking AI Agents for Science
AstaBench offers rigorous benchmarks and leaderboards to evaluate AI agents on thousands of scientific tasks across multiple domains.
Tags: ai agents, astabench, science

https://allenai.org/blog/astabench
AstaBench: Rigorous benchmarking of AI agents with a holistic scientific research suite | Ai2
Introducing AstaBench, a novel AI agents evaluation framework and scientific research benchmark suite.
Tags: ai agents, astabench, rigorous