https://allenai.org/asta/bench
AstaBench: Benchmarking AI Agents for Science
AstaBench offers rigorous benchmarks and leaderboards to evaluate AI agents on thousands of scientific tasks across multiple domains.
ai agents, astabench, science
https://allenai.org/blog/astabench
AstaBench: Rigorous benchmarking of AI agents with a holistic scientific research suite | Ai2
Introducing AstaBench, a novel AI agents evaluation framework and scientific research benchmark suite.
ai agents, astabench, rigorous