https://huggingface.co/datasets/harborframework/terminal-bench-2.0?eval_result=deepseek-ai/DeepSeek-V4-Pro
harborframework/terminal-bench-2.0 · Datasets at Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
https://andonlabs.com/evals/vending-bench-2
Vending-Bench 2 | Andon Labs
We're releasing Vending-Bench 2, a benchmark for measuring AI model performance on running a business over long time horizons. Models are tasked with running a...
https://www.tbench.ai/news/announcement-2-0
Introducing Terminal-Bench 2.0 and Harbor
A harder, better-verified version of Terminal-Bench and a new package for evaluating and optimizing agents.
https://www.roguefitness.com/replacement-parts/flat-utility-bench-2-0
Flat Utility Bench 2.0 | Rogue Fitness
https://huggingface.co/datasets/harborframework/terminal-bench-2.0?eval_result=Qwen/Qwen3.6-35B-A3B
harborframework/terminal-bench-2.0 · Datasets at Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.