Extended Results - ProgramBench
https://programbench.com/extended/

ProgramBench evaluates whether language models can rebuild programs from scratch.