Robuta

https://programbench.com/task/nachoparker__dutree.44e877d/
nachoparker/dutree - ProgramBench
ProgramBench evaluates whether language models can rebuild programs from scratch.
dutree, programbench