Robuta

https://programbench.com/task/alexpovel__srgn.89f943b/
alexpovel/srgn — ProgramBench
ProgramBench evaluates whether language models can rebuild programs from scratch.
srgn, programbench