Robuta

https://programbench.com/task/brocode__fblog.3b54330/
brocode/fblog - ProgramBench
ProgramBench evaluates whether language models can rebuild programs from scratch.