Robuta

https://programbench.com/task/crowdagger__crowbook.ea214d7/
crowdagger/crowbook - ProgramBench
ProgramBench evaluates whether language models can rebuild programs from scratch.
crowbook programbench