Robuta

elkowar/pipr — ProgramBench
https://programbench.com/task/elkowar__pipr.fae0b17/
ProgramBench evaluates whether language models can rebuild programs from scratch.