Robuta

https://programbench.com/task/dalance__amber.69a0f52/
dalance/amber - ProgramBench
ProgramBench evaluates whether language models can rebuild programs from scratch.