tstack/lnav | ProgramBench
https://programbench.com/task/tstack__lnav.ee34494/
ProgramBench evaluates whether language models can rebuild programs from scratch.