Robuta

https://baxbench.com/
BaxBench: Can LLMs Generate Secure and Correct Backends?
We introduce a novel benchmark to evaluate LLMs on secure and correct code generation, showing that even flagship LLMs are not ready for coding automation,...