Robuta

https://www.evals.anthropic.com/
Interactive data visualization for exploring LM-generated datasets used to evaluate LM behaviors.
language model, written evaluations, discovering, behaviors