Robuta

https://stratix.layerlens.ai/evaluations/67e636019e1624c690666efb
Anthropic: Claude 3.5 Sonnet on Knights and Knaves - Stratix
Trace-level evaluation of agent behavior across reasoning, tool use, retries, and state transitions. Built for pre-deployment verification.
knights and knaves, anthropic claude, sonnet