https://www.philschmid.de/agents-pass-at-k-pass-power-k
Pass@k vs Pass^k: Understanding Agent Reliability
Mar 24, 2025 - Production agents need to be reliable. Why pass^k is a better metric than pass@k.
https://arxiv.org/abs/2602.16666
[2602.16666] Towards a Science of AI Agent Reliability
Abstract page for arXiv paper 2602.16666: Towards a Science of AI Agent Reliability
https://www.agensi.io/skills/agent-reliability-audit
Agent Reliability Audit | Agensi
Turn raw agent traces and tool logs into professional production-readiness audits and remediation reports. Install this SKILL.md skill for Claude Code…
https://ona.com/stories/rethinking-the-todo-tool
Tackling Agent Reliability: Rethinking the Todo Tool at Ona · Ona
How we consolidated seven agent tools into one, adopted level-triggered state, and built runtime guardrails to keep AI agents on track.
https://www.montecarlodata.com/blog-why-you-cant-answer-how-reliable-is-this-agent/
How To Measure Agent Reliability: Key Metrics For Performance & Outcomes
May 1, 2026 - This seemingly basic question prevents wider roll-out and adoption. Here’s how teams are solving for it.