https://www.deeplearning.ai/the-batch/improve-agentic-performance-with-evals-and-error-analysis-part-2/
Oct 22, 2025 - In last week’s letter, I explained how effective agentic AI development needs a disciplined evals and error analysis process, and described an...
https://towardsdatascience.com/tds-newsletter-how-to-design-evals-metrics-and-kpis-that-work/
Dec 6, 2025 - On the challenges of producing reliable insights and avoiding common mistakes
https://www.infoq.com/news/2026/02/hugging-face-evals/
Feb 19, 2026 - Hugging Face has launched Community Evals, a feature that enables benchmark datasets on the Hub to host their own leaderboards and automatically collect...
https://www.playnsports.com/event/bethany-baseball-prospect-camp-with-pro-style-workout-and-player-evals-oct-9th/?path=baseball
Sep 1, 2022 - The Bethany College Prospect Camp is designed to provide players who want to continue their playing careers at the next level with an opportunity to receive...
https://boston.qcon.ai/presentation/boston2026/adaptive-recommenders-real-world-inference-evals-and-system-design
Modern personalization systems are shifting from hand-tuned heuristics to AI-native architectures, but building an adaptive recommendation engine in...
https://www.deeplearning.ai/the-batch/improve-agentic-performance-with-evals-and-error-analysis-part-1/
Oct 15, 2025 - Readers responded with both surprise and agreement last week when I wrote that the single biggest predictor of how rapidly a team makes progress...
https://arize.com/blog/how-handshake-deployed-and-scaled-15-llm-use-cases-in-under-six-months-with-evals-from-day-one/
Aug 21, 2025 - Handshake is the largest early-career network, specializing in connecting students and new grads with employers and career centers. It’s also an engineering...
https://docs.letta.com/guides/evals/overview/
Introduction to Letta's evaluation framework for testing and measuring agent performance.
https://harborframework.com/docs/evals
https://blog.promptlayer.com/how-noredink-used-promptlayer-evals-to-deliver-1m-trustworthy-student-grades/
NoRedInk has been on a mission to unlock every writer's potential since 2012. Today, their adaptive writing platform serves 60% of U.S. school districts...
https://labelbox.com/blog/how-to-improve-ai-app-generators-and-prompt-to-app-with-rubric-evaluations/
Overcome current AI app generator and prompt-to-app limitations with rubric evaluations and human-centric evaluations from Labelbox.
https://simonwillison.net/2025/Apr/24/exploring-promptfoo/
I used part three (here’s parts one and two) of Dave Guarino’s series on evaluating how well LLMs can answer questions about SNAP (aka food stamps) as an...
https://roocode.com/evals
Explore quantitative evals of LLM coding skills across tasks and providers.
https://humanloop.com/docs/v5/getting-started/overview
Learn how to use Humanloop for prompt engineering, evaluation and monitoring. Comprehensive guides and tutorials for LLMOps.
https://freeplay.ai/
There's a better way to build AI products. Create the data flywheel to continuously improve your AI products & agents with evaluations, experiments,...
https://jentic.com/blog/do-we-really-need-evals-for-agents
Do agents really need evals? Lessons from our live talk with ZenML’s CTO on practical evaluation, common mistakes, and production best practices.
https://pydantic.dev/
Oct 17, 2025 - Ship predictable AI faster: Model-agnostic AI Agents with validated clear schema outputs & industry leading observability solutions built on open standards