https://github.com/confident-ai/deepeval
GitHub - confident-ai/deepeval: The LLM Evaluation Framework
The LLM Evaluation Framework. Contribute to confident-ai/deepeval development by creating an account on GitHub.
https://www.evidentlyai.com/courses
Evidently AI - LLM Evaluation and AI Observability Courses
Learn about LLM evaluation and AI observability with our free hands-on courses.
https://www.evidentlyai.com/llm-evaluation-course-practice
Evidently AI - LLM evaluation for builders: applied course
Free video course with 10 hands-on code tutorials. From designing custom LLM judges to RAG evaluations and adversarial testing. Sign up to save your seat.
https://langwatch.ai/
LangWatch: AI Agent Testing and LLM Evaluation Platform
LangWatch is an AI agent testing, LLM evaluation, and LLM observability platform. Test agents with simulated users, prevent regressions, and debug issues.
https://deepeval.com/
DeepEval by Confident AI - The LLM Evaluation Framework
DeepEval is the open-source LLM evaluation framework for testing and benchmarking LLM applications — 50+ plug-and-play metrics for AI agents, RAG, chatbots,...
https://galileo.ai/blog/metrics-first-approach-to-llm-evaluation
A Metrics-First Approach to LLM Evaluation
Aug 12, 2025 - Learn about different types of LLM evaluation metrics
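To make the "metrics-first" idea concrete, here is a minimal sketch of one common reference-based evaluation metric: token-overlap F1, as used in QA benchmarks such as SQuAD. This is an illustration in plain Python, not code from any of the tools listed here; the function name and tokenization are my own assumptions.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer.

    Illustrative metric sketch: tokenization is a simple lowercase
    whitespace split, which real evaluation harnesses refine.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: how many tokens the two answers share.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Paris is the capital", "the capital is Paris"))  # 1.0
```

Metrics like this are cheap and deterministic, which is why evaluation platforms typically combine them with model-based (LLM-as-judge) scores rather than relying on either alone.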
https://deepchecks.com/
Deepchecks LLM Evaluation | Evaluate AI Progress with Know Your Agent | Deepchecks
Apr 20, 2026 - Deepchecks LLM Evaluation is an enterprise-grade AI testing, observability and monitoring platform that provides visibility, control, and trust across AI...
https://www.evidentlyai.com/llm-testing
LLM Evaluation and Testing Platform | Evidently AI
Catch hallucinations, safety risks, and quality issues in LLM products before they impact users. Automate, customize, and track AI testing at scale.
https://www.evidentlyai.com/
Evidently AI - AI Evaluation & LLM Observability Platform
Ensure your AI is production-ready. Test LLMs and monitor performance across AI applications, RAG systems, and multi-agent workflows. Built on open-source.
https://langfuse.com/docs/evaluation/overview
Evaluation of LLM Applications - Langfuse
With Langfuse you can capture all your LLM evaluations in one place. You can combine a variety of different evaluation metrics like model-based evaluations...
https://agenta.ai/
Agenta - Prompt Management, Evaluation, and Observability for LLM apps
Agenta is an open-source platform for building robust LLM applications. It provides tools for prompt engineering, evaluation, debugging, and monitoring of...
https://arize.com/
LLM Observability & Evaluation Platform
Unified LLM Observability and Agent Evaluation Platform for AI Applications—from development to production.
https://www.getplum.ai/
Plum AI - LLM Quality Evaluation & Improvement
Plum AI is a tool that evaluates and improves the quality of large-language model applications
https://app.ragmetrics.ai/
RagMetrics | LLM Application Evaluation
RagMetrics helps LLM builders prove ROI and optimize performance through tailored, scalable and automated evaluations
https://github.com/Giskard-AI/giskard-oss?locale=en-US
GitHub - Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents
https://towardsdatascience.com/beware-of-unreliable-data-in-model-evaluation-a-llm-prompt-selection-case-study-with-flan-t5-88cfd469d058/
Beware of Unreliable Data in Model Evaluation: A LLM Prompt Selection case study with Flan-T5 |...
Jan 8, 2025 - You may choose suboptimal prompts for your LLM (or make other suboptimal choices via model evaluation) unless you clean your test data.