https://eugeneyan.com/writing/llm-evaluators/
Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)
Use cases, techniques, alignment, finetuning, and critiques against LLM-evaluators.
https://aitech365.com/machine-learning/databricks-introduces-advanced-llm-judge-capabilities-to-elevate-accuracy-for-ai-agents/
Databricks Introduces Advanced LLM-Judge Capabilities
Nov 13, 2025 - Databricks announced an enhancement to its evaluation framework for AI agents, introducing three new capabilities within its MLflow environment.
https://towardsdatascience.com/llm-as-a-judge-what-it-is-why-it-works-and-how-to-use-it-to-evaluate-ai-models/
LLM-as-a-Judge: What It Is, Why It Works, and How to Use It to Evaluate AI Models | Towards Data...
Nov 24, 2025 - A step-by-step guide to building AI quality control using large language models
https://qiita.com/ssc-dninomiya/items/dbb960ac33a17cd1f4c9
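I Tried Having Claude Code Evaluate the Answer Accuracy of My Custom MCP × AI Agent [LLM-as-a-Judge] #Python - Qiita
Mar 26, 2026 - Background: Recently, I had been building my own MCP server to connect business data with an LLM. While implementing the tools, one question kept nagging at me: can the AI Agent return correct answers, via the MCP tools, to the questions users ask it? That led me to LLM-as-a-Judge (the output of an LLM, by another...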
https://www.evidentlyai.com/llm-guide/llm-as-a-judge
LLM-as-a-judge: a complete guide to using LLMs for evaluations
LLM-as-a-judge is a common technique to evaluate LLM-powered products. In this guide, we’ll cover how it works, how to build an LLM evaluator and craft good...
https://www.infoq.com/podcasts/llm-based-application-evaluation/?topicPageSponsorship=6cd7463a-8078-4002-8497-4a5e67bd0650
Elena Samuylova on Large Language Model (LLM)-Based Application Evaluation and LLM as a Judge -...
In this podcast, InfoQ spoke with Elena Samuylova from Evidently AI, on best practices in evaluating Large Language Model (LLM)-based applications. She also...
https://galileo.ai/mastering-llm-as-a-judge
Mastering LLM as a Judge eBook: Improve AI Evaluations at Scale
Learn how to use LLM-as-a-Judge to accelerate AI evaluations, cut costs, and improve accuracy across complex AI workflows.
https://docs.dbnl.com/workflow/metrics/llm-as-judge-metric-templates
LLM-as-Judge Metric Templates | Distributional
Pre-built templates to customize LLM-as-judge Metrics
https://sambanova.ai/blog/llm-judge-for-multilingual-document-question-answering
LLM-Judge for Multilingual Document Question Answering
To understand the needs of our customers, we evaluated frontier closed- and open-sourced VLMs on the Japanese version of this task with JDocQA, a dataset...