https://www.vals.ai/benchmarks/gpqa
GPQA Diamond
Private, domain-specific benchmarks in legal, tax, and finance.
gpqa diamond
https://artificialanalysis.ai/evaluations/gpqa-diamond
GPQA Diamond Benchmark Leaderboard | Artificial Analysis
Compare AI model performance on GPQA Diamond Benchmark Leaderboard. The most challenging 198 questions from GPQA, where PhD experts achieve 65% accuracy but...
gpqa diamond, artificial analysis, benchmark, leaderboard
https://intuitionlabs.ai/articles/gpqa-diamond-ai-benchmark
GPQA-Diamond Benchmark: Scores, Leaderboard & How AI Models Compare | IntuitionLabs
Mar 3, 2026 - GPQA-Diamond scores updated through 2026: Gemini 3.1 Pro (94.1%), GPT-5.2, Claude Opus 4.6, Aristotle-X1, and more. See which AI models beat PhD experts on 198...
gpqa diamond, how ai models compare, benchmark, scores
https://huggingface.co/moonshotai/Kimi-K2.6/blob/main/.eval_results/gpqa_diamond.yaml
.eval_results/gpqa_diamond.yaml · moonshotai/Kimi-K2.6 at main
eval results, gpqa diamond, kimi k2, yaml, moonshotai