Robuta

https://www.vals.ai/benchmarks/gpqa
GPQA Diamond
Private, domain-specific benchmarks in legal, tax, and finance.

https://artificialanalysis.ai/evaluations/gpqa-diamond
GPQA Diamond Benchmark Leaderboard | Artificial Analysis
Compare AI model performance on GPQA Diamond Benchmark Leaderboard. The most challenging 198 questions from GPQA, where PhD experts achieve 65% accuracy but...

https://intuitionlabs.ai/articles/gpqa-diamond-ai-benchmark
GPQA-Diamond Benchmark: Scores, Leaderboard & How AI Models Compare | IntuitionLabs
Mar 3, 2026 - GPQA-Diamond scores updated through 2026: Gemini 3.1 Pro (94.1%), GPT-5.2, Claude Opus 4.6, Aristotle-X1, and more. See which AI models beat PhD experts on 198...

https://huggingface.co/moonshotai/Kimi-K2.6/blob/main/.eval_results/gpqa_diamond.yaml
.eval_results/gpqa_diamond.yaml · moonshotai/Kimi-K2.6 at main
We’re on a journey to advance and democratize artificial intelligence through open source and open science.