benchmark leaderboard - Robuta Search

https://artificialanalysis.ai/evaluations/humanitys-last-exam
Humanity's Last Exam Benchmark Leaderboard | Artificial Analysis
Compare AI model performance on Humanity's Last Exam Benchmark Leaderboard. A frontier-level benchmark with 2,500 expert-vetted questions across mathematics,...

https://volumeshader.pro/leaderboard
GPU Benchmark Leaderboard | Volume Shader

https://artificialanalysis.ai/evaluations/apex-agents-aa
APEX-Agents-AA Benchmark Leaderboard | Artificial Analysis
Compare AI model performance on APEX-Agents-AA Benchmark Leaderboard. Artificial Analysis' implementation of the APEX-Agents benchmark, testing AI agents on...

https://artificialanalysis.ai/evaluations/gpqa-diamond
GPQA Diamond Benchmark Leaderboard | Artificial Analysis
Compare AI model performance on GPQA Diamond Benchmark Leaderboard. The most challenging 198 questions from GPQA, where PhD experts achieve 65% accuracy but...

https://humanbenchmark.now/leaderboard
Global Cognitive Performance Leaderboard — Human Benchmark
All-time top scores across every Human Benchmark test — reaction time, memory, typing, aim trainer, and more. Updated live. See where you rank globally.

https://labs.scale.com/leaderboard/swe_bench_pro_public
SWE-Bench Pro Leaderboard AI Coding Benchmark (Public Dataset) | Scale
Apr 25, 2026 - Compare the resolve rates of GPT-5.4, Muse Spark, Claude Opus 4.6, and Gemini 3.1 Pro on SWE-Bench Pro. A rigorous AI software engineering benchmark for...

https://www.idp-leaderboard.org/models/qwen3-5-9b
Qwen3.5-9B Benchmark Results — IDP Leaderboard | IDP Leaderboard
Qwen3.5-9B by Alibaba ranks #12 with 76.7% overall on the IDP Leaderboard. Detailed benchmark scores for OCR, table extraction, KIE, and VQA.

https://arena.ai/leaderboard
Arena Leaderboard | Compare & Benchmark the Best Frontier AI Models
See how leading AI models stack up across text, image, vision, and more. This page provides a high-level snapshot of each Arena. Explore dedicated tabs for...

https://arena.ai/leaderboard/
Arena Leaderboard | Compare & Benchmark the Best Frontier AI Models
See how leading AI models stack up across text, image, vision, and more. This page provides a high-level snapshot of each Arena. Explore dedicated tabs for...

https://linqalpha.com/api
LLM Investment Bias Leaderboard | Benchmark Financial LLM Models
Discover which large language models show the strongest investment bias. LinqAlpha's public leaderboard ranks GPT, Claude, Gemini, and others by bias index,...