Robuta

https://artificialanalysis.ai/evaluations/humanitys-last-exam
Humanity's Last Exam Benchmark Leaderboard | Artificial Analysis
Compare AI model performance on Humanity's Last Exam Benchmark Leaderboard. A frontier-level benchmark with 2,500 expert-vetted questions across mathematics,...

https://volumeshader.pro/leaderboard
GPU Benchmark Leaderboard | Volume Shader

https://artificialanalysis.ai/evaluations/apex-agents-aa
APEX-Agents-AA Benchmark Leaderboard | Artificial Analysis
Compare AI model performance on APEX-Agents-AA Benchmark Leaderboard. Artificial Analysis' implementation of the APEX-Agents benchmark, testing AI agents on...

https://artificialanalysis.ai/evaluations/gpqa-diamond
GPQA Diamond Benchmark Leaderboard | Artificial Analysis
Compare AI model performance on GPQA Diamond Benchmark Leaderboard. The most challenging 198 questions from GPQA, where PhD experts achieve 65% accuracy but...

https://humanbenchmark.now/leaderboard
Global Cognitive Performance Leaderboard — Human Benchmark
All-time top scores across every Human Benchmark test — reaction time, memory, typing, aim trainer, and more. Updated live. See where you rank globally.

https://labs.scale.com/leaderboard/swe_bench_pro_public
SWE-Bench Pro Leaderboard AI Coding Benchmark (Public Dataset) | Scale
Apr 25, 2026 - Compare the resolve rates of GPT-5.4, Muse Spark, Claude Opus 4.6, and Gemini 3.1 Pro on SWE-Bench Pro. A rigorous AI software engineering benchmark for...

https://www.idp-leaderboard.org/models/qwen3-5-9b
Qwen3.5-9B Benchmark Results — IDP Leaderboard | IDP Leaderboard
Qwen3.5-9B by Alibaba ranks #12 with 76.7% overall on the IDP Leaderboard. Detailed benchmark scores for OCR, table extraction, KIE, and VQA.

https://arena.ai/leaderboard
Arena Leaderboard | Compare & Benchmark the Best Frontier AI Models
See how leading AI models stack up across text, image, vision, and more. This page provides a high-level snapshot of each Arena. Explore dedicated tabs for...

https://linqalpha.com/api
LLM Investment Bias Leaderboard | Benchmark Financial LLM Models
Discover which large language models show the strongest investment bias. LinqAlpha’s public leaderboard ranks GPT, Claude, Gemini, and others by bias index,...