Robuta

https://friendli.ai/models/kmseong/llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5
kmseong/llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5 - Fast, Reliable, and Scalable Inference...
Run kmseong/llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5 with fast, reliable, and scalable inference on FriendliAI. Get low-latency performance with...

https://llm-stats.com/benchmarks/mmlu-chat
MMLU Chat Benchmark Leaderboard
May 12, 2026 - Chat-format variant of the Massive Multitask Language Understanding benchmark, evaluating language models across 57 tasks including elementary mathematics, US...

https://llm-stats.com/benchmarks/mmlu
MMLU Benchmark Leaderboard
May 16, 2026 - Massive Multitask Language Understanding benchmark testing knowledge across 57 diverse subjects including STEM, humanities, social sciences, and professional...

https://www.datalearner.com/en/benchmarks
LLM Benchmark Library | MMLU, GSM8K, HumanEval and More | DataLearnerAI
Explore mainstream LLM evaluation benchmarks including AIME 2025, SWE Bench Verified, MMLU, MMLU Pro, GSM8K, HumanEval, MBPP, HellaSwag, ARC, TruthfulQA,...

https://www.oxen.ai/tasksource/mmlu/stargazers
Stargazers - tasksource/mmlu | Datasets at Oxen.ai
Be the first to like tasksource/mmlu on Oxen.ai.

https://www.narev.ai/blog/gpt35-beats-gpt5
GPT-3.5 MMLU Score: why it beats GPT-5 at 3% of the Cost - Narev Docs
Despite its low MMLU score, GPT-3.5 outperformed top 2025 models like GPT-5 and Claude Opus on a real task, costing USD 823 vs USD 30,390 per 1M requests.