https://friendli.ai/models/kmseong/llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5
kmseong/llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5 - Fast, Reliable, and Scalable Inference...
Run kmseong/llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5 with fast, reliable, and scalable inference on FriendliAI. Get low-latency performance with...
https://llm-stats.com/benchmarks/mmlu-chat
MMLU Chat Benchmark Leaderboard
May 12, 2026 - Chat-format variant of the Massive Multitask Language Understanding benchmark, evaluating language models across 57 tasks including elementary mathematics, US...
https://llm-stats.com/benchmarks/mmlu
MMLU Benchmark Leaderboard
May 16, 2026 - Massive Multitask Language Understanding benchmark testing knowledge across 57 diverse subjects including STEM, humanities, social sciences, and professional...
https://www.datalearner.com/en/benchmarks
LLM Benchmark Library | MMLU, GSM8K, HumanEval and More | DataLearnerAI
Explore mainstream LLM evaluation benchmarks including AIME 2025, SWE Bench Verified, MMLU, MMLU Pro, GSM8K, HumanEval, MBPP, HellaSwag, ARC, TruthfulQA,...
https://www.oxen.ai/tasksource/mmlu/stargazers
Stargazers - tasksource/mmlu | Datasets at Oxen.ai
Be the first to like tasksource/mmlu on Oxen.ai.
https://www.narev.ai/blog/gpt35-beats-gpt5
GPT-3.5 MMLU Score: why it beats GPT-5 at 3% of the Cost - Narev Docs
Despite its low MMLU score, GPT-3.5 outperformed top 2025 models like GPT-5 and Claude Opus on a real task, costing USD 823 vs USD 30,390 per 1M requests.