mmlu pro - Robuta Search

https://arxiv.org/abs/2406.01574

[2406.01574] MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Abstract page for arXiv paper 2406.01574: MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/discussions/332

meta-llama/Llama-3.1-8B-Instruct · Add community evaluation results for GPQA, MMLU-PRO, GSM8K

This PR adds community-provided evaluation results for the following benchmarks:
