Hugging Face has released the second edition of its Open LLM Leaderboard, testing open large language models on a range of tasks.
The testing conducted by Hugging Face focused on four capabilities: knowledge retrieval, reasoning over long-form content, complex mathematical problem-solving, and instruction following, measured with six benchmarks: MMLU-Pro, GPQA, MuSR, MATH, IFEval, and BBH.
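To illustrate how scores from several benchmarks can be combined into a single leaderboard ranking, here is a minimal sketch of a normalized-average calculation. All raw accuracies below are hypothetical, and the per-benchmark random-guess baselines are assumptions for illustration; this is not Hugging Face's exact scoring code.

```python
def normalize(raw: float, baseline: float) -> float:
    """Rescale a raw accuracy (%) so the random-guess baseline maps to 0
    and a perfect score maps to 100."""
    return max(0.0, 100.0 * (raw - baseline) / (100.0 - baseline))

# Hypothetical raw accuracies (%) and assumed random-guess baselines.
results = {
    "MMLU-Pro": (65.0, 10.0),   # assumes 10 answer choices -> 10% baseline
    "GPQA":     (40.0, 25.0),   # assumes 4 answer choices -> 25% baseline
    "MuSR":     (50.0, 33.3),   # assumed ~3-way choice baseline
    "MATH":     (45.0, 0.0),    # exact-match task, no guessing baseline
    "IFEval":   (80.0, 0.0),    # instruction following, no guessing baseline
    "BBH":      (60.0, 25.0),   # assumed multiple-choice baseline
}

normalized = {name: normalize(raw, base) for name, (raw, base) in results.items()}
average = sum(normalized.values()) / len(normalized)
print(f"Average normalized score: {average:.1f}")
```

Normalizing before averaging keeps a benchmark with a high random-guess floor (such as a four-option multiple-choice test) from inflating a model's aggregate score.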
The overall results placed Alibaba's Qwen2-72B-Instruct in the top spot, followed by Meta's Meta-Llama-3-70B-Instruct in second place and the base Qwen/Qwen2-72B model in third. Qwen also secured the 10th and 11th positions with Qwen/Qwen1.5-110B and Qwen/Qwen1.5-110B-Chat.
Notably, OpenAI’s ChatGPT was not included in the testing: it is a closed model, which, according to Hugging Face, prevents the results from being independently reproduced.
Clem Delangue, CEO of Hugging Face, shared additional information about the testing, revealing that it utilized 300 NVIDIA H100 GPUs for processing. He mentioned that future tests are expected to be more challenging due to the complexity and size of the models, emphasizing that larger models with more parameters do not always equate to higher intelligence.
Source: Hugging Face
TLDR: Hugging Face released the second edition of its LLM leaderboard, with Qwen2-72B-Instruct leading the pack and the company noting that larger models do not always mean smarter ones.