Meta has released its Llama 3 artificial intelligence models in two sizes, 8B and 70B, each also offered in an instruction-tuned (Instruct) variant, while a 400B-parameter model is still in training.
In Meta's comparisons, the 8B model outperforms Gemma 7B and Mistral 7B Instruct on every benchmark, including HumanEval for code generation and GSM-8K for grade-school math.
As for the 70B model, Meta compares it with Gemini Pro 1.5, beating it on some benchmarks, and with Claude 3 Sonnet, which it outperforms across the board.
The final assessment of Llama 3 relies on a human-evaluation set of 1,800 prompts that the development team had never seen before. Judged on the responses, Llama 3 70B clearly outperforms Claude Sonnet, Mistral Medium, GPT-3.5, and Llama 2.
Internally, the architecture has been updated with a new tokenizer with a 128K-token vocabulary and training on sequences of 8,192 tokens. The training dataset totals 15T tokens, 7 times larger than Llama 2's, of which about 5% is non-English data, though performance in those languages is expected to be lower than in English. Training both models took 7.7 million GPU hours and emitted 2,290 tons of carbon dioxide equivalent.
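The reported training footprint can be sanity-checked with simple arithmetic using only the two figures quoted above (total GPU hours and total emissions):

```python
# Back-of-the-envelope check of the reported Llama 3 training footprint:
# 7.7 million GPU hours total (8B + 70B combined) and 2,290 tCO2eq emitted.
gpu_hours = 7_700_000
tons_co2eq = 2_290

# Convert tons to kilograms, then divide by total GPU hours.
kg_per_gpu_hour = tons_co2eq * 1000 / gpu_hours
print(f"{kg_per_gpu_hour:.3f} kg CO2eq per GPU hour")  # roughly 0.297
```

That works out to roughly 0.3 kg of CO2 equivalent per GPU hour, consistent with a data center running largely on low-carbon or offset power.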
The Llama 3 development team prepared for fine-tuning from the start: the torchtune library supports Llama 3, and the customizable Llama Guard 2 model is available to screen out dangerous prompts.
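As a rough sketch, a LoRA fine-tune of the 8B model with torchtune follows the pattern below; the recipe and config names mirror torchtune's published Llama 3 examples, but check the current torchtune documentation before running, and note that downloading the weights requires accepting Meta's license (the repository is gated):

```shell
# Fetch the Llama 3 8B weights (gated; needs an approved Hugging Face account)
tune download meta-llama/Meta-Llama-3-8B --output-dir /tmp/Meta-Llama-3-8B

# Run the single-GPU LoRA fine-tuning recipe with its default config
tune run lora_finetune_single_device --config llama3/8B_lora_single_device
```

LoRA trains small low-rank adapter matrices instead of all 8B parameters, which is what makes fine-tuning feasible on a single consumer GPU.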
Meta also plans to offer Llama 3 as a web service, but the service is not yet available in Thailand.
TLDR: Meta introduced the Llama 3 8B and 70B AI models, which excel on a range of benchmarks thanks to improvements in architecture and training; fine-tuning tooling is ready from launch, and a web service is planned.