Meta has released quantized versions of its Llama 3.2 model in the two smaller sizes, 1B and 3B, making them compact enough for mobile devices and faster to respond.
Meta previously offered Llama 3.2 in 1B and 3B sizes; the quantized versions are on average 56% smaller (the 1B model shrinks from roughly 2.3 GB to about 1 GB) while keeping quality and safety close to the original models. They also cut memory usage by 41% and run 2-4x faster.
Meta has partnered with the major mobile chip manufacturers MediaTek and Qualcomm to support these models on Arm CPUs, and is working on NPU compatibility for the future.
Meta offers two quantization options: Quantization-Aware Training with LoRA adaptors (QLoRA), which prioritizes output accuracy, and SpinQuant, which prioritizes model size and portability. Users can choose whichever method suits their needs. For more details on the techniques, refer to the source.
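The core idea behind quantization-aware training can be illustrated with a short sketch: during the forward pass, weights are rounded to a low-bit grid and mapped back to float ("fake quantization"), so the model learns weights that survive the precision loss. This is a minimal, simplified illustration of the general technique, not Meta's actual implementation; the function name and 4-bit symmetric scheme are assumptions for demonstration only.

```python
import numpy as np

def fake_quantize(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Illustrative fake quantization: round weights to a low-bit grid
    and map them back to float. During quantization-aware training the
    forward pass uses these values, so the model adapts to the reduced
    precision. (Simplified sketch, not Meta's actual scheme.)"""
    qmax = 2 ** (bits - 1) - 1                          # e.g. 7 for signed int4
    scale = np.max(np.abs(w)) / qmax                    # per-tensor symmetric scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)   # snap to integer grid
    return q * scale                                    # dequantize back to float

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
w_q = fake_quantize(w, bits=4)
# Rounding error per weight is bounded by half the quantization step.
print(np.max(np.abs(w - w_q)))
```

With 4-bit weights, each value is stored in one of only 16 levels, which is where the large size reduction comes from; the LoRA adaptors in Meta's QLoRA variant additionally fine-tune small low-rank matrices to recover accuracy lost to rounding.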
Currently, Llama offers three model groups to choose from:
Llama 3.2: 1B & 3B – small, lightweight text-only models, now also available in quantized versions.
Llama 3.2: 11B & 90B – medium-sized multimodal models supporting text and images.
Llama 3.1: 405B, 70B & 8B – the top-tier large language models, currently available only as version 3.1; a 3.2 release at these sizes is not yet out.
Source: Meta AI
TLDR: Meta introduces quantized Llama 3.2 models that are smaller and faster for mobile use, partners with chip manufacturers for hardware support, and offers multiple model families for users to choose from.