Introducing Meta’s Llama 3.2 1B & 3B Quantized Models: Shrinking in Size for Mobile Deployment

Meta has released quantized versions of its Llama 3.2 models in two small sizes, 1B and 3B, compressed to fit on mobile devices and to speed up responses.
The quantized versions are on average 56% smaller than the original 1B/3B models (the 1B model shrinks from roughly 2.3 GB to about 1 GB) while keeping quality and safety close to the originals. They also reduce memory usage by 41% and run 2-4 times faster.
Meta has partnered with major mobile chip manufacturers MediaTek and Qualcomm to support this model on Arm CPUs and is working on NPU compatibility for future use.
Meta’s size-reduction approach offers two options: Quantization-Aware Training with LoRA adapters (QLoRA), which prioritizes accuracy, and SpinQuant, a post-training method that prioritizes portability and smaller size. Users can choose whichever method suits their needs. For more details on both techniques, refer to the source.
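To give a feel for why quantization shrinks a model, here is a minimal sketch of generic symmetric int8 post-training quantization. This is an illustrative example only, not Meta’s QLoRA or SpinQuant implementation: it maps float32 weights to 8-bit integers plus a single scale factor, cutting storage to a quarter of float32 (half of the bfloat16 format large models typically ship in).

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: map floats to the int8 range [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes and scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # stand-in for a weight tensor
q, scale = quantize_int8(w)

# int8 storage is 1/4 the size of float32
print(w.nbytes, "->", q.nbytes)

# reconstruction error is bounded by half a quantization step
err = float(np.max(np.abs(dequantize_int8(q, scale) - w)))
```

Techniques like SpinQuant and QLoRA go much further (rotations to tame outliers, low-rank adapters trained with quantization in the loop), but the storage arithmetic above is the core reason the quantized models are so much smaller.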
Currently, Llama offers three model groups to choose from:
Llama 3.2: 1B & 3B – Small, lightweight text-only models, now also available in quantized versions.
Llama 3.2: 11B & 90B – Medium-sized multimodal models supporting text and images.
Llama 3.1: 405B, 70B & 8B – Flagship large language models; these sizes are available only in version 3.1, with no 3.2 release yet.

Source: Meta AI

TLDR: Meta introduces Llama 3.2 model in reduced sizes for better mobility and speed, partnering with chip manufacturers for support, and providing multiple model options for users to choose from.
