Google has released compressed versions of its Gemma 3 artificial intelligence (AI) models produced with Quantization-Aware Training (QAT). Quantized to the int4 (Q4_0) format, Gemma 3 27B becomes small enough to run on a graphics card with around 14.1GB of VRAM.
The QAT models start from the full BF16 checkpoint as a base and continue training while simulating the compressed (quantized) arithmetic in the forward pass, with the non-quantized model's output probabilities serving as the training target. After approximately 5,000 such training steps, the result is a compressed model that is only slightly lower in quality than the original.
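Google has not published the full training recipe, but the description above matches standard QAT combined with distillation. Below is a minimal PyTorch sketch of that idea; the names `fake_quant_int4`, `qat_distill_step`, `student`, and `teacher` are illustrative (the student and teacher are assumed to be callables returning logits), not Google's actual code.

```python
import torch
import torch.nn.functional as F

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    # Simulate symmetric int4 quantization: snap weights to 16 levels,
    # then dequantize, so the forward pass sees quantization error.
    scale = w.abs().max().clamp(min=1e-8) / 7  # int4 range is [-8, 7]
    w_q = torch.clamp(torch.round(w / scale), -8, 7) * scale
    # Straight-through estimator: gradients flow through as if the
    # quantization step were the identity function.
    return w + (w_q - w).detach()

def qat_distill_step(student, teacher, batch, optimizer):
    # Teacher: the frozen, non-quantized BF16 checkpoint.
    with torch.no_grad():
        teacher_logits = teacher(batch)
    # Student: same architecture, but its linear layers are assumed to
    # apply fake_quant_int4 to their weights during the forward pass.
    student_logits = student(batch)
    # Train the student's (quantized) outputs to match the teacher's
    # token probabilities.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the model learns to compensate for quantization error during training, the final Q4_0 weights lose far less quality than quantizing the finished model after the fact.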
Gemma 3 QAT is supported by popular local-inference runtimes such as Ollama, LM Studio, MLX, Gemma.cpp, and llama.cpp. It is available in the same four sizes as the full Gemma 3 family, with the smallest weighing in at roughly 0.5GB, small enough to run on a mobile phone; a usage sketch follows below.
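As a concrete example, here is a minimal sketch of running one of the Q4_0 checkpoints locally through the llama-cpp-python bindings for llama.cpp. The GGUF filename is illustrative; point it at whichever QAT file you actually downloaded.

```python
from llama_cpp import Llama

# Load a Gemma 3 QAT checkpoint in GGUF/Q4_0 format.
# The path below is a placeholder for your downloaded file.
llm = Llama(model_path="gemma-3-27b-it-q4_0.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain quantization in one sentence."}
    ],
)
print(out["choices"][0]["message"]["content"])
```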
TLDR: Google released Quantization-Aware Training (QAT) versions of the Gemma 3 AI models in the Q4_0 format, with support for popular local-inference frameworks and sizes small enough to run on mobile phones.