The Microsoft Research team has unveiled a new language model called BitNet, which is small enough to run on a CPU. While the small-model scene already has options like Llama at 1B and 3B parameters, BitNet comes in at 2B parameters trained on a massive 4T (trillion) token dataset, with its weights kept in an extremely low-bit format to shrink its size.
One of BitNet’s key features is its use of a 1-bit quantization scheme (technically about 1.58 bits, since each weight takes one of 3 states: -1, 0, 1) right from the model training phase, rather than applying quantization after training. The research aims to show that a 1-bit LLM, when trained properly, can yield results comparable to unquantized models.
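To make the idea concrete, here is a minimal sketch (in NumPy) of the "absmean" ternary weight quantization described in the BitNet papers; the function name and per-tensor scaling here are illustrative, not the project's actual implementation:

```python
import numpy as np

def ternary_quantize(weights: np.ndarray, eps: float = 1e-5):
    """Quantize a weight matrix to the three states {-1, 0, +1}.

    Sketch of the 'absmean' scheme from the BitNet b1.58 papers:
    scale by the mean absolute value, round, then clip to [-1, 1].
    Returns the ternary matrix and the scale needed to dequantize.
    """
    scale = np.mean(np.abs(weights)) + eps              # per-tensor scale (assumption)
    quantized = np.clip(np.round(weights / scale), -1, 1)
    return quantized.astype(np.int8), scale

# Illustrative usage on a random full-precision weight matrix
w = np.random.randn(4, 4).astype(np.float32)
w_q, s = ternary_quantize(w)
print(w_q)       # entries are only -1, 0, or 1
print(w_q * s)   # approximate reconstruction of the original weights
```

During training, the full-precision weights are kept for gradient updates while the forward pass uses the ternary values, which is what lets the model learn with low-bit weights from the start.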
Thanks to its compact weights, BitNet requires only about 0.4GB of RAM, much less than the 2GB for Llama 3.2 1B or 1.4GB for Gemma 3 1B, while still delivering competitive results and outperforming them on some test sets, with a lower latency of 29ms compared to Llama’s 48ms.
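A rough back-of-envelope calculation shows why the footprint lands near 0.4GB; this sketch assumes roughly 1.58 bits per weight for 2B parameters and ignores embeddings, activations, and runtime overhead:

```python
# Back-of-envelope memory estimate for ternary weights (assumption:
# ~1.58 bits per parameter, weights only, no runtime overhead).
params = 2e9                 # BitNet's 2B parameter count
bits_per_weight = 1.58       # log2(3) for the three states {-1, 0, +1}
weight_bytes = params * bits_per_weight / 8
print(f"~{weight_bytes / 1e9:.2f} GB")      # ≈ 0.40 GB, close to the reported footprint

# For comparison, the same parameter count stored in fp16 (16 bits per weight):
print(f"~{params * 16 / 8 / 1e9:.2f} GB")   # ≈ 4.00 GB
```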
Currently, BitNet remains a research project backed by published papers, exploring how far model size can be reduced so that models can run on a wider range of hardware.
TLDR: Microsoft Research introduces BitNet, a compact language model built with 1-bit quantization, demonstrating performance comparable to larger unquantized models while requiring less RAM and offering faster response times.