Mistral AI, a French artificial intelligence company, has recently unveiled a new model called Mixtral 8x7B. The model uses a mixture-of-experts (MoE) architecture: it contains 46.7 billion parameters in total, spread across eight expert sub-networks, but during inference a router activates only 2 of the 8 experts for each token and combines their outputs. This sparse approach lets the model run with compute roughly equivalent to a 12.9-billion-parameter dense model. A minimal sketch of this top-2 routing idea follows below.
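To make the routing idea concrete, here is a minimal PyTorch sketch of a top-2 mixture-of-experts layer: a router scores all eight experts for each token, keeps the two highest-scoring ones, and mixes their outputs using the normalized router weights. This is an illustrative sketch, not Mistral AI's actual implementation; the class name `Top2MoE` and the layer sizes are assumptions for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Illustrative top-2 mixture-of-experts layer (not Mixtral's real code)."""

    def __init__(self, hidden_dim=512, ffn_dim=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: a single linear layer that scores each expert per token.
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        # Experts: independent feed-forward sub-networks.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, ffn_dim),
                nn.SiLU(),
                nn.Linear(ffn_dim, hidden_dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, hidden_dim)
        scores = self.router(x)                           # (tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalize over the 2 chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so compute scales
        # with top_k active experts rather than all 8.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
print(Top2MoE()(tokens).shape)  # torch.Size([4, 512])
```

The key point the sketch illustrates is that total parameter count (all experts) and active parameter count (only the routed experts) diverge, which is how a 46.7B-parameter model can run at roughly 12.9B-parameter cost.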
By adopting the MoE approach, Mixtral achieves benchmark scores comparable to GPT-3.5 despite its smaller active parameter count and lower computational requirements; it also outperforms LLaMA 2, including the 70B variant, on multiple test sets. Although the model is open source, Mistral AI plans to offer paid API services and is currently accepting sign-ups for API access.
TLDR: Mistral AI has introduced Mixtral 8x7B, a model built on the mixture-of-experts architecture. By activating only a subset of its expert sub-networks per token, Mixtral achieves high benchmark scores at lower compute cost. The model is open source, and sign-ups for Mistral AI's paid API are now open.