MyShell, an artificial intelligence company, has introduced JetMoE-8B, a model it claims outperforms the much larger LLaMA-2 13B while costing significantly less to train and run. JetMoE-8B uses a Mixture-of-Experts (MoE) architecture: although the model has 8B parameters in total, only about 2.2B are active for any given token at inference time, which keeps its running cost on par with Gemma-2B (a minimal sketch of this kind of sparse routing appears after the cost check below).

Training took 96 NVIDIA H100 GPUs running for 2 weeks, at a cost of roughly 80,000 dollars (around 3 million baht), which MyShell says is far more economical than other models of similar capability. For comparison, training LLaMA-2 13B took 368,640 A100 GPU-hours, which could exceed 500,000 dollars at cloud prices.
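As a rough sanity check on those figures, here is the arithmetic in a few lines of Python. The hourly GPU rates are assumptions based on typical cloud pricing, not numbers from MyShell or Meta:

```python
# Back-of-the-envelope check of the training-cost figures in the article.
# The hourly rates below are assumed (typical cloud prices), not from the source.
H100_RATE = 2.50   # assumed USD per H100 GPU-hour
A100_RATE = 1.50   # assumed USD per A100 GPU-hour

jetmoe_gpu_hours = 96 * 14 * 24          # 96 H100s for 2 weeks = 32,256 GPU-hours
print(jetmoe_gpu_hours * H100_RATE)      # ~80,640 USD, close to the quoted 80,000

llama2_13b_gpu_hours = 368_640           # figure reported for LLaMA-2 13B
print(llama2_13b_gpu_hours * A100_RATE)  # ~552,960 USD, i.e. over 500,000
```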
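For readers unfamiliar with Mixture-of-Experts, the sketch below shows the routing idea in miniature: a router scores a set of expert feed-forward networks for each token and only the top-k experts actually run, so only a fraction of the total parameters is active per token. This is a generic illustration with made-up layer sizes, not JetMoE's actual architecture or code:

```python
# Minimal sparse Mixture-of-Experts layer (illustration only; sizes are hypothetical).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so the number of
        # "active" parameters per token is a fraction of the total.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

moe = SparseMoE()
tokens = torch.randn(10, 512)
print(moe(tokens).shape)  # torch.Size([10, 512])
```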
The model is available for use under the Apache 2.0 license and can be tested at Lepton.ai.
TLDR: MyShell introduces JetMoE-8B, which offers higher performance and a far lower training cost than comparable models. It uses a Mixture-of-Experts architecture and is available for testing at Lepton.ai.