MyShell unveils an LLM on par with LLaMA-2 at a fraction of the cost – only 3 million baht

MyShell, an artificial intelligence company, has introduced JetMoE-8B, a model that outperforms LLaMA-2 13B while costing significantly less to train and run. JetMoE uses a Mixture-of-Experts architecture: although the model has about 8 billion parameters in total, only around 2.2B are active for any given input, which keeps its inference cost on par with Gemma-2B. Training took 96 NVIDIA H100 GPUs for two weeks, roughly 80,000 dollars or around 3 million baht, which is expected to be far cheaper than other models of similar capability. By comparison, training LLaMA-2 13B consumed 368,640 A100 GPU-hours, which could exceed 500,000 dollars at cloud prices.
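The efficiency gain comes from sparse activation: a small router network picks a few expert sub-networks per token, so only their weights participate in the computation. The snippet below is a minimal PyTorch sketch of that idea, not JetMoE's actual implementation; the layer size, expert count, and top-k value are made up for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyMoELayer(nn.Module):
        # Minimal Mixture-of-Experts layer: a router scores the experts and
        # only the top-k experts run for each token, so the parameters that
        # are actually exercised per input are a fraction of the total.
        def __init__(self, dim=256, num_experts=8, top_k=2):
            super().__init__()
            self.router = nn.Linear(dim, num_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
                for _ in range(num_experts)
            )
            self.top_k = top_k

        def forward(self, x):                       # x: (num_tokens, dim)
            scores = self.router(x)                 # (num_tokens, num_experts)
            weights, chosen = torch.topk(scores, self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = chosen[:, slot] == e     # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
            return out

    layer = ToyMoELayer()
    tokens = torch.randn(4, 256)
    print(layer(tokens).shape)                      # torch.Size([4, 256])

In a full model, every Transformer block holds many such experts; because only two or so fire per token, the "active" parameter count (2.2B for JetMoE-8B) is what determines inference cost, not the total.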

The model is available for use under the Apache 2.0 license and can be tested at Lepton.ai.
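Because the license is Apache 2.0, the model can also be run locally. Assuming the released weights are published on Hugging Face under a repository id such as jetmoe/jetmoe-8b (an assumption here, check the official release) and that the installed transformers version supports the architecture, loading them would look roughly like this:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "jetmoe/jetmoe-8b"  # assumed repository id, verify against the release
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = "Mixture-of-Experts models are efficient because"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))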

TLDR: MyShell introduces JetMoE-8B, which offers higher efficiency and lower training cost than comparable models. It uses a Mixture-of-Experts architecture, is released under Apache 2.0, and can be tested at Lepton.ai.
