Cerebras, a company that develops AI acceleration chips, claims its chips outperform GPUs. The company showcased the 70B-parameter Llama 3.2 model running at 2,100 tokens per second, a large jump from the 450 tokens per second it reported in its previous announcement. Cerebras emphasizes that this gain was achieved on the same Wafer Scale Engine 3 (WSE-3) chip, with the improvement coming entirely from extensive software optimization.
Cerebras cites the 2,100 tokens per second figure as 16 times faster than GPUs and more than 68 times faster than cloud rental options.
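To make those multipliers concrete, the baseline throughputs they imply can be back-calculated. The sketch below uses only the figures quoted in the article; the derived baseline numbers are rough implications of Cerebras's claimed ratios, not measured vendor data.

```python
# Figures quoted in the article (tokens per second)
cerebras_tps = 2_100      # new Cerebras result on Llama 3.2 70B
prev_round_tps = 450      # Cerebras's earlier reported figure

# Improvement on the same WSE-3 hardware, attributed to software alone
software_speedup = cerebras_tps / prev_round_tps    # ~4.67x

# Baseline throughputs implied by the claimed 16x (GPU) and 68x (cloud) ratios
implied_gpu_tps = cerebras_tps / 16                 # ~131 tokens/s
implied_cloud_tps = cerebras_tps / 68               # ~31 tokens/s

print(f"software speedup: {software_speedup:.2f}x")
print(f"implied GPU baseline: {implied_gpu_tps:.0f} tokens/s")
print(f"implied cloud baseline: {implied_cloud_tps:.0f} tokens/s")
```

At roughly 131 tokens per second, the implied GPU baseline is still comfortably above human reading speed; the practical significance of a 2,100 tokens/s rate is for agentic or multi-step workloads where many generations are chained together.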
In the AI acceleration chip industry, competitors such as Groq and SambaNova build chips that rival Cerebras's. Both companies have published their own Llama inference results, inviting direct comparison with Cerebras.
Source: Cerebras, The Next Platform
TLDR: Cerebras demonstrates strong AI acceleration chip performance, reporting token throughput far above GPUs and cloud rental options. Competition in the industry remains fierce, with companies like Groq and SambaNova showcasing their own capabilities.