Cerebras Demonstrates Llama 3.2 70B Model Running at 2,100 Tokens per Second, Surpassing GPUs by 16x
Cerebras, a company that specializes in AI accelerator chips, claims that its chips outperform GPUs. It showcases the performance of running the Llama 3.2 model with a size...