Cerebras Unveils Llama 3.1 405B with Blistering 969 Tokens/s Speed, First Token Takes Just 240 ms
Cerebras, a developer of specialized chips for running large-scale AI models, has unveiled the Cerebras Inference service. The service offers the Llama 3.1 405B model at full 16-bit precision, delivering...