cerebras-inference-service

Cerebras Unveils Llama 3.1 405B at a Blistering 969 Tokens/s, with First Token in Just 240 ms

Posted 1 year ago

Cerebras, a developer of specialized chips for running large-scale AI models, has unveiled the Cerebras Inference service. The service offers the Llama 3.1 405B model at full 16-bit precision, delivering...

Cerebras Launches Llama 3.1 Cloud Service with Blistering Speeds Exceeding 1,800 Tokens per Second, Packing RAM into the Chip

Posted 1 year ago

Cerebras, a leading AI chip maker, has launched the Cerebras Inference service, which runs the Llama 3.1 model at high speed. The Llama 3.1 70B model can achieve a remarkable...