Groq, the chip maker specializing in AI inference, has teamed up with Meta to run the latest Llama 3 models at blazing speeds of up to 800 tokens per second. In February of this year, Groq showcased the Llama 2 70B model running at a swift 240 tokens per second, a significant lead over competing services: GPT-4 and Claude Opus typically struggle to reach 40 tokens per second.
While the 800 tokens-per-second figure is not yet official, it comes from external testing by Matt Shumer, CEO of HyperWrite AI. Other early testers report that Groq's public API does not yet match the speeds shown in its hardware demos, suggesting the remaining gap is a software problem rather than a hardware one.
Groq's own figures from the Llama 3 unveiling show throughput of 284 tokens per second, already a high bar for the industry.
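Readers who want to sanity-check these numbers can time a completion against Groq's OpenAI-compatible API and derive tokens per second from the usage counts the API returns. The sketch below assumes the official `groq` Python client and the `llama3-70b-8192` model ID; both are assumptions for illustration, not details from the article.

```python
# Rough throughput check against Groq's API.
# Assumes the `groq` Python client is installed and GROQ_API_KEY is set.
import time

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

MODEL = "llama3-70b-8192"  # assumed model ID; check Groq's current model list

start = time.perf_counter()
response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Explain how LLM inference works in ~300 words."}],
    max_tokens=512,
)
elapsed = time.perf_counter() - start

# Tokens per second based on the completion tokens the API reports.
completion_tokens = response.usage.completion_tokens
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"-> {completion_tokens / elapsed:.0f} tokens/s")
```

Note that wall-clock timing includes network and queueing latency, so this will read lower than Groq's quoted generation rate; it is a rough check, not a rigorous benchmark.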
Groq was founded by Jonathan Ross, a former member of the team behind Google's initial TPU design. Its chip, dubbed the Language Processing Unit (LPU), is claimed to deliver superior speed and cost-effectiveness for inference compared to GPUs from NVIDIA.
Source: Groq, VentureBeat
“We’ve been testing against their API a bit and the service is definitely not as fast as the hardware demos have shown. Probably more a software problem- still excited for groq to be more widely used.” – Dan Jakaitis
TLDR: Groq is serving Meta's Llama 3 models at up to 800 tokens per second, well ahead of competing inference services and a strong showcase for its LPU chip.