Cloudflare has moved its Workers AI service for running artificial intelligence models into general availability (GA), and will begin charging for the 10 models that have reached GA status; models still in beta remain free.
What distinguishes Workers AI from competing services is that Cloudflare has GPUs deployed in its data centers worldwide, including Bangkok. With the GA rollout, the company automatically reroutes workloads to other cities when a given city's GPUs are at capacity, giving developers more headroom within their quotas.
Cloudflare's pricing is based on a unit called Neurons, which measures actual GPU usage. The GA rollout includes a free daily quota of 10,000 Neurons, enough for roughly 100-200 LLM answers, 500 message translations, 500 seconds of speech-to-text conversion, or 10,000 message classifications. Users can track their Neuron usage on the calculation web page.
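To get a feel for the quota, the allowances above can be turned into rough per-task Neuron costs. The figures below are back-of-the-envelope values inferred from the article's free-tier numbers, not official Cloudflare rates, and the `neurons_needed` helper is purely illustrative:

```python
# Rough Neuron math derived from the quoted free tier.
# Per-task costs are inferred from the article's numbers,
# not official Cloudflare rates.

DAILY_FREE_NEURONS = 10_000

# Implied cost per task: free quota divided by the quoted daily allowance.
implied_cost = {
    "llm_answer":       DAILY_FREE_NEURONS / 150,     # 100-200 answers; midpoint 150
    "translation":      DAILY_FREE_NEURONS / 500,     # 500 messages
    "speech_to_text_s": DAILY_FREE_NEURONS / 500,     # 500 seconds of audio
    "classification":   DAILY_FREE_NEURONS / 10_000,  # 10,000 messages
}

def neurons_needed(workload: dict) -> float:
    """Estimate Neurons for a mixed workload, e.g. {'llm_answer': 30}."""
    return sum(implied_cost[task] * count for task, count in workload.items())

# Example: 50 LLM answers plus 1,000 classifications in one day.
daily = neurons_needed({"llm_answer": 50, "classification": 1_000})
print(f"~{daily:.0f} Neurons ({daily / DAILY_FREE_NEURONS:.0%} of the free quota)")
```

A mixed day of 50 LLM answers and 1,000 classifications would land well under half the free quota by this estimate.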
Another key feature is support for LoRA adapters to run fine-tuned versions of models, currently covering Gemma 2B/7B, LLaMA 2 7B, and Mistral 7B. LoRA files must not exceed 100MB, LoRA rank is capped at 8, and an account can hold up to 30 adapters. Direct fine-tuning within Cloudflare is planned for the future.
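The limits above lend themselves to a client-side pre-check before uploading an adapter. The sketch below encodes only the constraints quoted in the article (file size, rank, adapter count, supported base models); the function, its signature, and the model-name strings are hypothetical and not part of any Cloudflare API:

```python
# Illustrative pre-upload check against the LoRA limits quoted above
# (file size <= 100 MB, rank <= 8, at most 30 adapters per account).
# Function and model-name strings are assumptions, not a Cloudflare API.

MAX_FILE_BYTES = 100 * 1024 * 1024  # 100 MB
MAX_RANK = 8
MAX_ADAPTERS_PER_ACCOUNT = 30

# Hypothetical identifiers for the supported base models.
SUPPORTED_BASE_MODELS = {"gemma-2b", "gemma-7b", "llama-2-7b", "mistral-7b"}

def validate_lora(base_model: str, file_bytes: int, rank: int,
                  existing_adapters: int) -> list:
    """Return a list of constraint violations (empty means acceptable)."""
    errors = []
    if base_model not in SUPPORTED_BASE_MODELS:
        errors.append(f"unsupported base model: {base_model}")
    if file_bytes > MAX_FILE_BYTES:
        errors.append("LoRA file exceeds 100 MB")
    if rank > MAX_RANK:
        errors.append("LoRA rank exceeds 8")
    if existing_adapters >= MAX_ADAPTERS_PER_ACCOUNT:
        errors.append("account already holds 30 adapters")
    return errors

print(validate_lora("mistral-7b", 50 * 1024 * 1024, 8, 3))    # empty: within limits
print(validate_lora("llama-2-7b", 150 * 1024 * 1024, 16, 30)) # three violations
```

Running such a check locally avoids a round trip that would be rejected server-side anyway.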
Source – Cloudflare
TLDR: Cloudflare has moved its Workers AI service to GA status, introducing Neurons as its pricing unit and adding LoRA adapter support for running fine-tuned models.