Google Cloud has announced Cloud TPU v5e, the newest version of its custom AI accelerator chip and the successor to TPU v4, which has been in service since 2020. Details about TPU v5e are still scarce, but the "e" suffix marks it as a variant tuned for cost efficiency rather than the standard TPU's peak performance. According to Google's own figures, it delivers up to about 2x better performance per dollar than TPU v4 for training LLMs and up to 2.5x better performance per dollar for inference.
Beyond cost efficiency, TPU v5e rented on Google Cloud supports scaling a single AI job across many thousands of chips, going past the limit of TPU v4, which tops out at 3,072 chips per job (a single slice versus Multislice). Google has already used this technique internally to train its PaLM models, and it is now available for external customers to rent and use.
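To give a sense of how a job sees that many chips, below is a minimal JAX sketch that shards a batch across whatever TPU devices are attached to the job. It uses generic SPMD sharding rather than any Multislice-specific API, and the shapes are illustrative only.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# On a multi-slice TPU job, jax.devices() reports the chips from every
# slice combined, so the same code scales with the hardware it is given.
devices = jax.devices()
print(f"visible TPU chips: {len(devices)}")

# One-dimensional mesh over all chips; shard the batch along it.
mesh = Mesh(np.array(devices), axis_names=("data",))
sharding = NamedSharding(mesh, PartitionSpec("data"))

batch = jnp.ones((len(devices) * 8, 1024))  # toy batch, divisible by chip count
batch = jax.device_put(batch, sharding)     # rows distributed across chips

# jit-compiled functions run SPMD over the sharded input.
total = jax.jit(lambda x: jnp.sum(x * x))(batch)
print(total)
```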
TPU v5e has also been optimized for popular frameworks such as JAX, PyTorch, and TensorFlow, and it supports PyTorch/XLA 2.1.
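As a rough illustration, a minimal PyTorch/XLA script targeting a TPU VM looks like the sketch below; the model and tensor shapes are placeholders, and it assumes torch and torch_xla are installed on the VM.

```python
import torch
import torch_xla.core.xla_model as xm

# Ask PyTorch/XLA for the TPU as an XLA device.
device = xm.xla_device()

model = torch.nn.Linear(128, 64).to(device)
x = torch.randn(32, 128, device=device)
y = model(x)

# XLA traces ops lazily; mark_step() flushes the pending graph for execution.
xm.mark_step()
print(y.shape, y.device)
```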
For workloads that still call for conventional GPUs, Google also announced general availability (GA) of its new A3 VMs, built around NVIDIA's latest H100 GPUs and available for testing since March.
Each A3 VM pairs two 4th Gen Intel Xeon Scalable CPUs with eight NVIDIA H100 GPUs and 2 TB of memory. Among early customers, Midjourney reports that A3 runs its workloads up to 2x faster than VMs using the previous-generation NVIDIA A100 GPUs.
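As a quick sanity check from inside such a VM, a few lines of Python with PyTorch should list all eight GPUs; the exact device-name string printed is an assumption here.

```python
import torch

# On an 8-GPU A3 instance, all H100s should be visible to CUDA.
assert torch.cuda.is_available()
print("GPUs visible:", torch.cuda.device_count())  # expected: 8
for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(i))           # e.g. an "NVIDIA H100 ..." string
```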