Returning to the Era of Batch Computing: Google Cloud Introduces Queuing and Advance Reservations for AI Training Chips

Google Cloud has introduced the Dynamic Workload Scheduler for customers who cannot obtain GPUs or TPUs to train artificial intelligence models because the chips are in short supply. The service operates in two modes.

In Flex Start mode, customers tell the system how many chips they need and for how long. The system queues the request until that number of chips becomes available, then starts the job. This mode suits short experiments and fine-tuning jobs that run anywhere from a few minutes to a maximum of 7 days.
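The queuing behavior described above can be sketched as a small simulation. This is only an illustration, not Google's implementation or API: the `Request` class, the `schedule` function, the fixed-size chip pool, and the strict first-in, first-out ordering are all assumptions made for the example. The only detail taken from the article is the 7-day maximum duration.

```python
from collections import deque
from dataclasses import dataclass

MAX_DURATION_HOURS = 7 * 24  # the article's stated limit: at most 7 days


@dataclass
class Request:
    # Hypothetical request shape: a name, a chip count, and a duration.
    name: str
    chips: int
    hours: int


def schedule(requests, total_chips):
    """Return {name: start_hour}, assuming a strict first-in, first-out queue."""
    for r in requests:
        if r.hours > MAX_DURATION_HOURS:
            raise ValueError(f"{r.name}: duration exceeds the 7-day limit")
        if r.chips > total_chips:
            raise ValueError(f"{r.name}: asks for more chips than the pool has")

    queue = deque(requests)
    running = []  # (end_hour, chips) for jobs currently holding chips
    free = total_chips
    now = 0
    starts = {}

    while queue:
        head = queue[0]
        if head.chips <= free:
            # Enough chips are free: start the job at the head of the queue.
            queue.popleft()
            free -= head.chips
            running.append((now + head.hours, head.chips))
            starts[head.name] = now
        else:
            # Otherwise wait for the next running job to finish and reclaim its chips.
            running.sort()
            end, chips = running.pop(0)
            now = end
            free += chips
    return starts


jobs = [Request("A", 4, 10), Request("B", 8, 5), Request("C", 2, 1)]
print(schedule(jobs, total_chips=8))
```

With an 8-chip pool, job B has to wait until A releases its chips, and C waits behind B even though it would fit earlier. A real scheduler might fill such gaps with smaller jobs; the sketch leaves that out to keep the queuing idea visible.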

Calendar mode, by contrast, is an advance-booking system for clusters. Customers reserve capacity in blocks of 7 or 14 days and can book up to 8 weeks ahead.

This approach is reminiscent of batch processing on 1960s computers, which were very expensive and lacked the parallel processing capabilities available today. Users submitted their jobs in advance on punched cards or magnetic tape, and when a job's turn came, the system loaded it, ran it, and printed the results. Computational resources for training AI models are similarly scarce today, which has brought this old approach back.

TLDR: Google Cloud has launched the Dynamic Workload Scheduler to address limited access to GPUs and TPUs for AI training. The service offers two modes: Flex Start queues jobs that run for up to 7 days, and Calendar lets customers book clusters up to 8 weeks ahead. The system resembles the batch processing used on early computers and responds to the scarcity of computational resources for AI training.
