Home ยป Revolutionary Meta Showcase Unveils Latest Customizer for Training LLaMA 3 Chips H100 with 24,576 Units – Purchase Continuously at Year End 350,000 Units.

Revolutionary Meta Showcase Unveils Latest Customizer for Training LLaMA 3 Chips H100 with 24,576 Units – Purchase Continuously at Year End 350,000 Units.

Meta reports on data from new clusters that the company uses for artificial intelligence training. This is specifically done to design and train LLaMA 3, and serves as a testbed for new cluster architectures that will expand in the future. The plan is to continuously purchase additional chips until the end of the year, reaching around 350,000 units of the H100 chip, with combined processing power equivalent to 600,000 units of the H100 chip.

The clusters consist of two main sets that differ in the network system that supports cross-device memory access. The first set uses remote direct memory access (RDMA) over converged Ethernet (RoCE) with Arista 7800 network and Wedge400, while the other set utilizes NVIDIA Quantum2 InfiniBand. Both sets have a 400Gbps bandwidth connection and currently operate effectively.

The servers are equipped with Meta-designed Grand Teton machines for AI tasks, with Flash storage system mounted into Linux using Meta’s Tectonic storage system.

The challenge of building such large clusters lies in the communication system, as bottlenecks can quickly arise. The team must optimize both software and network infrastructure to achieve performance close to 100% of what was previously possible in smaller clusters.

TLDR: Meta reports on utilizing new clusters for AI training, aiming to expand the architecture and processing power with plans to acquire more H100 chips. The clusters feature different networking systems and custom-designed servers for AI tasks, with a focus on optimizing communication for maximum efficiency.

More Reading

Post navigation

Leave a Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

Google prepares to enhance Gemini Business and Gemini Enterprise for Google Workspace customers.

Violation Alert: OpenAI’s Use of Video Content on Platform to Train Sora, as Stated by CEO on YouTube

Game of Thrones Author and Fiction Writers Consortium Accuse OpenAI of Copyright Infringement in AI Training Lawsuit