Meta Unveils Its Latest Clusters for Training LLaMA 3, Built on 24,576 H100 Chips, and Plans to Keep Buying Until It Reaches 350,000 Units by Year End

Meta has shared details of the new clusters it uses for artificial intelligence training. They were built specifically to design and train LLaMA 3, and they also serve as a testbed for new cluster architectures that the company will scale up in the future. Meta plans to keep buying chips through the end of the year until it has around 350,000 H100 units, with total processing power across its hardware equivalent to about 600,000 H100 chips.

There are two clusters, and they differ in the network fabric that supports memory access across machines. The first uses remote direct memory access (RDMA) over converged Ethernet (RoCE), built on Arista 7800 and Wedge400 network hardware, while the second uses NVIDIA Quantum2 InfiniBand. Both provide 400 Gbps connections, and both are currently running well.

The servers are Grand Teton machines, which Meta designed for AI workloads. Storage is flash-based and is mounted into Linux through Meta's Tectonic storage system.

The main challenge in building clusters this large is communication, where bottlenecks can arise quickly. The team has to tune both the software and the network infrastructure so that performance stays close to 100% of what smaller clusters achieved.
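To give a rough sense of why communication becomes the bottleneck, the sketch below estimates how long a single 400 Gbps link needs to move a large payload at line rate. The 400 Gbps figure comes from the article; the payload size (70 billion parameters at 2 bytes each) and the function name `transfer_seconds` are illustrative assumptions, not figures or code from Meta, and the estimate ignores protocol overhead and latency.

```python
# Back-of-the-envelope: ideal time for one 400 Gbps link to move a payload.
LINK_GBPS = 400  # per-connection bandwidth quoted in the article


def transfer_seconds(payload_bytes: float, link_gbps: float = LINK_GBPS) -> float:
    """Ideal transfer time at line rate, ignoring overhead and latency."""
    bytes_per_second = link_gbps * 1e9 / 8  # gigabits per second -> bytes per second
    return payload_bytes / bytes_per_second


# Hypothetical payload (assumption): 70 billion parameters stored as 2-byte values.
payload = 70e9 * 2
print(f"{transfer_seconds(payload):.1f} s")  # 2.8 s
```

Even in this best case a single exchange takes seconds, so any inefficiency in the software or the network is repeated on every synchronization step across thousands of chips.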

TLDR: Meta has detailed the new clusters it uses for AI training, which it plans to scale up in architecture and processing power as it acquires more H100 chips. The two clusters use different networking systems and custom-designed AI servers, and much of the engineering effort goes into optimizing communication for maximum efficiency.
