NVIDIA's Blackwell B200 Posts First Benchmarks Running the Llama 2 Model, Boasting a 4x Improvement Over the H100

NVIDIA has unveiled the first benchmark results for the Blackwell B200 GPU, its new chip aimed at AI datacenters, tested with the Llama 2 70B model. The GPU delivers roughly 4 times the performance of the H100 (Hopper) chip.

The results from NVIDIA's testing, which paired a Xeon Silver 4410Y CPU with a single B200 chip carrying 180GB of onboard memory, are as follows:

Offline mode (all sample data sent to the server at once): 11,264 tokens/s, a 3.7x increase over the H100.
Server mode (sample data sent to the server sequentially, mimicking real-world usage): 10,756 tokens/s, a 4x increase over the H100.

NVIDIA attributes the improved performance to Blackwell's FP4 Transformer Engine, which converts the model to the FP4 data type before inference; because Blackwell GPUs have dedicated FP4 processing engines, this significantly speeds up model execution.
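For intuition, here is a minimal sketch (not NVIDIA's Transformer Engine implementation) of what FP4 quantization conceptually involves: weights are scaled and rounded to the small set of values representable in the FP4 (E2M1) format, trading precision for much cheaper arithmetic. The helper names and the simple per-tensor scaling scheme are illustrative assumptions.

```python
import numpy as np

# Values representable in FP4 (E2M1): sign x {0, 0.5, 1, 1.5, 2, 3, 4, 6}.
# Illustrative sketch only, not NVIDIA's actual quantization code.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_GRID[::-1], FP4_GRID])

def quantize_fp4(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a weight tensor to FP4 using a single per-tensor scale (assumption)."""
    scale = np.abs(weights).max() / 6.0  # map the largest weight onto FP4's max magnitude
    scaled = weights / scale
    # Round each scaled weight to the nearest FP4-representable value.
    idx = np.abs(scaled[..., None] - FP4_GRID).argmin(axis=-1)
    return FP4_GRID[idx], scale

def dequantize_fp4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate full-precision weights from the FP4 codes and the scale."""
    return q * scale

if __name__ == "__main__":
    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_fp4(w)
    print("max quantization error:", np.abs(w - dequantize_fp4(q, s)).max())
```

The precision lost in this rounding step is the trade-off; the payoff is that the matrix multiplications can then run on the GPU's narrower, faster FP4 units.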

NVIDIA also showcased MLPerf results for its current top-tier GPU, the H200, which uses high-speed HBM3e memory with 1.4x the memory bandwidth of the H100. Running the Llama 2 70B model in server mode on 8 H200 GPUs yielded 32,790 tokens/s, or approximately 4,098 tokens/s per GPU.
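As a quick sanity check on the arithmetic, the snippet below derives the per-GPU H200 figure from the 8-GPU result and sets it against the single-B200 server number quoted earlier. This is only a rough per-GPU comparison and ignores differences between the two test systems.

```python
# Quick arithmetic check of the per-GPU figures quoted above.
H200_SERVER_8GPU = 32_790   # tokens/s, Llama 2 70B server mode on 8x H200
B200_SERVER_1GPU = 10_756   # tokens/s, Llama 2 70B server mode on 1x B200

h200_per_gpu = H200_SERVER_8GPU / 8
print(f"H200 per GPU: {h200_per_gpu:,.2f} tokens/s")                       # 4,098.75 tokens/s
print(f"B200 vs H200, per GPU: {B200_SERVER_1GPU / h200_per_gpu:.1f}x")    # ~2.6x
```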

TLDR: NVIDIA's Blackwell B200 GPU shows impressive performance gains over the H100, thanks largely to Blackwell's FP4 Transformer Engine. The H200 GPU also benefits from higher memory bandwidth, improving its efficiency when running AI models.
