NVIDIA has unveiled Blackwell, its next-generation GPU architecture aimed at enterprise and data center use. The B200 chip packs 208 billion transistors and is built from two dies joined by a 10 TB/s interconnect, an approach similar to Apple's Ultra-series Apple Silicon. Externally, the chip uses fifth-generation NVLink to connect to other GPUs, with 1.8 TB/s of bandwidth per GPU.
Blackwell's compute units add two new data types, FP6 and FP4, aimed at model inference. With FP4, the chip reaches up to 20,000 TFLOPS. In FP8 it reaches 10,000 TFLOPS, compared with 4,000 TFLOPS for Hopper.
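To put those figures in proportion, here is a minimal Python sketch that computes the speedups implied by the quoted peak numbers and the memory needed to hold model weights at each precision. The throughput values come from the text above; the 70-billion-parameter model is a hypothetical example, and the function names are made up for illustration. The estimate covers weights only, not activations or other runtime memory.

```python
# Peak throughput figures quoted in the article, in TFLOPS.
PEAK_TFLOPS = {
    ("Hopper", "FP8"): 4_000,
    ("Blackwell", "FP8"): 10_000,
    ("Blackwell", "FP4"): 20_000,
}


def speedup(new, old):
    """Ratio between two quoted peak-throughput figures."""
    return PEAK_TFLOPS[new] / PEAK_TFLOPS[old]


def weight_gigabytes(num_params, bits_per_weight):
    """Storage for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    print("Blackwell FP8 vs Hopper FP8:",
          speedup(("Blackwell", "FP8"), ("Hopper", "FP8")))
    print("Blackwell FP4 vs Blackwell FP8:",
          speedup(("Blackwell", "FP4"), ("Blackwell", "FP8")))
    print("Blackwell FP4 vs Hopper FP8:",
          speedup(("Blackwell", "FP4"), ("Hopper", "FP8")))

    # Hypothetical 70-billion-parameter model, chosen only for illustration.
    params = 70e9
    for name, bits in (("FP8", 8), ("FP6", 6), ("FP4", 4)):
        print(name, weight_gigabytes(params, bits), "GB")
```

Going by the quoted peaks, Blackwell is 2.5 times faster than Hopper in FP8, and dropping to FP4 doubles that again while halving the space the weights occupy.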
Some of the new features of Blackwell include:
Decompression Engine: A dedicated hardware unit that decompresses data arriving from the CPU, with support for formats including LZ4, Snappy, and DEFLATE.
RAS Engine: A system that monitors the chip's health and reports problems as they occur, making it easier to troubleshoot during large model training and reducing downtime.
TEE-I/O: Encrypts data transmitted via NVLink without sacrificing operational efficiency.
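For a sense of the kind of work the Decompression Engine takes on, the sketch below does a DEFLATE round trip on the CPU using Python's standard `zlib` module. It only illustrates the format and does not touch Blackwell hardware or any NVIDIA API. The function name and the sample payload are invented for the example, and `zlib.compress` wraps the DEFLATE stream in a small zlib header and checksum.

```python
import zlib


def deflate_round_trip(payload, level=6):
    """Compress with DEFLATE (zlib container), then decompress on the CPU.

    Returns (original_size, compressed_size, restored_bytes).
    """
    compressed = zlib.compress(payload, level)
    restored = zlib.decompress(compressed)
    return len(payload), len(compressed), restored


if __name__ == "__main__":
    # Repetitive sample data, standing in for a column of database values.
    sample = b"region=eu-west;status=ok;" * 1000
    original_size, compressed_size, restored = deflate_round_trip(sample)
    print("original bytes:  ", original_size)
    print("compressed bytes:", compressed_size)
    print("round trip intact:", restored == sample)
```

LZ4 and Snappy, the other formats named above, are not in the Python standard library, so they are left out here.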
Blackwell is named after David Blackwell, a mathematician known for his work on probability theory, game theory, and dynamic programming. NVIDIA is shipping Blackwell both in new server models and as upgrade modules for existing clusters built on the H100 chip, and major cloud providers are already integrating it into their services.
TLDR: NVIDIA introduces the Blackwell architecture graphics chip with advanced features like the Decompression Engine and RAS Engine, named after mathematician David Blackwell, now available for server deployment and cluster upgrades.