Headlines like “NVIDIA stock soars,” “NVIDIA shares up 200%,” or “NVIDIA’s market value surpasses Apple’s” have been everywhere over the past one to two years, following the surge of the AI trend.
Many readers already know that what propelled NVIDIA to this level is its dominance of the GPU market, since GPUs are used extensively for AI training. Demand is so high that big IT companies and research firms looking for NVIDIA GPUs have to wait in line for months, or even years.
But the question remains: what makes NVIDIA’s graphics cards the primary (and perhaps the only) choice for companies that want to train AI, even though rivals like AMD also make powerful GPUs? Is it simply because NVIDIA’s GPUs are superior? This article digs deeper into the reasons.
Why are GPUs used for AI training?
To explain simply, one must understand the basics of GPU computing. GPUs are designed to support parallel computing, where multiple tasks can be processed simultaneously, thanks to a multitude of processing cores, numbering in the thousands nowadays.
A familiar example of a parallelizable task is rendering graphics on a computer. The GPU receives a single command from the CPU to draw a frame, but it must work through many pieces of data at once: the different 3D polygons in the scene, how light hits and reflects off each polygon, and ultimately the color of every pixel on the screen.
CPUs, by contrast, are designed to process tasks largely one after another. Even with features that improve parallelism, such as Intel’s Hyper-Threading or the growing core counts of newer CPUs, they still fall far behind GPUs when it comes to parallel processing.
Training AI models, whether classic machine learning or deep learning, is dominated by exactly this kind of parallel work, mostly large matrix and vector operations, which is why GPUs are used so extensively.
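To make that concrete, here is a minimal sketch, assuming PyTorch and a CUDA-capable GPU (tools chosen for illustration, not prescribed by the article), of the kind of matrix multiplication that dominates model training. Every output element can be computed independently, so thousands of GPU cores can work on it at the same time:

```python
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    """Multiply two n x n matrices on the given device and return elapsed seconds."""
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()   # make sure setup has finished before timing
    start = time.perf_counter()
    c = a @ b                      # one call, but millions of independent multiply-adds
    if device == "cuda":
        torch.cuda.synchronize()   # wait for the asynchronous GPU kernel to finish
    return time.perf_counter() - start

print("CPU:", time_matmul("cpu"))
if torch.cuda.is_available():
    print("GPU:", time_matmul("cuda"))
```

On typical hardware the GPU version finishes far faster, though the exact gap depends on the matrix sizes and the devices involved.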
CUDA: The turning point that made NVIDIA dominate AI
NVIDIA’s journey began in 1993, when Jensen Huang, its co-founder and CEO, saw an opportunity in the rapidly growing demands of graphics computation and co-founded the company to develop graphics processors, which gained popularity steadily over the years.
Initially, graphics was the only workload GPUs were suited for, but the advent of general-purpose computing on GPUs (GPGPU) changed the game. NVIDIA released its CUDA software in 2006, letting developers program the GPU directly and manage its workloads efficiently. The aim was not just to empower game developers but also to speed up any other kind of application that could exploit the GPU’s processing power.
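The article contains no code, but a tiny sketch helps show what GPGPU means in practice. The kernel below is plain CUDA C, with each GPU thread adding one pair of array elements; it is launched here through CuPy purely as a convenient host-side wrapper, which is an assumption of this example rather than anything the article mentions:

```python
import cupy as cp

# A plain CUDA C kernel: each GPU thread adds one pair of elements.
vector_add = cp.RawKernel(r'''
extern "C" __global__
void vector_add(const float* x, const float* y, float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;   // global thread index
    if (i < n) {
        out[i] = x[i] + y[i];
    }
}
''', 'vector_add')

n = 1 << 20
x = cp.random.rand(n, dtype=cp.float32)
y = cp.random.rand(n, dtype=cp.float32)
out = cp.empty_like(x)

threads = 256
blocks = (n + threads - 1) // threads
vector_add((blocks,), (threads,), (x, y, out, cp.int32(n)))   # grid, block, arguments

assert cp.allclose(out, x + y)
```

The same pattern, one thread per data element, scales from toy examples like this to graphics workloads and to the linear algebra at the heart of AI training.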
Because NVIDIA owned both the hardware and the software platform, the two were engineered to work together seamlessly. Even though open alternatives such as OpenCL existed, owning the whole platform made NVIDIA more competitive and gave it little reason to support OpenCL well, which drove CUDA’s popularity even further.
Another pivotal moment that solidified CUDA as the AI standard was the research paper “Large-scale Deep Unsupervised Learning using Graphics Processors” in 2009, which demonstrated the cost-effective use of NVIDIA GeForce GTX 280 for accelerating Deep Learning, setting the stage for high-performance AI training.
As the platform, both hardware and software, grew stronger with every new piece of AI research, more researchers chose NVIDIA for training their models. NVIDIA in turn kept extending CUDA to meet that demand, releasing libraries such as cuDNN for deep learning and cuBLAS for linear algebra.
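These libraries normally sit below the code researchers actually write. As a rough sketch, assuming a CUDA build of PyTorch and an NVIDIA GPU (an assumption of the example, not a requirement stated in the article), the convolution and matrix multiplication below are typically dispatched to cuDNN and cuBLAS under the hood:

```python
import torch
import torch.nn.functional as F

print(torch.backends.cudnn.is_available())   # True when PyTorch can see cuDNN
print(torch.backends.cudnn.version())        # version of the bundled cuDNN library

x = torch.rand(8, 3, 224, 224, device="cuda")   # a small batch of images
w = torch.rand(16, 3, 3, 3, device="cuda")      # convolution filters

y = F.conv2d(x, w, padding=1)               # typically executed by a cuDNN convolution kernel
z = torch.rand(1024, 1024, device="cuda")
out = z @ z                                 # typically executed by a cuBLAS matrix multiply
```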
Popular deep learning frameworks like TensorFlow and PyTorch support CUDA as a first-class backend, which solidified NVIDIA and CUDA’s stronghold on the AI chip market even further. NVIDIA also collaborated with researchers and organizations such as UC Berkeley and Meta to optimize AI models to run efficiently on CUDA.
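For instance, both frameworks make it a one-liner to ask whether a CUDA GPU is available; the calls below are standard public APIs of PyTorch and TensorFlow, shown here only as a sketch:

```python
import torch
import tensorflow as tf

# PyTorch: is a CUDA GPU and driver usable, and which device is it?
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "no CUDA device")

# TensorFlow: list the GPUs visible to a CUDA-enabled build.
print(tf.config.list_physical_devices("GPU"))
```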
Climbing the CUDA mountain
Across the industry, efforts are being made to challenge NVIDIA’s dominance. AMD introduced the ROCm platform as a CUDA alternative, Intel joined with Arm, Google, Samsung, and Qualcomm to push oneAPI, and OpenAI’s Triton project offers a language for writing GPU code that is easier to use than CUDA, all aiming to loosen NVIDIA’s grip.
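To give a feel for these alternatives, here is a minimal vector-add kernel written with Triton’s Python API, closely following the style of Triton’s public tutorials. The index arithmetic and memory masking are expressed in Python-like code instead of CUDA C; note that this sketch still assumes a CUDA-capable GPU and PyTorch for the host-side tensors:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which block of elements this instance handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard against the final partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

n = 1 << 20
x = torch.rand(n, device="cuda")
y = torch.rand(n, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(n, 1024),)                        # number of kernel instances to launch
add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```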
The challenge for these new frameworks and languages is a chicken-and-egg problem: the user base stays small because the tools and support are not yet as robust as CUDA’s and lack the strong ecosystem NVIDIA has built over the years, while the tools stay immature because the user base is small. With switching costs high and little investment flowing into tools that few people use, most developers, organizations, and researchers hesitate to move their AI training to a different software stack.
New technologies and models appear in the AI world every day, and the code researchers share almost always treats CUDA as the primary platform. Other stacks may run some of these projects, at least partially, but struggle with many others. With the field moving this fast, everyone feels pressure to stay on the best-supported technology just to keep up.
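That pressure shows up even in the smallest details of shared research code, where a line like the following is practically boilerplate. This is a sketch of a common PyTorch idiom, not something quoted from the article:

```python
import torch

# The near-universal default in shared research code: use CUDA if it is there.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Other backends usually require explicit, per-project changes, for example
# Apple GPUs ("mps") or Intel GPUs ("xpu"); AMD's ROCm builds of PyTorch reuse
# the "cuda" device name precisely so that lines like the one above keep working.
model_input = torch.rand(32, 128, device=device)
```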
Perhaps the best person to sum up the state of the industry is Raja Koduri, formerly chief architect at both Intel and Radeon (AMD’s graphics division), who has battled NVIDIA throughout his career. Reflecting on that experience, he shared how much friction there was within his own teams over switching from GeForce to other GPU cards, because the cost in engineers’ time and effort outweighs the price of the GPUs, whatever the model.
Raja points to one more reason for NVIDIA’s market dominance: its products span both gaming and the data center, with the architectures and software stacks kept in unison. Gaming GPUs are sold worldwide at relatively affordable prices, giving small developers an easy gateway to buy a card and start writing code on NVIDIA’s platform.
These are the stories behind NVIDIA’s soaring stock price and its rise past tech giants like Microsoft, Apple, and Amazon to become the world’s most valuable tech company.
TLDR: NVIDIA’s dominance in the AI market is fueled by their GPUs’ superior technology, the CUDA software platform, and a robust ecosystem that supports AI training efficiently. Despite challenges from competitors, NVIDIA’s stronghold remains unshaken due to its technological advancements and support from the research community.