Those who follow the AI field are likely familiar with Hugging Face, best known as a repository of large-scale models (hosting over one million of them). Hugging Face's previous revenue model was renting out servers for training, fine-tuning, and running these models on demand.
Now Hugging Face is expanding its model business by offering the software used to run those models as a product that customers can license and deploy on their own IT systems. The software is called Hugging Face Generative AI Services, or simply HUGS.
HUGS is microservice software that runs AI models on a variety of server hardware, spanning multiple GPU brands as well as specialized AI accelerators. It also optimizes how efficiently models run on that hardware (expertise Hugging Face has built up from serving a huge number of models itself), saving organizational customers the computational resources needed to run models.
The models that run on HUGS are open models such as Llama 3.1, Mixtral, Gemma 2, and Qwen 2.5, among others.
Hugging Face says HUGS suits organizations that need to run AI models on their own server systems (whether cloud or on-premises) but lack the in-house expertise to optimize inference performance. The software deploys on standard infrastructure such as Kubernetes and exposes an OpenAI-compatible API, which makes migrating existing applications to HUGS straightforward.
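Because the API is OpenAI-compatible, code already written against the OpenAI client should in principle only need its endpoint changed. Here is a minimal sketch in Python, assuming a hypothetical HUGS deployment reachable at localhost:8080 and serving Llama 3.1 (both the endpoint and the model identifier are illustrative placeholders, not confirmed defaults):

```python
# Minimal sketch: calling a HUGS deployment through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # hypothetical HUGS endpoint on your own infrastructure
    api_key="-",                          # placeholder; a self-hosted endpoint may not need a real key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example open model from the list above
    messages=[
        {"role": "user", "content": "Summarize what HUGS does in one sentence."},
    ],
)
print(response.choices[0].message.content)
```

The point of the sketch is that nothing HUGS-specific appears in the application code: migrating from a hosted OpenAI-style service to a self-hosted HUGS instance would amount to changing the base URL and model name.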
HUGS is currently available on major cloud platforms including AWS, Google Cloud, Microsoft Azure, and DigitalOcean, while on-premises deployments can be arranged directly with Hugging Face's sales team. HUGS's closest competitor is probably NVIDIA NIM, which likewise acts as middleware between models and the underlying hardware. NIM, however, runs only on NVIDIA's CUDA platform, whereas HUGS already supports AMD GPUs and plans to support other AI accelerators such as AWS Inferentia and Google TPUs in the future.
TLDR: Hugging Face now offers HUGS, software for running AI models efficiently on a wide range of hardware. It targets organizations that lack in-house expertise in model optimization and promises compatibility with a broad set of AI accelerators.