Google has released PaliGemma 2, an upgraded version of the multimodal (vision-language) AI model first unveiled at this year’s Google I/O event. The new model comes in several sizes, produces more detailed image descriptions, and adds new capabilities.
The model is available in three sizes (3B, 10B, and 28B parameters), each offered at three input image resolutions: 224×224, 448×448, and 896×896. That makes 9 models in total, with abilities ranging from basic image captioning to specialized reading tasks covering financial tables, notes, sheet music, and even X-ray images.
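The count of 9 comes from pairing each of the three sizes with each of the three resolutions. A minimal sketch of that enumeration follows; the `paligemma2-<size>-pt-<resolution>` naming pattern is an assumption used here for illustration and should be checked against the published model listings.

```python
from itertools import product

# Sizes and input resolutions as listed in the article.
SIZES = ["3b", "10b", "28b"]
RESOLUTIONS = [224, 448, 896]


def list_variants():
    """Return one name per (size, resolution) pair.

    The name format is an assumed convention, not confirmed by the article.
    """
    return [
        f"paligemma2-{size}-pt-{res}"
        for size, res in product(SIZES, RESOLUTIONS)
    ]


variants = list_variants()
for name in variants:
    print(name)
print(f"{len(variants)} variants in total")
```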
PaliGemma 2 can be used for tasks such as document reading, object detection, and other applications that combine text and images. The model is free to use under the Gemma terms and is supported in Hugging Face Transformers, Keras, PyTorch, JAX, and gemma.cpp.
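For readers who want to try the Hugging Face Transformers route, here is a rough sketch of image captioning. It assumes a recent `transformers` release with PaliGemma support, plus `torch` and `Pillow`, and that access to the checkpoint has been granted on the Hub. The checkpoint ID, the `caption en` prompt, and the `caption_image` helper are illustrative assumptions rather than details from the announcement.

```python
# Assumed checkpoint ID; swap in whichever size/resolution you have access to.
DEFAULT_MODEL_ID = "google/paligemma2-3b-pt-224"


def caption_image(image_path, model_id=DEFAULT_MODEL_ID, max_new_tokens=30):
    """Generate a short English caption for the image at image_path."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    # "caption en" is an assumed task prompt for the pretrained checkpoints.
    inputs = processor(text="<image>caption en", images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated text.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Calling `caption_image("photo.jpg")` downloads the weights on first use; the larger checkpoints need correspondingly more memory.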
TLDR: Google has released PaliGemma 2, an upgraded multimodal AI model with more detailed image descriptions and new features, available in three sizes and three image resolutions for tasks such as document reading and object detection.