Alibaba Cloud’s Qwen team has unveiled two new artificial intelligence models: Qwen2.5 Omni and QVQ-Max. Qwen2.5 Omni is a multimodal model that can interpret images, sound, and video, and can respond in voice or text. QVQ-Max, by contrast, is a model that reasons over visual input before answering, enabling it to comprehend complex documents.
Qwen2.5 Omni processes text, sound, and images together, analyzing time-aligned multimodal data for better video understanding. It can generate spoken responses, and its architecture supports a “listen and answer” approach, letting it begin answering before the input is complete.
The Qwen team has released a 7B-parameter version of Qwen2.5 Omni for free download under the Apache 2.0 license. QVQ-Max, meanwhile, uses visual reasoning to work through a problem before responding; tests on the MathVision benchmark show that longer thinking time leads to better results.
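Because the 7B checkpoint is openly licensed, it can be pulled down for local experimentation. The following is a minimal sketch using the Hugging Face Hub client; note that the repo id Qwen/Qwen2.5-Omni-7B is an assumption based on the Qwen team’s usual naming conventions and is not stated in this article.

```python
# Minimal sketch: fetching the openly licensed 7B checkpoint for local use.
# Assumption: the weights are published on the Hugging Face Hub under the
# repo id "Qwen/Qwen2.5-Omni-7B" (not confirmed by the article itself).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Qwen/Qwen2.5-Omni-7B")
print(f"Checkpoint downloaded to: {local_dir}")

# From here, the weights can be loaded with whatever model class the release
# ships for the transformers library; the repo's model card will have the
# exact loading snippet for multimodal (text/audio/image/video) inference.
```

The Apache 2.0 license means the downloaded weights can be used commercially and modified, which is the practical significance of this release compared with API-only models.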
QVQ-Max is currently available only in Qwen Chat; there is no API access or model download.
TLDR: Alibaba Cloud’s Qwen team introduces Qwen2.5 Omni and QVQ-Max, advanced AI models with multimodal capabilities and visual reasoning for enhanced performance.