Alibaba Cloud Unveils Qwen2.5 Omni, Which Watches Videos and Responds by Voice, and QVQ-Max, Which Reads Images and Reasons Before Answering

Alibaba Cloud’s Qwen team has unveiled two new artificial intelligence models: Qwen2.5 Omni and QVQ-Max. Qwen2.5 Omni is a multimodal model that can interpret images, audio, and video, and can respond in either voice or text. QVQ-Max, by contrast, is a model that reasons over visual input before answering, which enables it to comprehend complex documents.

Qwen2.5 Omni processes text, audio, and images, and it analyzes these inputs aligned in time, which improves its video understanding. It can generate spoken responses, and its architecture supports a “listen and answer” approach in which the model can begin answering before the input is complete.
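
To make the idea of time-aligned multimodal data concrete, here is a minimal Python sketch that groups video and audio chunks into shared time windows so that what is seen and what is heard at the same moment sit next to each other. This is an illustration only, not Qwen's implementation: the `Chunk` type, the `interleave_by_time` function, and the 2-second window are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chunk:
    modality: str   # "video" or "audio" (hypothetical labels)
    start_s: float  # start time of the chunk, in seconds
    payload: str    # stand-in for real frame or audio features


def interleave_by_time(
    video: List[Chunk], audio: List[Chunk], window_s: float = 2.0
) -> List[List[Chunk]]:
    """Group chunks from both streams into consecutive time windows.

    The window length is an assumed value for illustration.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")

    # Sort all chunks by timestamp so each window is in playback order.
    merged = sorted(video + audio, key=lambda c: (c.start_s, c.modality))

    windows: Dict[int, List[Chunk]] = {}
    for chunk in merged:
        index = int(chunk.start_s // window_s)  # which window the chunk falls in
        windows.setdefault(index, []).append(chunk)

    # Return only non-empty windows, ordered by time.
    return [windows[i] for i in sorted(windows)]


if __name__ == "__main__":
    video = [Chunk("video", 0.0, "f0"), Chunk("video", 1.0, "f1"), Chunk("video", 2.0, "f2")]
    audio = [Chunk("audio", 0.5, "a0"), Chunk("audio", 2.5, "a1")]
    for window in interleave_by_time(video, audio):
        print([(c.modality, c.start_s) for c in window])
```

Because each window can be handed on as soon as it is filled, a layout like this is also compatible with responding before the whole input has arrived.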

Qwen2.5 Omni is available as a 7B-parameter model that can be downloaded for free under the Apache 2.0 license. QVQ-Max, meanwhile, applies visual reasoning before it answers, and the team's MathVision tests show that giving the model more time to think leads to better results.

QVQ-Max is currently available only in Qwen Chat; there is no API access or model download.

TLDR: Alibaba Cloud’s Qwen team introduces Qwen2.5 Omni and QVQ-Max, advanced AI models with multimodal capabilities and visual reasoning for enhanced performance.
