Enhancing ChatGPT’s Capabilities: OpenAI Adds Voice Input, Image Understanding, and Spoken Responses

OpenAI has added features to the mobile version of ChatGPT that allow users to hold direct voice conversations with the AI. The user’s spoken words are transcribed to text by Whisper, OpenAI’s previously released speech recognition model. The AI’s spoken replies are generated by a new text-to-speech model, whose voices were created in collaboration with professional voice actors.
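The same Whisper model is available through OpenAI’s public API. As a rough illustration (not the app’s internal code), the sketch below transcribes an audio file using the openai Python package (v1+); the file name speech.m4a is a placeholder, and whisper-1 is the hosted model identifier.

```python
# Minimal sketch: transcribe recorded speech with OpenAI's hosted Whisper model.
# Assumes the `openai` package (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

# "speech.m4a" is a placeholder for any supported audio file.
with open("speech.m4a", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",  # hosted Whisper speech-to-text model
        file=audio_file,
    )

print(transcript.text)  # the recognized text of the spoken input
```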

Another notable feature is image input, which OpenAI first announced at the launch of GPT-4. Image understanding is now available in multimodal versions of both GPT-3.5 and GPT-4 (the vision-enabled GPT-4 is referred to as GPT-4V). This lets the AI process a wide range of images, from ordinary photographs to documents that mix text and visuals.
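Image input is also exposed through the API. The following is a minimal sketch of asking a vision-capable model about an image; the model name gpt-4-vision-preview and the image URL are assumptions here, since available deployment names have varied over time.

```python
# Minimal sketch: ask a vision-capable GPT-4 model to interpret an image.
# Assumes the `openai` package (v1+); model name and URL are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed name for a vision-enabled GPT-4
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is in this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    max_tokens=300,
)

print(response.choices[0].message.content)
```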

These two features greatly expand ChatGPT’s versatility: it can now transcribe speech, answer questions about images, and describe visual content aloud. For example, GPT-4V powers an integration with the Be My Eyes app, which provides spoken descriptions of a user’s surroundings for blind and low-vision users.
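To make the “audio description of visual content” idea concrete, the sketch below chains the image-understanding call above with OpenAI’s text-to-speech endpoint. This is an illustrative pipeline, not how Be My Eyes or the ChatGPT app is actually implemented, and the tts-1 model and alloy voice names are assumptions.

```python
# Illustrative pipeline: describe an image, then speak the description aloud.
# Not the actual Be My Eyes integration; model/voice names are assumptions.
from openai import OpenAI

client = OpenAI()

def describe_image_aloud(image_url: str, out_path: str = "description.mp3") -> str:
    # 1) Ask a vision-capable model for a short description of the image.
    vision = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed vision model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Briefly describe this image."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        max_tokens=200,
    )
    description = vision.choices[0].message.content

    # 2) Convert the description to speech with the text-to-speech endpoint.
    speech = client.audio.speech.create(
        model="tts-1",   # assumed TTS model name
        voice="alloy",   # assumed voice name
        input=description,
    )
    speech.write_to_file(out_path)  # save the spoken description as MP3
    return description
```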

TLDR: OpenAI has introduced new features in the ChatGPT mobile app: voice conversations powered by the Whisper model and a new text-to-speech model, plus image input through multimodal GPT-3.5 and GPT-4. These additions broaden ChatGPT’s versatility and potential applications.
