OpenAI has added features to the mobile version of ChatGPT that allow users to hold direct voice conversations with the AI. The user’s speech is transcribed into text by Whisper, a speech recognition model OpenAI released previously. The AI’s replies, in turn, are spoken aloud by a new text-to-speech model whose voices were created in collaboration with professional voice actors.
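To make the speech-to-text step concrete, here is a minimal sketch of the request a client app might assemble for OpenAI's transcription endpoint, which is backed by the Whisper model. Only the request fields are built here (no network call is made), and the file name `recording.m4a` is a hypothetical example.

```python
def build_transcription_request(audio_path: str) -> dict:
    """Form fields for POST https://api.openai.com/v1/audio/transcriptions."""
    return {
        "model": "whisper-1",        # OpenAI's hosted Whisper model
        "file": audio_path,          # sent as multipart form data in practice
        "response_format": "text",   # ask for a plain-text transcript back
    }

req = build_transcription_request("recording.m4a")
```

In a real app, the audio file would be uploaded as multipart form data along with these fields, and the response body would contain the transcript that is then shown in the chat.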
Another notable feature is image input, which OpenAI first announced at the launch of GPT-4. This vision-enabled variant of GPT-4, referred to as GPT-4V, enables the AI to process various types of images, ranging from ordinary photographs to documents that combine text and visuals.
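For developers, image input is exposed through the Chat Completions API as a mixed text-and-image message. The sketch below builds such a message as a plain dictionary; the prompt text and image URL are hypothetical placeholders, and no API call is made.

```python
def build_vision_message(prompt: str, image_url: str) -> dict:
    """A user message mixing text and an image, in Chat Completions format."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = build_vision_message("What is in this photo?",
                           "https://example.com/photo.jpg")
```

A message like this would be passed in the `messages` list of a chat completion request to a vision-capable model, which then answers questions about the image alongside the text.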
Together, these two features greatly expand ChatGPT’s versatility: it can now transcribe spoken input and describe visual content aloud. For example, GPT-4V powers the Be My Eyes app, which provides audio descriptions of surroundings for visually impaired users.
TLDR: OpenAI has added new features to the ChatGPT mobile app, enabling voice conversations powered by the Whisper speech recognition model. It also supports image input (GPT-4V), expanding its capabilities to process various types of visuals. These additions enhance ChatGPT’s versatility and potential applications.