OpenAI has unveiled new audio models that convert text to speech with more expressive tone options and fewer errors. The headline text-to-speech model is gpt-4o-mini-tts, which lets developers steer delivery with plain-language instructions, such as speaking like a mad scientist or a warm and nurturing teacher. Different styles can be previewed on OpenAI's demo site, OpenAI.fm.
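For a concrete picture of how the instruction steering works, here is a minimal sketch using OpenAI's Python SDK; the voice, instruction text, and output filename are illustrative choices, not details from the announcement:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stream synthesized speech to a file; the `instructions` parameter is
# what gpt-4o-mini-tts adds over earlier TTS models.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # one of the built-in voices
    input="Welcome back, class. Today we explore the water cycle.",
    instructions="Speak like a warm and nurturing teacher.",
) as response:
    response.stream_to_file("lesson.mp3")
```

Changing only the instructions string, say to "speak like a mad scientist", reshapes the delivery without touching the input text.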
As for the new speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, they are set to replace the Whisper models. Trained on high-quality audio data, they capture a wider range of accents and make fewer errors than the original Whisper when encountering unfamiliar words.
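Transcription goes through the same API surface that Whisper used, so migrating is largely a model-name change. A hedged sketch, assuming an audio file named lesson.mp3 on disk:

```python
from openai import OpenAI

client = OpenAI()

# Transcribe a local audio file; swap in gpt-4o-mini-transcribe
# for a cheaper, faster variant.
with open("lesson.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)
```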
OpenAI has also published benchmark results showing that gpt-4o-transcribe achieves significantly lower error rates in several languages where Whisper struggled. In Thai, for example, the reported error rate drops from about 12% to 5%.
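Error rates quoted for speech-to-text models are typically word error rates (WER): the minimum number of word substitutions, insertions, and deletions needed to turn a transcript into the reference, divided by the reference's word count. A self-contained sketch of the usual dynamic-programming computation (the sample sentences are made up for illustration):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution and one deletion against a 6-word reference: WER = 2/6
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```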
Source: OpenAI and TechCrunch
TLDR: OpenAI introduces new text-to-speech and speech-to-text models with better voice customization, higher accuracy, and stronger multilingual support.