
OpenAI Releases New Speech-to-Text Model: Smaller Yet Nearly Equal in Quality to the Original, Except for a Drop in Thai Accuracy

OpenAI has released whisper-large-v3-turbo, a speech-to-text model optimized by cutting the decoder from 32 layers to 4. This reduces the parameter count from 1.55 billion to 809 million.
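Most of the size reduction comes from dropping whole decoder blocks. The rough count below assumes the dimensions used for the "large" models in the openai-whisper source (a 1280-wide model with a 4x feed-forward layer) and that the turbo decoder keeps 4 of the 32 blocks, as described on its Hugging Face model card. It is a back-of-the-envelope check, not OpenAI's own accounting.

```python
# Rough size of one Whisper decoder block, assuming the openai-whisper
# "large" dimensions: width 1280 and a 4x feed-forward layer.
D_MODEL = 1280
D_MLP = 4 * D_MODEL


def linear(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weight matrix plus optional bias vector of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


def attention(d: int) -> int:
    """Query, key (no bias), value and output projections."""
    return linear(d, d) + linear(d, d, bias=False) + linear(d, d) + linear(d, d)


def layer_norm(d: int) -> int:
    """Scale and shift vectors."""
    return 2 * d


def decoder_block(d: int = D_MODEL, d_mlp: int = D_MLP) -> int:
    """Self-attention, cross-attention and MLP, each with its own LayerNorm."""
    mlp = linear(d, d_mlp) + linear(d_mlp, d)
    return 2 * attention(d) + mlp + 3 * layer_norm(d)


per_block = decoder_block()
removed = (32 - 4) * per_block  # assumes 4 of the 32 decoder blocks are kept
print(f"parameters per decoder block: {per_block:,}")
print(f"removed by pruning 28 blocks: {removed:,}")
print(f"estimated turbo size: {(1.55e9 - removed) / 1e6:.0f}M")
```

Each block holds about 26.2 million parameters, so removing 28 of them takes away roughly 735 million. The estimate lands near 815 million rather than exactly 809 million because the 1.55 billion starting figure is rounded.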

After pruning, the team trained the model for two more epochs and found that it recovered most of its quality, closely matching the original large-v3, except for Thai and Cantonese, where performance dropped noticeably. On the Common Voice dataset, the error rate for Thai increased nearly fourfold.

The approach behind whisper-large-v3-turbo was adapted from the Distil-Whisper research, which trains a smaller model on outputs generated by a larger one (knowledge distillation). OpenAI, however, chose to train on the full training data rather than on the larger model's outputs.

whisper-large-v3-turbo is now the default model in the latest version of the openai-whisper package. Users transcribing Thai should be cautious and may want to switch to another model.
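For Thai (or Cantonese) audio, one option is to name a different checkpoint explicitly instead of relying on the default. The sketch below assumes the openai-whisper package's `whisper.load_model()` and `model.transcribe()`; the helpers `pick_model` and `transcribe`, the choice of `large-v3` as the fallback, and the language set are illustrative assumptions, not an official recommendation.

```python
# Sketch: choose a Whisper checkpoint per language instead of the default "turbo".
# pick_model and transcribe are hypothetical helpers, not part of openai-whisper.

# Languages where the article reports a clear drop for turbo
# ("th" = Thai, "yue" = Cantonese in Whisper's language codes).
AFFECTED_LANGUAGES = {"th", "yue"}


def pick_model(language: str) -> str:
    """Return the checkpoint name to load for a given language code."""
    return "large-v3" if language in AFFECTED_LANGUAGES else "turbo"


def transcribe(path: str, language: str) -> str:
    """Transcribe an audio file with the checkpoint chosen for its language."""
    import whisper  # openai-whisper; imported lazily so pick_model works without it

    model = whisper.load_model(pick_model(language))
    result = model.transcribe(path, language=language)
    return result["text"]


if __name__ == "__main__":
    print(pick_model("th"))  # large-v3
    print(pick_model("en"))  # turbo
```

From the command line, the same choice is made with the `--model` option, for example `whisper audio.mp3 --model large-v3 --language th`.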

Source: OpenAI/Whisper

TLDR: OpenAI released whisper-large-v3-turbo, a pruned version of large-v3 that was retrained to recover nearly all of its accuracy, but Thai users should be cautious because Thai error rates rose sharply.
