OpenAI Releases New Voice-to-Text Model: Smaller Yet Nearly Equal in Quality to the Original, Except for Degraded Thai Performance

OpenAI has released whisper-large-v3-turbo, a voice-to-text model optimized by cutting the number of decoder layers from 32 to 4, which reduces the parameter count from 1.55 billion to 809 million.
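To put the size change in perspective, the short sketch below computes the relative reduction from the two parameter counts quoted above. The helper name `relative_reduction` is only illustrative; the figures are the ones given in this article.

```python
# Parameter counts as quoted in the article.
ORIGINAL_PARAMS = 1_550_000_000  # large-v3
TURBO_PARAMS = 809_000_000       # large-v3-turbo


def relative_reduction(before: int, after: int) -> float:
    """Return the fraction by which `after` is smaller than `before`."""
    return (before - after) / before


reduction = relative_reduction(ORIGINAL_PARAMS, TURBO_PARAMS)
print(f"Parameter reduction: {reduction:.1%}")  # prints "Parameter reduction: 47.8%"
```

In other words, the turbo model is a little more than half the size of the original.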

After pruning, the team fine-tuned the smaller model for two more epochs and found that it regained its quality, closely matching the performance of the original large-v3 model. The exceptions are Thai and Cantonese, where performance drops noticeably. On the Common Voice dataset, the error rate for Thai increased nearly fourfold.

The approach behind whisper-large-v3-turbo was adapted from the Distil-Whisper research, which trains a smaller model on the outputs of a larger one. OpenAI, however, chose to train the smaller model on the full data instead of distilling it.

Currently, whisper-large-v3-turbo is the default model in the latest version of the openai-whisper package. Users transcribing Thai should be cautious and may want to switch to another model.
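As a rough illustration of how to switch models, the sketch below picks a model name per language and passes it to the openai-whisper package. The helper names `choose_model` and `transcribe`, the `DEGRADED_ON_TURBO` set, and the fallback to `large-v3` are assumptions made for this example, not a recommendation from OpenAI. The language codes `th` (Thai) and `yue` (Cantonese) follow Whisper's own language list.

```python
from typing import Optional

# Languages this article reports as noticeably worse on the turbo model.
DEGRADED_ON_TURBO = {"th", "yue"}


def choose_model(language: Optional[str]) -> str:
    """Pick a Whisper model name: fall back to large-v3 for affected languages."""
    if language in DEGRADED_ON_TURBO:
        return "large-v3"
    return "turbo"


def transcribe(path: str, language: Optional[str] = None) -> str:
    """Transcribe an audio file with the model chosen for `language`."""
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(choose_model(language))
    result = model.transcribe(path, language=language)
    return result["text"]
```

For a Thai recording, `transcribe("clip.mp3", language="th")` would load `large-v3` rather than the default; `clip.mp3` is a placeholder file name.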

Source: OpenAI/Whisper

TLDR: OpenAI released whisper-large-v3-turbo, a pruned and retrained version of large-v3 with roughly half the parameters and nearly the same quality, except in Thai and Cantonese. Thai users should consider switching to another model.
