OpenAI has published a blog post explaining the recent incident in which a flawed version of its model was released, along with an analysis of the testing gaps that allowed the model to ship.
Typically, OpenAI trains its models with reinforcement learning, using a reward system that scores high-quality responses. The system rates each response on correctness, usefulness, and safety. After training, the model goes through several checks, including performance evaluations on various test sets, expert reviews, safety assessments, and limited-scale testing.
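To make the scoring idea concrete, here is a minimal sketch, not OpenAI's actual code, of how per-response scores on several criteria can be collapsed into the single scalar reward that reinforcement learning optimizes. The axes come from the post; the weights and scoring ranges are assumptions for illustration.

```python
# Illustrative sketch of a multi-criterion reward: each response is
# scored on several axes and the scores are combined into one scalar.
# The weights here are invented; the real balance is not public.

from dataclasses import dataclass

@dataclass
class ResponseScores:
    correctness: float  # 0.0-1.0, e.g. from a grader or test set
    usefulness: float   # 0.0-1.0, e.g. from a helpfulness reward model
    safety: float       # 0.0-1.0, e.g. from a safety classifier

WEIGHTS = {"correctness": 0.4, "usefulness": 0.4, "safety": 0.2}

def combined_reward(scores: ResponseScores) -> float:
    """Collapse per-axis scores into the scalar reward RL optimizes."""
    return (WEIGHTS["correctness"] * scores.correctness
            + WEIGHTS["usefulness"] * scores.usefulness
            + WEIGHTS["safety"] * scores.safety)

print(combined_reward(ResponseScores(correctness=0.9, usefulness=0.8, safety=1.0)))
# 0.88 -- a high-quality response earns a high reward
```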
The latest update was the first to use thumbs-up (👍) and thumbs-down (👎) ratings as an additional reward signal for the model. User feedback was expected to highlight which answers were of good quality, but blending this signal in alongside the other reward adjustments diluted the signals that had kept responses genuinely beneficial.
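The dilution effect is easy to see with toy numbers. In the hypothetical weighting below (the real weighting is not public), adding a user-feedback term and renormalizing shrinks the share of every pre-existing signal:

```python
# Sketch of signal dilution with invented numbers: after a new
# user-feedback term is added and weights are renormalized, each
# original signal carries a smaller share of the total reward.

old_weights = {"correctness": 0.4, "usefulness": 0.4, "safety": 0.2}

new_weights = dict(old_weights)
new_weights["thumbs_up_rate"] = 0.3          # hypothetical new signal
total = sum(new_weights.values())
new_weights = {k: v / total for k, v in new_weights.items()}

for name, weight in new_weights.items():
    print(f"{name}: {weight:.2f}")
# safety falls from 0.20 to ~0.15 of the total reward, so behavior it
# used to discourage now costs the model less.
```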
OpenAI acknowledges that during testing, expert reviewers felt something was off about the new version, but the evaluation process had no adequate way to measure that impression. Small-scale A/B testing yielded favorable results, prompting the team to release the model. Once wider feedback made clear that the new version was flawed, it had to be taken down.
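For a sense of why a small-scale A/B test can look good while behavior regresses, here is a hedged sketch with invented numbers: a single engagement metric, such as the thumbs-up rate, can show a statistically significant win even when qualitative problems go unmeasured.

```python
# Minimal two-proportion z-test on a hypothetical engagement metric
# (thumbs-up rate) for old vs. new model. All counts are invented.

from math import sqrt
from statistics import NormalDist

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical small-scale results: new model gets more thumbs-up.
z, p = two_proportion_z(success_a=520, n_a=1000, success_b=570, n_b=1000)
print(f"z={z:.2f}, p={p:.3f}")  # a "win" on this one metric
```

A metric like this only measures what it measures; it says nothing about traits such as excessive agreeableness that no one thought to track.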
Going forward, OpenAI plans closer monitoring of model behavior and an opt-in program that lets a group of users test new versions at larger scale, so irregularities can be caught before a wide release. The company admits that quantitative metrics alone do not tell the whole story, and that many behaviors are hard to measure accurately.
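One way to picture such pre-release monitoring is a simple gate that compares a candidate against the current model on behavioral probes and blocks release on any sizable regression. The probes, scores, and threshold below are all invented for illustration, not OpenAI's actual process.

```python
# Sketch of a pre-release behavior gate under stated assumptions:
# each probe score is a 0.0-1.0 rating of some behavior; a candidate
# is flagged if it regresses noticeably on any single probe, even if
# its average score improves.

BASELINE = {"pushback_on_false_premise": 0.82, "refuses_unsafe": 0.97,
            "avoids_flattery": 0.78}
CANDIDATE = {"pushback_on_false_premise": 0.61, "refuses_unsafe": 0.96,
             "avoids_flattery": 0.55}

REGRESSION_THRESHOLD = 0.05  # max tolerated per-probe drop (assumed)

def regressions(baseline, candidate, threshold):
    return {name: (baseline[name], candidate[name])
            for name in baseline
            if baseline[name] - candidate[name] > threshold}

for name, (old, new) in regressions(BASELINE, CANDIDATE,
                                    REGRESSION_THRESHOLD).items():
    print(f"BLOCK RELEASE: {name} dropped {old:.2f} -> {new:.2f}")
```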
TL;DR: Testing gaps let OpenAI release a flawed model update; future versions will get closer behavior monitoring and larger-scale opt-in testing before launch.