HuggingFace's effort to reproduce DeepSeek-R1 in its entirety (the Open-R1 project) has produced its first result: the OlympicCoder-7B model, built on Qwen2.5-Coder. OlympicCoder is trained on CodeForces-CoTs, a dataset of more than one hundred thousand samples created by feeding CodeForces programming problems to DeepSeek-R1 and collecting its reasoning traces and solutions in C++ and Python. The fine-tuning covers both the 7B and 32B Qwen2.5-Coder models and currently targets olympiad-style programming problems only. Benchmark results show OlympicCoder-32B outperforming QwQ-32B and DeepSeek-R1, while still trailing o1 and o3-mini.
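As a rough illustration of how the released checkpoint could be tried out, here is a minimal inference sketch using the standard transformers API. The Hub repository name `open-r1/OlympicCoder-7B`, the sample problem, and the generation settings are assumptions for illustration, not details from the article.

```python
# Minimal sketch: load the released checkpoint and ask for a C++ solution
# to an olympiad-style problem. The repo id below is assumed; check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-r1/OlympicCoder-7B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

problem = (
    "Given an array of n integers, output the length of its longest strictly "
    "increasing subsequence. Constraints: 1 <= n <= 2*10^5."
)
messages = [{"role": "user", "content": f"Solve this problem in C++17:\n\n{problem}"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit a long chain of thought before the final code,
# so a generous token budget is needed.
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```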
Training OlympicCoder also taught the team several lessons: sample packing can hurt reasoning performance, a higher-than-usual learning rate gave better results, the model sometimes refuses to attempt problems unlike those it was trained on, and the long internal reasoning traces lead to memory problems during training. A hypothetical training configuration reflecting these points is sketched below.
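The following sketch shows what a supervised fine-tuning setup along these lines might look like with TRL. The dataset id `open-r1/codeforces-cots`, the hyperparameter values, and the overall recipe are illustrative assumptions, not the team's actual configuration.

```python
# Hypothetical SFT setup sketching the lessons above; values are illustrative only.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("open-r1/codeforces-cots", split="train")  # assumed Hub id

config = SFTConfig(
    output_dir="olympiccoder-sft",
    packing=False,                  # lesson: packing reasoning traces can hurt quality
    learning_rate=4e-5,             # lesson: a higher-than-usual LR worked better (illustrative value)
    gradient_checkpointing=True,    # long chains of thought quickly run into memory limits
    per_device_train_batch_size=1,
    num_train_epochs=1,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-7B-Instruct",  # base model named in the article
    train_dataset=dataset,
    args=config,
)
trainer.train()
```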
Source: HuggingFace
TLDR: HuggingFace released the OlympicCoder models, built on Qwen2.5-Coder as part of its effort to reproduce DeepSeek-R1; the 32B version beats QwQ-32B and DeepSeek-R1 on olympiad programming problems. Training also yielded lessons about sample packing, learning rates, the model refusing unfamiliar problems, and memory constraints.