OpenAI Launches SWE-Lancer Testing Suite for Real-World Programming Challenges with $1 Million Prize Pool.

OpenAI has introduced the SWE-Lancer test suite, built from 1,488 programming tasks posted on the Upwork platform, with pay ranging from $50 to $32,000 per task. The tasks are worth $1 million in total, and a model's score is the pay it earns by solving individual tasks. Within that $1 million pool, the IC SWE subtest, which focuses on programming work, has a full score of $236,000. The current top-scoring model is o3-high, introduced today, which earns $65,250, while o4-mini-high earns $56,375, twice as much as o1-high. These scores remain far from the full score, which leaves the suite more room to track AI's future progress than SWE-Bench Verified, where o3 already scores 69.1%. Notably, Claude 3.5 scores up to $58,000, surpassing o4-mini-high. Further analysis shows that all models do well on backend work but struggle on UX/UI tasks. The test suite is available on GitHub, although it lacks multimodal support, hence the absence of visual aids.
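To put the dollar figures in perspective, here is a minimal sketch of the arithmetic behind them. It assumes the score is simply the sum of payouts for solved tasks, as described above. The function names `payout_share` and `total_earned` are hypothetical helpers written for this illustration and are not part of the SWE-Lancer code.

```python
from __future__ import annotations

# Figures quoted in the article for the IC SWE subtest (US dollars).
IC_SWE_FULL_SCORE = 236_000

reported_earnings = {
    "o3-high": 65_250,
    "o4-mini-high": 56_375,
    "Claude 3.5": 58_000,  # the article says "up to" this amount
}


def payout_share(earned: float, full_score: float = IC_SWE_FULL_SCORE) -> float:
    """Return earnings as a percentage of the maximum possible payout."""
    return 100 * earned / full_score


def total_earned(task_payouts: dict[str, int], solved: set[str]) -> int:
    """Sum the payouts of the solved tasks (assumed payout-weighted scoring)."""
    return sum(pay for task, pay in task_payouts.items() if task in solved)


for model, earned in reported_earnings.items():
    share = payout_share(earned)
    print(f"{model}: ${earned:,} is {share:.1f}% of ${IC_SWE_FULL_SCORE:,}")
```

By this calculation, even the top-scoring o3-high collects a little over a quarter of the money available in the IC SWE subtest.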

TLDR: OpenAI unveils SWE-Lancer test suite with $1 million worth of programming tasks, showcasing AI’s capabilities and areas for improvement.
