Anthropic Model Update: Claude’s Programming Test Triumph with AI Computer Control Features

Anthropic, the developer of the Claude family of large language models, has announced updates to two of its models: Sonnet, the mid-range model, and Haiku, the small model. Alongside the usual improvements, the company has begun testing a computer control feature that lets users instruct Claude to operate a computer. Claude can now click on windows or execute various commands on its own.

The updated Sonnet 3.5 improves on its predecessor in all areas, and Anthropic has added results for SWE-bench Verified, a benchmark released by OpenAI. On that test it reportedly outscores every other model, including o1-preview. Anthropic also reports results on TAU-bench, a test set that evaluates tool use while working toward an answer. There the new model improves on the previous Sonnet 3.5, most notably on the airline-domain test set.

Haiku, the budget-friendly small model, has been released in version 3.5. It does not post the highest test scores, but it comes close to GPT-4o mini on various test sets, and its SWE-bench Verified score is higher than GPT-4o's.

One of the key features added to Claude is computer use: Sonnet 3.5 reads screen images and takes commands to accomplish tasks, such as filling out a form based on what the images show. In essence, developers write a program around a computer and expose tools through the API so that Claude can view screenshots and issue commands.
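The loop described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Anthropic's actual API: `FakeComputer`, `scripted_model`, and `run_agent` are hypothetical names invented for this example, the "screenshot" is a text description rather than an image, and the model is replaced by a scripted stand-in so the code runs without network access.

```python
from dataclasses import dataclass, field


@dataclass
class FakeComputer:
    """Hypothetical stand-in for a real desktop: one text field and a submit button."""
    form_text: str = ""
    submitted: bool = False
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return image bytes; text keeps the sketch runnable.
        return f"form_text={self.form_text!r} submitted={self.submitted}"

    def execute(self, action: dict) -> None:
        # Record every action, then apply it to the fake screen state.
        self.log.append(action)
        if action["type"] == "type":
            self.form_text += action["text"]
        elif action["type"] == "click" and action["target"] == "submit":
            self.submitted = True
        else:
            raise ValueError(f"unsupported action: {action}")


def scripted_model(task: str, screen: str) -> dict:
    """Hypothetical stand-in for the model call: picks the next action from the screen."""
    if "form_text=''" in screen:
        return {"type": "type", "text": task}
    if "submitted=False" in screen:
        return {"type": "click", "target": "submit"}
    return {"type": "done"}


def run_agent(model, computer, task: str, max_steps: int = 10) -> int:
    """Screenshot -> model chooses an action -> harness executes it. Returns steps used."""
    for step in range(1, max_steps + 1):
        action = model(task, computer.screenshot())
        if action["type"] == "done":
            return step
        computer.execute(action)
    raise RuntimeError("step budget exhausted before the task finished")


pc = FakeComputer()
steps = run_agent(scripted_model, pc, "Jane Doe")
print(steps, pc.form_text, pc.submitted)
```

In this sketch the harness, not the model, touches the computer: the model only ever sees a screen capture and returns a description of the next action. The `max_steps` argument is an assumed design choice that caps how long the loop may run before giving up.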

Although Anthropic highlights computer use as a prominent feature, its OSWorld score is not high in absolute terms: 14.9% (22% if the AI is allowed to complete longer processes). Even so, that is well ahead of GPT-4o at 7.69% and Gemini-Pro at 5.8%.

TL;DR: Anthropic has updated its Claude models, Sonnet and Haiku, introducing a computer control feature and posting strong test results that surpass established models such as GPT-4o and Gemini-Pro in some areas.
