The Latest LLM Iteration, GPT-4, Surpasses ELIZA’s Performance from 1966, and Even Outperforms GPT-3.5, According to Turing Test Results

A team of researchers from UC San Diego has released findings from a Turing Test, an experiment that measures how well artificial intelligence (AI) can deceive humans into believing it is human. The preliminary results indicate that GPT-4 came closest to human-level performance.

In this experiment, participants logged into an online system and queued to interact with AI models, including GPT-3.5, GPT-4, and ELIZA, a chatbot program dating back to 1966 that was designed to mimic human conversation. The judges, tasked with determining whether they were conversing with an AI or a human, had 5 minutes per text conversation, with messages limited to a maximum of 300 characters. The system randomly assigned participants to converse with either an AI or a human.
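As a rough illustration of the setup described above, here is a minimal Python sketch of those conversation constraints, assuming a 5-minute window, a 300-character message cap, and random pairing; the constant names, witness pool, and helper functions are hypothetical and not taken from the study’s actual system.

```python
import random
import time

# Hypothetical constants mirroring the constraints described in the article:
# a 5-minute conversation window and a 300-character cap per message.
CONVERSATION_SECONDS = 5 * 60
MAX_MESSAGE_CHARS = 300

# Hypothetical pool of witnesses a judge could be paired with.
WITNESSES = ["human", "gpt-4", "gpt-3.5", "eliza"]

def assign_witness():
    """Randomly pair a judge with either a human or one of the AI models."""
    return random.choice(WITNESSES)

def accept_message(text, started_at):
    """Reject messages that exceed the character cap or arrive after time runs out."""
    if time.monotonic() - started_at > CONVERSATION_SECONDS:
        return False, "conversation window closed"
    if len(text) > MAX_MESSAGE_CHARS:
        return False, "message exceeds 300 characters"
    return True, "ok"

# Example: pair a judge with a witness and check one message.
witness = assign_witness()
start = time.monotonic()
print(witness, accept_message("Hi! How has your day been so far?", start))
```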

A total of 652 individuals took part in the testing, resulting in 1,810 games played. However, the team removed games in which participants exclusively encountered AI with no human interaction, or vice versa. This left 1,405 games, which were then evaluated to measure how often the AI models, and the humans, convinced their counterparts that they were human.
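To make the scoring concrete, the sketch below shows one plausible way to compute such a rate in Python: the fraction of games in which a given witness was judged to be human. The `Game` records and numbers here are toy examples, not the study’s data.

```python
from dataclasses import dataclass

@dataclass
class Game:
    witness: str        # "human", "gpt-4", "gpt-3.5", or "eliza"
    judged_human: bool  # True if the judge decided the witness was human

def success_rate(games, witness):
    """Fraction of games in which this witness convinced the judge it was human."""
    relevant = [g for g in games if g.witness == witness]
    if not relevant:
        return 0.0
    return sum(g.judged_human for g in relevant) / len(relevant)

# Toy records for illustration only.
games = [
    Game("human", True), Game("human", True), Game("human", False),
    Game("gpt-4", True), Game("gpt-4", False),
    Game("eliza", False), Game("eliza", True),
]
for w in ("human", "gpt-4", "eliza"):
    print(w, round(success_rate(games, w), 2))
```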

The results revealed that human judges accurately identified genuine humans 63% of the time, a higher rate than any of the AI models achieved. Among the AI models, GPT-4 performed best, convincing judges it was human in 41% of its games. What is noteworthy is that ELIZA, a simple program developed decades ago, outperformed both GPT-3.5 models tested, scoring 27% in the experiment.

The research team noted that, despite its age, ELIZA’s success can be attributed to its conservative conversational style: it sustains a conversation without displaying broad knowledge, excessive friendliness, or more helpfulness than a typical human would offer. Interestingly, judges also found ELIZA more “annoying” than its more modern AI counterparts.

In conclusion, the study conducted by the UC San Diego research team demonstrates that GPT-4 comes closest to human-level performance in deceiving human judges. However, it is equally remarkable that ELIZA, a much older program, outperformed the more advanced GPT-3.5 models in this specific experiment.

TLDR: UC San Diego researchers conducted a Turing Test experiment involving AI models, human judges, and the chatbot ELIZA. GPT-4 came closest to fooling judges into believing it was human, while ELIZA, despite its simplicity, outperformed the newer GPT-3.5 models.
