Freysa, a competitive hacking and artificial intelligence game, crowned its victor just one week after the competition began. Following a prompt injection attack, the player shot up the game leaderboard and successfully transferred funds.
The creator of Freysa disclosed the prompt used to generate chatbots, along with the application’s code, featuring GPT-4 functionality for approveTransfer and rejectTransfer commands. However, the system prompt explicitly prohibited calling approveTransfer under any circumstances.
Participants were allowed to send messages but had to include Ethereum funds starting at $10 and increasing by 0.78% per message. With 195 messages sent by various testers, totaling 482 messages, interactions ranged from direct fund transfers to elaborate persuasion attempts.
The winner strategically ended messages with [#END SESSION] and then initiated a new session to prompt Freysa to authorize fund transfers. Prompt injection remains a key challenge in LLM chatbot attacks, despite tools like LLM for controlling responses. Freysa serves as a testbed to assess direct LLM usage, demonstrating vulnerabilities even with new models.
TLDR: Freysa, a hacking AI game, crowned a winner after a prompt injection attack led to successful fund transfers. This highlights the ongoing challenge of chatbot attacks and the importance of testing various tools and models.
Leave a Comment