A research team from Hong Kong Polytechnic University has published a report on LLM4Decompile, an AI model designed specifically to convert assembly files back into C code. Earlier reports had already shown that general-purpose language models such as GPT-4 can decompile code.
LLM4Decompile is a specialized model available in three sizes (1.3B, 6.7B, and 33B parameters), trained on a dataset of 4 billion tokens of C code. To measure the model’s performance, the team created the Decompile-Eval test suite, which focuses on re-compilability (whether the decompiled code compiles again) and re-executability (whether it then runs correctly). It is similar to the HumanEval tests used to evaluate programming ability, but takes assembly programs as input.
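To make the two metrics concrete, here is a minimal Python sketch of how re-compilability and re-executability rates could be computed. It is not the team's actual Decompile-Eval harness. The function names (`compile_c`, `run_binary`, `score`), the use of `gcc`, and the convention that each candidate is a complete C program whose `main` exits with code 0 only when its checks pass are all assumptions made for illustration.

```python
import os
import subprocess
import tempfile
from typing import Callable, Iterable, Optional, Tuple


def compile_c(source: str, workdir: str) -> Optional[str]:
    """Compile C source with gcc; return the binary path, or None on failure."""
    src_path = os.path.join(workdir, "candidate.c")
    bin_path = os.path.join(workdir, "candidate")
    with open(src_path, "w") as f:
        f.write(source)
    try:
        result = subprocess.run(
            ["gcc", src_path, "-o", bin_path],
            capture_output=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return bin_path if result.returncode == 0 else None


def run_binary(bin_path: str) -> bool:
    """Run the binary; exit code 0 is assumed to mean its checks passed."""
    try:
        result = subprocess.run([bin_path], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def score(
    sources: Iterable[str],
    compile_fn: Callable[[str, str], Optional[str]] = compile_c,
    run_fn: Callable[[str], bool] = run_binary,
) -> Tuple[float, float]:
    """Return (re-compilability rate, re-executability rate) over all sources."""
    sources = list(sources)
    if not sources:
        return 0.0, 0.0
    compiled = executed = 0
    for source in sources:
        with tempfile.TemporaryDirectory() as workdir:
            binary = compile_fn(source, workdir)
            if binary is None:
                continue  # did not compile, so it cannot count as executable
            compiled += 1
            if run_fn(binary):
                executed += 1
    return compiled / len(sources), executed / len(sources)
```

Note that in this sketch a candidate counts toward re-executability only if it first compiles, so the second rate can never exceed the first. The compile and run steps are passed in as parameters so the scoring logic can be checked without a C compiler installed.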
Test results show that although GPT-4 comes out ahead in most cases, LLM4Decompile edges ahead when it comes to producing decompiled code that runs correctly, with a 21% accuracy rate compared to GPT-4’s 14%.
Source: LLM4Decompile
TLDR: Hong Kong Polytechnic University researchers introduce LLM4Decompile, a specialized AI model for converting assembly files into C code. Tests show that its decompiled code runs correctly more often than GPT-4’s (21% versus 14%).