Maybe a bit off topic, but has anyone had success using LLMs to reverse engineer...

stavros · on Nov 14, 2023

Why use LLMs when you have a disassembler?

thewanderer1983 · on Nov 14, 2023

Spoken like someone who has never done reverse engineering of binaries.

Here are some examples of people using LLMs with Ghidra:

https://github.com/evyatar9/GptHidra

https://github.com/likvidera/GhidraChatGPT

https://github.com/tenable/ghidra_tools/tree/main/g3po

I suspect there are better ones being worked on though.

blopker · on Nov 15, 2023

Thanks for this. I've been using Binary Ninja, but may have to try out Ghidra. I was surprised how little I found online about it; none of the major vendors seem to have first-class support yet.

kikoreis · on Nov 14, 2023

I think "[automated] disassembly" has a different implication than reverse engineering; the latter usually involves deeper analysis of the binary, including more semantic-level considerations (e.g. this block is meant to do this, or this function is called from these different call sites). The best examples of this type of analysis seem to come from the security community, in detailed write-ups of zero-days, exploits, etc. I think LLMs either already can or will soon enter that space.

crotchfire · on Nov 15, 2023

I am extremely interested in this. The utter silence in the literature around this is a bit odd, frankly.

Not LLMs specifically, but transformers more generally, ought to be extremely good at this task.

Another one I wonder about a lot is execution traces. Something that isn't mentioned often enough is that transformer models can be trained to do previous-token completion just as easily as next-token completion. So you can train the model on paired program/execution-trace data (training data is trivial to generate here -- just execute the program on a CPU!) and then ask the model to work backwards from a desired machine state you want to reach.
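
To make the "training data is trivial to generate" point concrete, here is a rough sketch of what generating paired program/trace data could look like. Everything in it is an assumption for illustration: the two-register toy machine, the op names, and the token format are made up and stand in for a real CPU and a real tokenizer. The reversed sequence at the end is one way a model might be trained to predict "previous" tokens, i.e. to work backwards from a final machine state.

```python
import random

# Hypothetical toy ISA: a two-register, 8-bit machine (not any real CPU).
OPS = ("inc", "dec", "swap", "add")


def step(state, op):
    """Apply one instruction to the (a, b) register pair."""
    a, b = state
    if op == "inc":
        return ((a + 1) & 0xFF, b)
    if op == "dec":
        return ((a - 1) & 0xFF, b)
    if op == "swap":
        return (b, a)
    if op == "add":
        return ((a + b) & 0xFF, b)
    raise ValueError(f"unknown op: {op}")


def run(program, state=(0, 0)):
    """Execute a program and record every intermediate machine state."""
    trace = [state]
    for op in program:
        state = step(state, op)
        trace.append(state)
    return trace


def make_example(rng, length=8):
    """Generate one random program plus an interleaved state/op token list."""
    program = [rng.choice(OPS) for _ in range(length)]
    trace = run(program)
    tokens = []
    for (a, b), op in zip(trace, program):
        tokens += [f"a={a}", f"b={b}", op]
    final_a, final_b = trace[-1]
    tokens += [f"a={final_a}", f"b={final_b}"]
    return program, trace, tokens


def reversed_example(tokens):
    """Reverse the sequence so next-token training runs backwards in time."""
    return tokens[::-1]


if __name__ == "__main__":
    rng = random.Random(0)
    program, trace, tokens = make_example(rng)
    print("forward: ", " ".join(tokens))
    print("backward:", " ".join(reversed_example(tokens)))
```

With a reversed sequence, the "prompt" is the machine state you want to end up in, and the completion is a candidate program and the states leading to it. Whether a model trained this way actually finds valid programs is an open question; the sketch only shows that the data side is cheap.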