NSA Ghidra software reverse engineering framework

motohagiography · on March 29, 2023

There's a discouragement that comes in the RE community that to be useful at all you need to be able to write your own exotic packer decoders, but I use Ghidra about once a month for really basic security incident response: pulling apart "driver" installer packages to see where they are phoning home to, and evaluating enterprise vendor-ware packages for hard-coded credentials and snoopy telemetry. Sometimes I can pull down the second stage of a phishing attempt against one of our users and RE it just to see what level of sophistication the attackers having a go at us are at, and I've used the cantor dust plugin to quickly find sections of compressed and encrypted data in firmware images.
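For anyone without the plugin, a rough way to spot compressed or encrypted regions in a firmware image is a sliding-window Shannon entropy scan. To be clear, this is a simpler stand-in technique and not how cantor dust works. The 4096-byte window and the 7.5 bits/byte threshold are assumptions to tune per image.

```python
import math
import os
from collections import Counter

def shannon_entropy(chunk: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniformly distributed bytes."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def high_entropy_windows(data: bytes, window: int = 4096, threshold: float = 7.5):
    """Yield (offset, entropy) for each non-overlapping window at or above the threshold."""
    for offset in range(0, len(data), window):
        entropy = shannon_entropy(data[offset:offset + window])
        if entropy >= threshold:
            yield offset, entropy

if __name__ == "__main__":
    # Synthetic image: zero padding followed by random bytes standing in for encrypted data.
    image = b"\x00" * 8192 + os.urandom(8192)
    for offset, entropy in high_entropy_windows(image):
        print(f"0x{offset:06x}  {entropy:.2f} bits/byte")
```

High entropy alone does not distinguish compression from encryption; it only tells you where to look next.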

There is no chance I will ever publish original RE research, but it's a handy go-to tool, along with cyberchef, binwalk, and some other breadth-first static analysis tools for hunting specific IoCs. I could probably teach a solid generalist who cared to get to the level of being able to disassemble something and say, "yeah, this is dodgy" or not in an afternoon.

As an exercise, next time you get a cheap peripheral like a headset or another USB device, pop the driver installer package into ghidra and click through the call graph just to see what else it does. You may be surprised.

boppo1 · on March 29, 2023

> I could probably teach a solid generalist who cared to get to the level of being able to disassemble something and say, "yeah, this is dodgy" or not in an afternoon.

Please write and post this guide.

myself248 · on March 29, 2023

Seriously. There's lots of "you're already a rocket scientist so let's talk details" content out there, but very little with this incredibly-useful-sounding aim.

The trick is calibrating what a "solid generalist" means. I think I'd describe myself that way, but perhaps not among the HN crowd. Would be very interested in being a soundboard for such content, if that's helpful.

motohagiography · on March 29, 2023

thanks! it would take longer to write, but a basic entry point starts with what you want to know about something: who, what, when, where, why, and how. Trouble is, we tend to start with a complete picture of How without a sense of the rest to guide us.

What you want to know about a strange binary (barring obfuscation, sandbox escapes, and other nasties) is: who does it talk to (ip addresses, hostnames, sockets, etc), what does it open (files, registry entries, APIs, services), when does it do these things (eg. runtime conditions, magic packets, port knocks, triggers, checking for other software), where does it write or read data (directories, filehandles, remote sites, etc), why does it do this given its stated purpose (why does it have an encrypted section, and where is its key, is it using weird encoding to bypass filters, etc.) and then finally the How it does all these things is the effect of answering those other questions.
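A first pass at the "who", "what", and "where" questions can be sketched as a string triage: pull printable runs out of the binary and bucket them with a few regexes. The patterns and bucket names here are illustrative assumptions, deliberately loose, and will produce false positives (version numbers that look like IPs, for example). They also find nothing if the strings are obfuscated.

```python
import re

# Runs of 6+ printable ASCII bytes, roughly what the `strings` utility reports.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")

# Hypothetical, loose patterns: who it talks to, what it opens, where it reads/writes.
PATTERNS = {
    "urls": re.compile(r"https?://[^\s\"'<>]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "registry": re.compile(r"\b(?:HKEY_[A-Z_]+|HKLM|HKCU)\\[^\s\"']+"),
    "paths": re.compile(r"(?:[A-Za-z]:\\|/(?:etc|tmp|var|usr)/)[^\s\"']+"),
}

def triage(data: bytes) -> dict:
    """Return {bucket: sorted unique matches} found in the binary's printable strings."""
    strings = [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]
    findings = {label: set() for label in PATTERNS}
    for text in strings:
        for label, pattern in PATTERNS.items():
            findings[label].update(pattern.findall(text))
    return {label: sorted(hits) for label, hits in findings.items()}

if __name__ == "__main__":
    # Made-up sample bytes; 203.0.113.7 is a documentation-only address.
    sample = b"\x00\x01connect http://203.0.113.7/gate.php\x00C:\\Users\\Public\\drop.dll\x00"
    for label, hits in triage(sample).items():
        print(label, hits)
```

Anything that lands in a bucket is a lead to chase in Ghidra via its cross-references, not a verdict.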

I think the hardest part of analysis is having an organized way of knowing what you are looking for because we don't know the right questions to ask and we tend to work at the edge of limited knowledge. Should this rando binary be talking to some app hosting site, and why? Why would a developer encode endpoint names in a lookup table that only constructs and returns them at runtime? Why would someone use any of these libraries or data formats on purpose? The harder it is to answer these questions, the more suspicious I get.

If you start with the 5-W's, the How falls out of that a lot faster. If you can answer these questions about a binary, you're easily 50% there in determining whether it behaves as expected. Having an organized goal can take you from zero to basically useful if you answer those questions about it. The rest is just screenshots of menu items in ghidra and maybe cyberchef for purely static extraction.

I feel like I should pile on caveats here about how most malware isn't obfuscated or using novel techniques, a lot of it is just spyware capabilities you clicked through to accept, or a repackaged legit binary with some downloaded RAT attached and some nested compressed libraries. I'm sure someone who is more serious about this will say, "that's misleadingly simple!" but once you have a why and a what, the how is a work problem.

Dynamic debugging and stepping through is the next stage. It's also basic, but when you are goal oriented instead of trying to reproduce all usable code paths, it's more achievable. If you get the IP addresses out of a random binary and what protocols it's talking, and maybe what files it accesses, it means you've set up your analysis environment and done the initial checks, and that's valuable grunt work you can pass on to someone with deeper skills.

If you can go from zero to this, that's an afternoon well spent, imo. It's not trivial, as it assumes a lot of knowledge about system architecture and network protocols, but the questions above necessarily have answers, so I can guarantee you can find them with some directed effort. I don't mean to trivialize more advanced analysis, this isn't the same thing, but as an entry point, this is how I would recommend approaching it.

myself248 · on March 29, 2023

That's an incredibly useful model for how to approach the problem! And it sounds like exactly the questions I find myself asking about random suspected-malware, which is often precisely your original example -- a burned CD included with some aliexpress hardware.

I'm familiar with 'strings' and I've been playing with 'binwalk' to take apart files, but I'm out of my depth when it comes to loading something up in a debugger or whatever (is ghidra a debugger or what's the difference?) and looking at code. I don't speak C, and everything seems to look like C when it's shown in the examples of these things. How do I know if I'm looking at a sensible decompilation with actual runnable code or just gibberish because I'm trying to interpret a jpeg as an executable?
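On the "am I interpreting a jpeg as an executable" worry, a cheap check before loading anything is to look at the file's leading magic bytes. This is a minimal sketch with a small, non-exhaustive table; the labels are my own, and raw firmware dumps typically have no magic at all, in which case you have to tell the tool the architecture yourself.

```python
# (label, magic prefix) pairs; the first matching prefix wins.
MAGIC = [
    ("ELF executable/object", b"\x7fELF"),
    ("PE/DOS executable (MZ)", b"MZ"),
    ("Mach-O 64-bit, little-endian", b"\xcf\xfa\xed\xfe"),
    ("Mach-O 32-bit, little-endian", b"\xce\xfa\xed\xfe"),
    ("Mach-O universal binary or Java class", b"\xca\xfe\xba\xbe"),
    ("JPEG image", b"\xff\xd8\xff"),
    ("PNG image", b"\x89PNG\r\n\x1a\n"),
    ("ZIP container (also jar/apk/docx)", b"PK\x03\x04"),
    ("gzip stream", b"\x1f\x8b"),
]

def sniff(header: bytes) -> str:
    """Guess a file type from its first bytes."""
    for label, magic in MAGIC:
        if header.startswith(magic):
            return label
    return "unknown (raw firmware, text, or something packed)"

if __name__ == "__main__":
    import sys
    # Usage: python sniff.py suspicious_file
    with open(sys.argv[1], "rb") as handle:
        print(sniff(handle.read(16)))
```

If the answer is an image or an archive, disassembly output will be gibberish no matter how plausible individual instructions look; unpack it first (binwalk territory) and sniff what falls out.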

I don't know if that makes me teachable or beyond help, but I'd be an eager student.

doktrin · on March 29, 2023

> I'm out of my depth when it comes to loading something up in a debugger or whatever (is ghidra a debugger or what's the difference?)

When you hear "debugger", think "breakpoints". It's any tool that lets you do things like set breakpoints and step through code execution.

Most debuggers will let you view machine code or bytecode respectively, but they won't decompile binaries or bytecode into the original higher level language.

Ghidra does include a basic debugger, but it can also do lots of other stuff (including decompilation).
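To make the "debugger means breakpoints" idea concrete without any native tooling, here is a toy sketch in Python using the interpreter's own `sys.settrace` hook: it runs a function and records the arguments each time a named function is entered. It is only an analogy for what a native debugger does at the machine-code level; `run_with_breakpoint` and the sample functions are made up for illustration.

```python
import sys

def run_with_breakpoint(func, args, break_on: str):
    """Run func(*args); capture the local variables whenever `break_on` is entered."""
    hits = []

    def tracer(frame, event, arg):
        # "call" fires on function entry, when the locals are exactly the arguments.
        if event == "call" and frame.f_code.co_name == break_on:
            hits.append(dict(frame.f_locals))
        return None  # no per-line tracing needed

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return result, hits

def checksum(data):
    return sum(data) % 256

def handle_packet(payload):
    return checksum(payload)

if __name__ == "__main__":
    result, hits = run_with_breakpoint(handle_packet, ([1, 2, 3],), "checksum")
    print(result, hits)
```

A decompiler answers "what could this code do"; a breakpoint answers "what did it actually receive this time".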

> I don't know if that makes me teachable or beyond help, but I'd be an eager student.

It would probably help to get some baseline familiarity with systems programming. Check out the "15-213" CS course. The lectures are on YT, the reference book is probably online, and the labs are here :

https://www.cs.cmu.edu/~213/labs.html

pmoriarty · on March 29, 2023

"I don't speak C, and everything seems to look like C when it's shown in the examples of these things."

If you know how to program you could probably already make sense of a lot of C, and for the rest you could try asking an AI to explain it to you.

andai · on March 29, 2023

And if you learn a bit of assembly first, C will seem like a high level language again!

extrememacaroni · on March 29, 2023

When it comes to stuff that results in calls to dynamically linked libs e.g. OpenFile or whatever, you can also use Frida to intercept the calls and print out info about them/manipulate the inputs and/or return values. The advantage of Frida is that it uses JS to do this.

You need to run the executable to do this tho so maybe use a VM.

I used frida a few times to do random stuff like making foobar2000 always play the same mp3 regardless of what is in the playlist, and made a game's speed adjustable by intercepting calls to system gettime and changing the value.

Use Ghidra to check what the executable imports and intercept found functions in Frida.
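The game-speed trick above boils down to wrapping a time function so callers see scaled elapsed time. Frida does this against native functions with its JavaScript API; what follows is only a pure-Python analogy of the same idea, with `install_speed_hook` being a hypothetical helper and not anything from Frida.

```python
import types

def install_speed_hook(module, name: str, factor: float):
    """Replace module.<name> with a wrapper that scales elapsed time; return an undo function."""
    original = getattr(module, name)
    start = original()  # reference point: time at the moment the hook is installed

    def hooked():
        # Report time as if the clock ran `factor` times faster since installation.
        return start + (original() - start) * factor

    setattr(module, name, hooked)
    return lambda: setattr(module, name, original)

if __name__ == "__main__":
    # A stand-in "system clock" that returns scripted values, so the demo is deterministic.
    ticks = iter([100.0, 101.0, 103.0])
    fake_system = types.SimpleNamespace(gettime=lambda: next(ticks))

    undo = install_speed_hook(fake_system, "gettime", 2.0)
    print(fake_system.gettime())  # one real second elapsed, reported as two
    print(fake_system.gettime())  # three real seconds elapsed, reported as six
    undo()
```

The same shape (save the original, substitute a wrapper, optionally restore) applies whether you are logging arguments or rewriting return values.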

DyslexicAtheist · on March 29, 2023

> I'm sure someone who is more serious about this will say, "that's misleadingly simple!" but once you have a why and a what, the how is a work problem.

loved your post. I'm by far a lot less experienced for sure. There is one thing in this sentence that stood out because my order of what to address first is always what and how.

Only at the end the why (e.g motive) might or might not become visible. It has saved me from jumping to premature conclusion (or attribution) in the past ...

Most "who-dunnit" genre of films are based on making you believe the why is the ultimate goal. For me though the why is a by-product of addressing the what/how and I find things remain smoother and with less rabbit holes to get lost in.

andai · on March 29, 2023

>most malware isn't obfuscated or using novel techniques, a lot of it is just spyware capabilities you clicked through to accept

How does an antivirus tell the difference between e.g. TeamViewer and a repackaged app with a RAT?

hegzploit · on March 29, 2023

here's one way I love to think about it: a RAT will go all the way to try and persist, hide from AV, and load other components from some remote endpoint. It will trigger so many events that can be detected by an AV. TeamViewer, on the other hand, will not try to hide what it's doing. There's also a lot more at play here, since this is just heuristic analysis; AVs tend to be more complex and incorporate more methods of analysis like signature-based detection and integrity checking, etc.

j-bos · on March 29, 2023

I humbly second this ask.

cancerhacker · on March 29, 2023

You can get a long way just by running /usr/bin/strings against an executable, and maybe a platform specific version of otool -L. You should have a basic idea of how your OS does linking, shared libraries, etc.

youngtaff · on March 29, 2023

Yes please do

Tried Ghidra for the first time on the weekend to look at some 8051 firmware

Got stuck with disassembly as it seems to be misinterpreting some data sections as code - can see English strings in a hex editor but Ghidra is trying to convert them to asm

HelloNurse · on March 29, 2023

You are supposed to annotate what every part of the file is and how you want to display it. It's usually easy to distinguish reasonable assembler code from nonsense instructions interspersed with undecodable islands.

Disassembling every section in case it contains code is a common conservative policy for disassemblers: even without malicious payload-hiding tricks, sections that look like they are never executed could still contain embedded executable code.
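For the 8051 case above, one way to get a head start on the annotation is to list where the readable strings sit in the image, then mark those offset ranges as data before letting the disassembler loose. A minimal sketch; the 8-character minimum is an assumption, and short runs of bytes that happen to be printable will still slip through or be missed.

```python
import re

def string_islands(image: bytes, min_len: int = 8):
    """Return (start, end, text) for each run of printable ASCII at least min_len bytes long.

    `end` is exclusive, so image[start:end] is the string itself.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [
        (match.start(), match.end(), match.group().decode("ascii"))
        for match in pattern.finditer(image)
    ]

if __name__ == "__main__":
    # Made-up bytes: a few opcode-like values, an embedded message, more binary.
    image = b"\x75\x81\x30\x12\x00\x02" + b"FIRMWARE v1.2 READY" + b"\x00\x22\x80\xfe"
    for start, end, text in string_islands(image):
        print(f"0x{start:04x}-0x{end:04x}  {text!r}")
```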

youngtaff · on March 31, 2023

Thanks, I'll try that approach

It's been a while since I've looked at asm in anger so it's taking me a while to get back into it (plus this is a side project ATM)

neoncontrails · on March 29, 2023

Is cantor dust available as a plugin now? I remember watching the creator's tech talk as a young dev and being incredibly inspired by it. But I've looked it up a few times over the years, didn't find any evidence that the tool described was ever released.

motohagiography · on March 29, 2023

https://github.com/Battelle/cantordust

0d0a · on March 29, 2023

> There's a discouragement that comes in the RE community that to be useful at all you need to be able to write your own exotic packer decoders

Unless you are talking about obfuscated / virtualized payloads, isn't it common to just "cheat" by running it in an emulator / debugger, then taking the unpacked code section from memory and work from there? It was the approach I took in a CTF task: https://nevesnunes.github.io/blog/2021/10/03/CTF-Writeup-TSG...

motohagiography · on March 29, 2023

non-ghidra example, but just the other week I was pulling apart a commercial phishing kit that had implemented its own version of AES in javascript, and then created a kind of conceptual virtual file system based on nested layers of b64 and a "custom" rot-20k encoding that turned everything into unicode, where one blob was the image with offsets, and then different parts of the malware would be pulled out and decoded and decrypted at runtime - rendering the static analysis that AV and WAF tools do useless.

I used a REPL to manually do the steps you describe dynamically, but doing it statically means writing a decoder. You really need a proper sandbox to do dynamic analysis because you don't know what's going to actually detonate, whereas static analysis gives you a whiff of how off it seems, and that's sufficient for most security and privacy purposes. It was also common in Android apps several years ago now, not sure what the current state of the art is though. Android isn't my problem anymore.
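As a small illustration of what "writing a decoder" means for the nested-base64 part, here is a sketch that keeps peeling base64 layers until the data stops being valid base64. It does not attempt the custom rotation or the AES step from the kit described above, whose details aren't given here; `peel_base64` and its limits are assumptions for the example.

```python
import base64
import binascii

def peel_base64(blob: bytes, max_layers: int = 16):
    """Strip nested base64 layers; return (innermost_bytes, layers_removed)."""
    current = blob
    layers = 0
    while layers < max_layers:
        candidate = current.strip()
        if len(candidate) < 8:
            break  # too short to tell base64 from coincidence
        try:
            # validate=True rejects any byte outside the base64 alphabet.
            decoded = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            break  # not base64 any more: this is the innermost layer we can reach
        if not decoded:
            break
        current = decoded
        layers += 1
    return current, layers

if __name__ == "__main__":
    payload = b"GET /stage2 HTTP/1.1"
    wrapped = payload
    for _ in range(3):
        wrapped = base64.b64encode(wrapped)
    print(peel_base64(wrapped))
```

One caveat: plaintext that happens to consist only of base64 characters with a valid length would be "decoded" one layer too far, so check the result by eye.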

Officially, I suck at this and I defer to more skilled people because I am a much better writer than hacker, but when they aren't around, you go to war with the army you have. :)

amatecha · on March 29, 2023

What do you mean by "installer driver package", like literally the setup.exe that the vendor provides? Or like, extract the resources out of that and open _those_ in Ghidra?

philsnow · on March 29, 2023

I've seen some sketchy crap while pulling apart mac .pkg files to see their preinstall/postinstall scripts. In particular one video conferencing company's installer did some "growth hacky" things a few years ago (I checked a recent one just now and it seems benign).

amatecha · on March 29, 2023

Ah yeah I remember those particular installer shenanigans for sure. Indeed, installers are often granted elevated permissions which is a perfect opportunity to drop in "extra" functionality :-O

zeeshanmh215 · on March 29, 2023

What you do is very interesting and might be helpful for budding REs and also the privacy-focussed general public. Can you point me in a direction where I can learn that stuff?

dataflow · on March 29, 2023

How do you tell something is dodgy from the call graph? Don't you have to decipher what FUN_918243 represents, or whatever?

saagarjha · on March 29, 2023

Generally you can click around some code and see what functions it calls, what strings it references, etc. to get a basic understanding of what it does.

amrb · on March 29, 2023

You can see imported libraries, strings, and possible network calls; working backwards, you're going to see red flags if it's a basic app.
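The imports angle can be sketched as a simple watchlist check: you don't need to understand FUN_918243 to notice that a headset driver imports process-injection APIs. The Windows API names below are real, but the grouping and the idea that they are "red flags" are illustrative assumptions; plenty of legitimate software uses every one of them. The import list itself would come from your tool of choice, such as Ghidra's import listing.

```python
# Small, illustrative watchlist of Windows API names grouped by what they may suggest.
WATCHLIST = {
    "network": {"WSAStartup", "connect", "InternetOpenUrlA", "URLDownloadToFileA", "HttpSendRequestA"},
    "persistence": {"RegSetValueExA", "RegCreateKeyExA", "CreateServiceA"},
    "injection": {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"},
}

def flag_imports(imports):
    """Return {category: sorted matching names} for imported names found on the watchlist."""
    names = set(imports)
    return {
        category: sorted(names & apis)
        for category, apis in WATCHLIST.items()
        if names & apis
    }

if __name__ == "__main__":
    # Hypothetical import list for a "simple" utility.
    imports = ["CreateFileA", "ReadFile", "connect", "WriteProcessMemory", "CreateRemoteThread"]
    for category, names in flag_imports(imports).items():
        print(category, names)
```

From each flagged import, follow the cross-references back to the callers; that is the "working backwards" step.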

intelVISA · on March 29, 2023

Easy: all nonfree software you have to decompile to view the 'source' is dodgy by design.

Tools like Ghidra et al. merely lay bare the truth you already know.

biggieshellz · on March 29, 2023

What if someone gives you a binary that they claim is built from a particular source code? If you don't decompile it, how do you know if that's true or not? Or what if you can't trust your compiler (a la https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html)?

JCWasmx86 · on March 29, 2023

Reproducible builds. Sure not every project can be built in a reproducible manner, but it at least reduces the chances of getting shady binaries
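The core check behind a reproducible build is small: build from source yourself and compare the result byte-for-byte (in practice, by hash) with the binary you were handed. A minimal sketch with hypothetical helper names; it only means something if the project's build really is deterministic, and a mismatch does not by itself prove tampering.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def builds_match(vendor_path: str, rebuilt_path: str) -> bool:
    """True if the distributed binary and your own rebuild are byte-identical."""
    return sha256_file(vendor_path) == sha256_file(rebuilt_path)

if __name__ == "__main__":
    import sys
    # Usage: python compare.py vendor.bin my_rebuild.bin
    vendor, rebuilt = sys.argv[1], sys.argv[2]
    print("match" if builds_match(vendor, rebuilt) else "MISMATCH")
```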

pxc · on March 29, 2023

Check out `guix challenge` for a concrete example of how tidily this can be done with a system that supports reproducible builds well!

https://guix.gnu.org/manual/en/html_node/Invoking-guix-chall...

intelVISA · on March 29, 2023

I could never trust a bin I didn't build myself (with my own C compiler ofc).

arjvik · on March 29, 2023

Did you build that C compiler yourself? Using what compiler? Unless you bootstrapped it from a handwritten assembler, you'll need to consider the attack outlined in Reflections on Trusting Trust

intelVISA · on March 29, 2023

I did but I foolishly relied on GCC before it was self-hosted now I guess I should scrap the whole thing and build by hand.

pxc · on March 29, 2023

There's actually someone out there who has done some impressive work on this, believe it or not!

https://savannah.nongnu.org/projects/stage0/

saagarjha · on March 29, 2023

[flagged]

dang · on March 29, 2023

Please don't do this here.

saagarjha · on March 29, 2023

What would you suggest is an appropriate response to that comment? I could of course ignore or downvote it, but I'm not sure this actually conveys my sentiment towards it.

dang · on March 29, 2023

You can't just express unprocessed annoyance. You have to let the annoyance metabolize inside yourself until one of two things happens: either (1) you have something genuinely interesting to contribute; or (2) the need to respond goes away.

saagarjha · on March 31, 2023

On the contrary, I have spent quite a while considering how to respond to comments like these and this was the best I could come up with. I'm open to suggestions on what I might do instead but I will point out that the current options you've put forward either 1. make it very asymmetric to respond to stupid comments or 2. allow them to proliferate, which drives away and buries interesting conversation.

dang · on March 31, 2023

Responding to "stupid" comments fuels them, so it's better to post nothing. It's certainly much better to post nothing than to post something like https://news.ycombinator.com/item?id=35351862.

Re "allowing them to proliferate", the solution for that is flagging. (In case anyone doesn't know: to flag a comment, click on its timestamp to go to its page, then click the 'flag' link at the top. There's a small karma threshold before flag links appear.) And downvoting, of course (the karma threshold for that is higher). If the comment was bad enough to respond the way you did, why didn't you downvote and/or flag it?

saagarjha · on April 3, 2023

I don't flag comments very often unless they egregiously break the rules, but I've downvoted things like this before. Usually what happens is the person gets upset that they've been penalized by the system if the downvotes stick, or someone comes along and actually thinks the comment is good, and the effects of downranking it are reversed. I don't want to propose that my opinions on comments should overrule everyone else's, but I know you agree that just because people upvote vapid or clichéd content doesn't make it appropriate for Hacker News. So, I haven't really found your solution to work in practice.

What I could absolutely do is sit down and write a long reply about why I think the comment missed the mark, and how it could improve. I have done this in the past too. The problem here is that doing so is a lot of work. My thought process is that most of the people who are posting like this know that they're just posting low quality stuff, and if there's an easy reminder to stop doing that, they will. If they happen to reply with "no, I'm serious, here's why…" then there's no harm done; otherwise it signals (to other people too) that we're looking for something better than that.

dang · on April 3, 2023

I understand that writing a substantive comment can be a lot of work, but how can we be arguing about https://news.ycombinator.com/item?id=35351862? It was obviously unsubstantive and provocative, and would even probably land as a personal attack. It was even a case of the "vapid and cliché" that you're wanting to counteract.

Downvoting a bad comment is fine. Not posting is fine. If you want to neutrally let someone know that their comment isn't substantive enough for a good HN post, that's ok as long as you're careful how you do it. Dropping an insult on them is never helpful.

saagarjha · on April 6, 2023

I think the next steps here are that I refrain from doing this, continue downvoting, and reach out when I feel it isn't working and have more concrete feedback to provide.

graderjs · on March 29, 2023

Can you make a youtube video tutorial series on this? Would be great!

flangola7 · on March 29, 2023

How do you feel about models like GPT-4 using tools like that to RE?

amrb · on March 29, 2023

It's like a summary; it speeds up the process by giving you a possible context.

biggieshellz · on March 29, 2023

The breadth of that tool is just incredible. I'm about to submit my first PR to them to fix a couple of bugs in their PEF parser (classic Mac OS PowerPC executables), but it's absolutely bonkers that they have that support to begin with, and that it all works as well as it does. I'm very pleased to see my tax dollars going to something like that.

amrb · on March 29, 2023

People have been extending the architecture support for microcontrollers and game console ROMs too; YouTube has neat videos on their projects.

elif · on March 29, 2023

Just imagine how useful the decompiler they use would be if they are willing to release this one.

DethNinja · on March 29, 2023

Ghidra is genuinely an awesome software, you don’t need to be a reverse engineering expert to use it.

And with LLMs like GPT it will be able to do insane stuff like automatically analysing very complex malware.

On the other hand I’m sure malware will evolve too, with LLMs you can actually directly edit the binary and add hooks to them. Cost of building firmware malware for NICs and UEFI will lower to zero dollars.

Anyway I’m getting off topic but this was something I really wanted to mention somewhere, it is likely there will be a massive amount of complex malware coming via LLMs that will potentially impact the entire economy.

l33t233372 · on March 29, 2023

> with LLMs you can actually directly edit the binary and add hooks to them

I’m so confused what LLMs have to do with adding hooks.

Edit: I’m confused because this can be done without LLMs, and I don’t see why they’re particularly helpful here, unless the use case is instructing the LLM to “hook malloc to use my_malicious_malloc”

DethNinja · on March 29, 2023

It is not that easy to reverse engineer closed source firmware and edit the right places on the binary to prevent runtime investigation/detection and reflashing.

If you don’t care about almost undetectable persistence, then yeah you won’t need to bother with LLMs to find the perfect hook point.

saagarjha · on March 29, 2023

How does the LLM help here?

DethNinja · on March 29, 2023

I don’t have particular experience with firmware hacking but how would you achieve persistence with just a simple malloc/bad malloc replacement? What if user decides to reflash the ROM during runtime? I imagine persistence would require emulating the entire firmware update process.

In order to emulate the firmware update process, you would need to reverse engineer a large portion of the binary, right? This is where LLMs would be helpful.

l33t233372 · on March 29, 2023

> how would you achieve persistence with just a simple malloc/bad malloc replacement

How would you achieve it with the LLM? I’m totally confused what persistent malware has to do with LLMs, unless you’re just saying “LLMs are smart and are a way to automatically do hard things”

DethNinja · on March 29, 2023

That is exactly what I’m saying, a specialised LLM can reverse engineer and analyse the entire firmware / firmware update process faster than a human and thus automatically implement malware for all kinds of devices without access to source code.

l33t233372 · on March 29, 2023

Not to be a dick, but I don’t believe you. Show me, if you can.

Edit: if you mean that theoretically this may one day be possible for a LLM to automatically hook functions and introduce persistent malware, then, not being a future teller myself, I would likely agree with you. But that’s not interesting because you can say that about essentially anything.

hegzploit · on March 29, 2023

I'm interested in this stuff, care to share some good starting point?

landr0id · on March 29, 2023

Sort of a tangent but I used Ghidra last week to help a coworker figure out where some function was called by looking at the callee tree to it. He was investigating a crash reported by a user that seemed impossible to hit as the function was never called. After about 10m we figured out that the code had been recently refactored and the build I had locally was enough out of date to still have the buggy function call.

amrb · on March 29, 2023

I wrote code for the headless analysis to dump all functions, till I managed to OOM my laptop decompiling a tricky SSE2 function. It was a fun experience, so I'm gonna open a PR and see if that can be solved for others.

splonk · on March 29, 2023

Previous thread about using GPT3 with Ghidra:

https://news.ycombinator.com/item?id=34250872

yalogin · on March 29, 2023

What's an LLM and GPT?

l33t233372 · on March 29, 2023

An LLM is a large language model.

A GPT is a generative pre-trained transformer. This is a type of text generative AI model.

boppo1 · on March 29, 2023

What will the defense be?

password4321 · on March 29, 2023

LLMs on defense too.

Also, the filters on the commercial services attempting to prevent misuse.

Anything useful that gets past the filters and is used to cause damage leaves behind the prompts and user account info to be subpoenaed, though it could take a while for law enforcement to come up to speed.

amrb · on March 29, 2023

Api has no filters

atribecalledqst · on March 29, 2023

I've been working on a hobbyist project to analyze a ROM for an architecture that wasn't covered by Ghidra, and let me just say. I had a hellish time trying to work with Sleigh, the language you use to define new architectures for Ghidra to analyze. There just isn't a ton of great info out there about it, outside of the Sleigh documentation itself. I was able to find a few guides online but none were quite at the level of detail I was looking for.

I ended up getting lucky and finding somebody else's project for the same CPU, that I was able to build on to make something that worked. And by doing that I was eventually able to figure out why I couldn't even get off the ground.

0d0a · on March 29, 2023

I'm also writing a processor module, and reading this is a bit encouraging to eventually write about it once it's finished.

Getting off the ground wasn't the hardest part so far. You can just pick the skeleton module that already comes with Ghidra, then lookup some existing simpler modules like the one for z80 to figure out how instructions are put together. You also have the script `DebugSleighInstructionParse` to check how bits are being decoded, very useful when you screw up some instruction definitions.

Unfortunately, you bump into a lot of jargon heavy error messages. The first time you hear about "Interior ellipsis in pattern", you sure have no idea what's that about. Now repeat that experience for several messages.

Then the hardest challenge is how to even test the module outside of some quick disassemblies. There's `pcodetest` but the setup is cumbersome and it seems more about validating instruction decoding rather than semantics. I might just write my own validation using pcode emulation and compare the register state against another emulator's instruction trace...
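The comparison step of that idea is easy to sketch independently of Ghidra: given two instruction traces as lists of register snapshots (one from a reference emulator, one from emulating your module), report the first step where they disagree. The trace format here, one dict of register name to value per executed instruction, is an assumption for illustration; producing the traces is the real work.

```python
def first_divergence(expected_trace, actual_trace):
    """Return (step, register, expected, actual) for the first mismatch, or None if traces agree.

    If one trace simply ends early, register and values are None and step is
    the index where the shorter trace ran out.
    """
    for step, (expected, actual) in enumerate(zip(expected_trace, actual_trace)):
        # Check every register seen on either side, in a stable order.
        for register in sorted(expected.keys() | actual.keys()):
            if expected.get(register) != actual.get(register):
                return step, register, expected.get(register), actual.get(register)
    if len(expected_trace) != len(actual_trace):
        return min(len(expected_trace), len(actual_trace)), None, None, None
    return None

if __name__ == "__main__":
    # Hypothetical traces: the module under test fails to increment `a` at step 2.
    reference = [{"pc": 0, "a": 0}, {"pc": 2, "a": 5}, {"pc": 4, "a": 6}]
    under_test = [{"pc": 0, "a": 0}, {"pc": 2, "a": 5}, {"pc": 4, "a": 5}]
    print(first_divergence(reference, under_test))
```

The instruction executed just before the reported step is the one whose semantics to re-read in the spec.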

mumbel · on March 29, 2023

Pcodetest is more about validating the implementation of the instruction. Sure, it has to decode, but the benefit is mostly a base-level set of logic that can be emulated. And I'm definitely not a fan of the setup to get it going (it's also only helpful if you have a semi-recent C compiler).

0d0a · on March 30, 2023

Oh nice, it wasn't clear from the test suite if that was the case, I'll give it a closer look.

Judging from the python scripts, it seems to expect a whole binutils toolchain (so not just compiler but also objdump, readelf...) and that would be a blocker for me.

mumbel · on March 31, 2023

Compiler (gcc) and maybe assembler (as) are used. I think the other binutils executables are unused but still built-in to their logic. Due to its age and being removed from gcc, I was unable to cleanly set up pcodetest for 80960 (had to hack it all together and scripted their java portion to work with hack), but was super useful for improving tricore (pcodetest wasn't released when I submitted original PR) and writing risc-v.

mdaniel · on March 29, 2023

> And by doing that I was eventually able to figure out why I couldn't even get off the ground.

... which I then wrote up in a gist or pastebin or Toot or Tweet so the next poor soul wouldn't have to suffer like I did

is the rest of that, right?

Dwedit · on March 29, 2023

For me, the one spot where Ghidra is lacking is support for vtables (or COM objects). You can't simply feed it a C++ header file that defines the COM object.

amrb · on March 29, 2023

Silly question but was a plugin like this not enough to fix the vtable? Else do we just need a feature added or is it more involved? https://github.com/astrelsky/Ghidra-Cpp-Class-Analyzer

mdaniel · on March 29, 2023

Is there an issue requesting that behavior? My guess is it'll be some onoz trying to ship a C++ parser but "you don't ask, you don't get" and asking on HN is not what I meant :-D

Actually, having written that out: is there vtable support and just not C++ header parsing support, or both facets are missing?

amrb · on March 29, 2023

Reading the issues [0] there is a 'prototype' script "RecoverClassesFromRTTIScript.java" to rebuild vtables, testing on a project it does indeed resolve further.

https://github.com/NationalSecurityAgency/ghidra/issues/516

tomas789 · on March 29, 2023

Ghidra is reasonably simple to pick up at the entry level. I use it just for fun to make a keygen for commercial software from time to time. Just to flex the muscle.

My takeaways are: 1/ You can do RE without knowledge of assembler (which I know nothing about) 2/ C decompiler is useful and you will need to learn some patterns of how things get disassembled 3/ There are many good videos on youtube on how to get started 4/ There is a debugger to see what the program actually does but I never managed to get it running. That would be an awesome feature to use.

steponlego · on March 29, 2023

One cool thing is the infinity dragon logo, it’s a recurring motif with other NSA projects.

amrb · on March 29, 2023

I'm loving this for syncing between RE tools, tho would be great if we had help on the supported ghidra features!

https://github.com/binsync/binsync

mdaniel · on March 29, 2023

https://github.com/binsync/binsync/labels/ghidra being empty likely doesn't help. I know there's likely no such thing as "claiming" an issue, but I'd much rather have "optimistic locking" than 8 people concurrently working on the same "global vars" support

mahaloz · on March 29, 2023

Well, uh, no one is working on it yet hehe. When I'm aware someone is working on something, I try to have them PR as soon as possible so a running Draft PR shows everyone it's locked. If you don't see a draft PR, it means it's up for grabs :). Honored to have anyone with extra time help out <3.

thund · on March 28, 2023

The best part of readme: security warnings.

kuroguro · on March 29, 2023

It does seem ironic on first glance but it's pretty much unavoidable getting some vulns in a large project. In fact if you're reversing malware that might be actively trying to sabotage that, it's a good thing they've put warnings front and center.

ezconnect · on March 29, 2023

Is Ghidra a trojan horse?

ezconnect · on March 30, 2023

Is it not a valid question? If I want to find hackers, the best way to monitor them is to release a tool they would use.

daveofdaves · on March 29, 2023

Somehow this thing downloaded itself so there's that

lionkor · on March 29, 2023

What do you mean?

daveofdaves · on March 29, 2023

Somehow this thing downloaded itself and that's enough

elif · on March 29, 2023

Has anyone RE'd ghidra using another decompiler to determine whether it hides NSA backdoors etc?

lionkor · on March 29, 2023

> RE'd ghidra

What, like, read the source code [1] or reverse engineered a binary? Would be easy(ish) to tell if the code in the binary was different from the source, probably.

[1]: https://github.com/NationalSecurityAgency/ghidra

elif · on March 29, 2023

Being a large open source project is an even lower standard of transparency than a formal NIST review of a very small codebase, in which the NSA was able to hide at least one backdoor. It wasn't until years of use in the wild revealed the ECC magic number that this vulnerability was uncovered [0].

Similarly, RE investigates the actual functioning of code more thoroughly than a human tasked with hunting for an intentionally obfuscated defect (if any human has even undertaken that process).

[0] https://jiggerwit.wordpress.com/2013/09/25/the-nsa-back-door...

nibbleshifter · on March 29, 2023

It's open source. No RE necessary.

elif · on March 29, 2023

Plenty of encryption algorithms created by the NSA were also public and contained backdoors.

amrb · on March 29, 2023

FYI, there was free OpenAI credit given out, so I decided to try out Ghidra with the G-3PO plugin; imo it has been fun looking around binaries with basic C experience.

abudabi123 · on March 29, 2023

Maybe a better UIX in the readme says, 1) buy an NVIDIA Jetson ODIN 64GB Mini 2) press the buy and play button in the App Store 3) you are running in a AAA Studio IDE

Grothendank · on March 29, 2023

Is ghidra safe to use if you consider the NSA an adversary?

Every person I've asked this question has had their noses so far up the NSA's pooper that they could not imagine considering the NSA an adversary.

But suppose you were running a malware honeypot operation for the CCP. Would you still use Ghidra? Why or why not?

And please don't pass the buck and say, "I probably wouldn't be allowed to use ghidra" or "I'd probably use whatever my CCP handler told me to use" or "I wouldn't be working for the CCP in the first place." That does not inform me about the security risks of using ghidra with the NSA as an adversary.

mumbel · on March 29, 2023

It's pretty dumb that this continues to come up years later. You're the NSA delivering source code to the cyber security community. The exact community that: doesn't immediately trust the NSA, knows how to find bugs, would love to find any sort of bug in their code (regardless of whether it's malicious), people you want to apply for your jobs, people you partner with (academia/other govt orgs/other countries' cyber security groups).

So your thinking is: yes, this is the crowd we'll try to slip backdoored Java code past.

Okay, fine, you still don't trust them? Run it in a VM without a network connection. What security risks or threats are you even talking about?

And yes, people have heavily audited the source. You either trust that the community catches things or you don't. In the end, if you're still tin foil about it, don't use it; nobody cares.

Grothendank · on April 4, 2023

Here's my basic position:

1. The risks and threats are published

2. The audits I've seen don't evaluate the threats

3. Link me to the audits if you want to convince me

I. The risks - airgapping is not enough

1. If the software has zero-day beacons in it, it can communicate with zero-day beacon repeaters embedded in the VM, OS, or hardware (see: cache side channels: https://dl.acm.org/doi/abs/10.1145/3133956.3136064 )

2. The beacons wouldn't have to look like exploit code, they could just be timing bugs sprinkled into the codebase at random. There are plenty of random little warnings and defects in the code that nobody is ever going to check or fix, see this audit: https://github.com/NationalSecurityAgency/ghidra/issues/382

3. Airgaps may be broken by ultrasound side channels; communication to compromised devices like smartphones is possible (see: speaker-to-gyroscope communication https://ieeexplore.ieee.org/abstract/document/9647842/ ; speaker-to-speaker communication https://arxiv.org/pdf/1803.03422.pdf)

4. Low bitrate data leaks, like "ghidra is running in this org, decompiling files named....." may be accumulated by the NSA

This is just zero-day warehousing and passive signals collection with embedded zero-days. It would be hard for security researchers to detect this. I'd happily change my mind if you showed me an audit that looks for beacons and other side channels.

II. The audits

Here is the one audit I could find

https://github.com/NationalSecurityAgency/ghidra/issues/382

This audit tells us that the code is janky, but doesn't tell us if it's secure. It's just a dump of thousands upon thousands of static analysis errors.

There's no threat analysis in this audit. But it does suggest the code has so many defects that a serious audit would be very expensive.

III. Change my mind with evidence

Please link me to the heavy audits of the code. If you can.

tl;dr: I think the code is less heavily audited than you can show.

lovetocode · on March 29, 2023

Code is right there -- just look at it.

Grothendank · on March 29, 2023

I'm looking at it. It's very beautiful code. 2,049,616 lines of it.

Okay. I have looked at the code. Now what? Has that made me more secure?

lionkor · on March 29, 2023

Yes https://github.com/NationalSecurityAgency/ghidra

Grothendank · on March 29, 2023

I'm not trying to be flippant here, but how many of ghidra's 2 million lines of code have you audited?