Back in the day, we used to embed strings into the translation units that would report the original name and version of the file. One could use the 'strings' command to get detailed information about what files were used in a binary and which version! DVCS (git) broke that, so most people don't remember.
Knowing which version of a file made it into a binary still doesn't really help you, though. The compiler used (if any), the versions of the compiler and linker, and even the settings / flags used all affect the output and -- in some cases -- could turn an otherwise secure program into something exploitable.
A Software BoM sounds like a "first step" towards documenting a supply chain, but I'm not sure it's in the right direction.
This feels like it might actually be a use-case for a blockchain or a Merkle Tree.
Consider: A file exists in a git repository under a hash, which theoretically (excluding hash collisions) uniquely identifies a file. Embed the file hashes in the executable along with a repository URL and you essentially know which files were used to build the executable. Sign the executable to ensure it's not tampered with, then upload the hash of the executable to a blockchain.
If your executable is a compiler, then when someone else builds an executable they can embed the hash of the compiler into it to link the binary back to the specific compiler build that made it. The compiler could even include into the binary the flags used to modify the compiler behavior.
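For concreteness, the identifier git stores a file under is trivial to recompute, so a verifier wouldn't even need git itself. A minimal sketch in Python:

    import hashlib

    def git_blob_hash(contents: bytes) -> str:
        # git names file contents by SHA-1 over "blob <size>\0<contents>",
        # so hashes embedded in an executable can be checked independently
        header = b"blob %d\x00" % len(contents)
        return hashlib.sha1(header + contents).hexdigest()

    # sanity check: the empty file is git's well-known empty blob
    assert git_blob_hash(b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"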
>This feels like it might actually be a use-case for a blockchain or a Merkle Tree.
A few years ago, a similar idea for firmware binary security[0] had been explored by Google as a possible application of their Trillian[1] distributed ledger, which is based on Merkle Trees.
I don't know if they've advanced adoption of Trillian for firmware; however, the website lists Go packaging[2], Certificate Transparency[3], and SigStore[4] as current applications.
> Supply chain Levels for Software Artifacts, or SLSA (salsa), is a security framework, a check-list of standards and controls to prevent tampering, improve integrity, and secure packages and infrastructure in your projects, businesses or enterprises.
> SLSA defines an incrementally-adoptable set of levels which are defined in terms of increasing compliance and assurance. SLSA levels are like a common language to talk about how secure software, supply chains and their component parts really are.
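For intuition about the Merkle trees these ledgers build on: hash each leaf, then hash pairs of hashes upward until a single root remains, so one short digest commits to every leaf (e.g. every source file in a build). A toy sketch:

    import hashlib

    def merkle_root(leaves: list) -> str:
        # hash each leaf, then repeatedly combine adjacent pairs;
        # changing any leaf changes the root
        level = [hashlib.sha256(leaf).digest() for leaf in leaves]
        while len(level) > 1:
            if len(level) % 2:              # odd count: carry the last node up
                level.append(level[-1])
            level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                     for i in range(0, len(level), 2)]
        return level[0].hex()

    # e.g. merkle_root([b"main.c contents", b"util.c contents", b"Makefile contents"])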
Trillian was already the name for a distributed system - an instant-messaging app. I used to use it around a decade ago. Seems like they still exist, but have pivoted into enshittification (it's an Electron app now, and you have to buy a subscription for chat history and read receipts).
> One could use the 'strings' command to get detailed information about what files were used in a binary and which version! DVCS (git) broke that, so most people don't remember. [...]
> Consider: A file exists in a git repository under a hash, which theoretically (excluding hash collisions) uniquely identifies a file.
Git is of course incapable of including a file version number, as that’s not really well-defined in a distributed setting. But if you’re OK with a blob hash, put $Id$ in your source files, mark them with “ident” in .gitattributes, and you’ll see the hashes included and autoupdated on your next checkout[1]:
> When the attribute `ident` is set for a path, Git replaces $Id$ in the blob object with $Id:, followed by the 40-character hexadecimal blob object name, followed by a dollar sign $ upon checkout. Any byte sequence that begins with $Id: and ends with $ in the worktree file is replaced with $Id$ upon check-in.
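For example (the blob hash shown is illustrative, not a real one):

    # .gitattributes contains the line:
    #     *.py ident
    #
    # version.py as committed:
    BLOB_ID = "$Id$"
    # ...and in the worktree after the next checkout:
    BLOB_ID = "$Id: 3b18e512dba79e4c8300dd08aeb37f8e728b8dad $"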
How do you identify if log4j was one of your dependencies then? You'd just have a hash of the pom.xml or whatever and you'd still need a tool to check it.
edit: or do you mean that your hypothetical tool would generate the repository URLs + file hashes for its dependencies as well and bundle those with its own?
log4j is usually included as a binary component in a .jar file, so it should be obvious that it is a dependency. If you're using log4j with a remote logger, that's configuration that probably wouldn't be captured by a hash. However, the fact that a specific version of the log4j client library is included in your distribution should be easy to identify.
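Machine-checking that is cheap, too, since a .jar is just a zip archive. A crude sketch (a "shaded" jar that relocates packages under another name would evade it):

    import zipfile

    def bundles_log4j(jar_path: str) -> bool:
        # look for log4j class files anywhere inside the archive
        with zipfile.ZipFile(jar_path) as jar:
            return any(name.startswith("org/apache/logging/log4j/")
                       for name in jar.namelist())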
> Back in the day, we used to embed strings into the translation units that would report the original name and version of the file.
We still do this today. We rely on some tooling-added information and add our own custom information on top (git hash, hash of direct dependencies, compiler version and flags, etc).
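As a rough sketch of the kind of stamp this produces (names hypothetical, and the Python toolchain standing in for whatever compiler you actually use):

    import platform
    import subprocess
    import sys

    def build_stamp() -> str:
        # gather the provenance we bake into each artifact: source
        # revision, toolchain version, and build platform
        rev = subprocess.check_output(["git", "rev-parse", "HEAD"],
                                      text=True).strip()
        return (f"rev={rev} "
                f"toolchain=python-{sys.version.split()[0]} "
                f"platform={platform.platform()}")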
> This feels like it might actually be a use-case for a blockchain or a Merkle Tree.
Merkle trees, sure, but you do not need a blockchain to store/manage hashes of sources and build products. Just have trustworthy parties attest that source commits yield particular binaries by their hashes, and publish those signatures somewhere. Even Bitcoin Core does something like this using PGP.
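A minimal sketch of what such an attestation could contain (the record format is hypothetical; the real substance is a trusted party's signature over it, published somewhere durable):

    import hashlib
    import json

    def attest(source_commit: str, binary_path: str) -> str:
        # a trusted builder asserts: building <source_commit> yields this binary
        with open(binary_path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        record = {"source_commit": source_commit, "binary_sha256": digest}
        return json.dumps(record, sort_keys=True)  # then sign and publish this

    def matches(record_json: str, binary_path: str) -> bool:
        # anyone can check a downloaded binary against the published record
        with open(binary_path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        return json.loads(record_json)["binary_sha256"] == digest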
The only reason for suggesting a blockchain is that the information should be publicly available without having to visit a specific vendor's web site for confirmation.
But maybe it's not necessary.
The DNS root zone, for example, is publicly accessible and well managed (maybe not the registrars, but the registry), so perhaps a well-funded third party could establish a trusted, global registry for supply-chain Merkle trees.
> The compiler could even include into the binary the flags used to modify the compiler behavior.
There may be multiple sets of compiler flags and even multiple compilers (can there even be multiple linkers?).
In addition, when it has to pick between multiple functions called foo, a linker may pick either of them, and may not always pick the same one (parallel linkers often aren't deterministic), even if their implementations are different.
There's also a decent chance you'll have to document your OS, for example if the compiler dynamically links to a system library, or if the OS implements FPU support in software, or if the OS tweaks some obscure CPU flags (and of course, you'll have to document the specific CPU and CPU revision, as those can affect things like constant folding).
Wouldn't that also require including the hash of every dynamic library on the system, the environment variables, configs, etc., as all of that can affect the resulting file?
If it isn’t machine checkable, it is a complete waste of time. There are plenty of examples of people at the actual vendor behaving maliciously; often under government orders.
However, there’s a pretty simple way to make sure the bill of materials can be checked by machine:
Require the firmware blob to be reproducibly buildable from source, and mandate distribution of the source code of the firmware.
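The machine check then reduces to rebuilding from the published source and comparing digests byte-for-byte. A sketch (paths hypothetical):

    import hashlib

    def check_reproducible(vendor_blob: str, rebuilt_blob: str) -> bool:
        # a reproducible build means an independent rebuild from the
        # published source is byte-identical to the vendor-shipped blob
        def sha256(path: str) -> str:
            with open(path, "rb") as f:
                return hashlib.sha256(f.read()).hexdigest()
        return sha256(vendor_blob) == sha256(rebuilt_blob)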
Doing this doesn’t preclude signing the result of the build and then distributing the signature instead of the signing key.
I’d personally prefer it if vendors also had to distribute signing keys, but that’s a separate question.
You can disagree, and it is unlikely that you'll change my mind.
A nice compromise is what my old Acer laptop does. You can disable the Windows public key and put your own in place if you want to run a CA, or you can just tell it to trust whatever boot loader is currently installed (and lock it to that boot loader, so now it won't boot Windows or even Ubuntu, but it will boot the Linux or BSD you installed). You can also disable the security checks entirely if you want to.
> You can disagree, and it is unlikely that you'll change my mind.
I'm not looking to change your mind, merely pointing out how absolutely pointless signing is if the private keys are public. At that point, just don't sign, and don't add the silicon/firmware that verifies the signatures -- less cost designing/testing/manufacturing/supporting the hardware.
The point of signing the firmware/bootloader is to make it more difficult to employ boot chain exploits. If you don't care that anything that even briefly gains root on your machine (curl|sudo bash) can proceed to permanently backdoor your OS in a way that's undetectable from the inside, just disable signing. /shrug
> A nice compromise is what my old Acer laptop does. You can disable the Windows public key and put your own in place if you want to run a CA, or you can just tell it to trust whatever boot loader is currently installed [...].
Either Microsoft, or the EFI standards body (under MS's direction; can't remember the details here), mandates that users must be able to enroll their own signing keys, at least on x86. I haven't run into PC hardware that doesn't allow you to do this, but then I'm not dealing with a lot of PC hardware regularly. The story is a bit different with Arm EFI: Microsoft tried to ape Apple and made it mandatory that you NOT be able to enroll your own keys; meanwhile, Apple skipped EFI on Arm Macs and just allowed third-party OSes like they did on Intel. I'm not sure what the story is with signing a third-party OS on the Arm Macs, but I wouldn't like to run one unsigned.
I don't quite understand SBoM. How does this help with something like the SolarWinds attack or a general CI compromise? Or is there another type of attack this helps mitigate or detect?
Their example entry basically just says (very simplified) "Intel both built and supplied the microcode for your Intel CPU". But it says nothing like "Intel used Jenkins version x.y.z which bundled log4j version a.b.c" so you still can't tell if your binary blob was built by a compromised system or not, even once you learn about the potential attack.
It won't tell you about malware. It just means that when my customers have questions about well-publicised vulnerabilities in open source dependencies, they can audit themselves for the presence of that dependency with a set of standard tooling.
That said, my company is moving towards having SBOMs, but seems to be making them available on request only. Which in my mind defeats the point of having them.
The things you mentioned are not solved by a typical "SBOM", but e.g. CycloneDX has extra fields to record provenance and pedigree, and things like in-toto (https://in-toto.io/) or SLSA (https://slsa.dev/) also aim to work in this field.
I've spent the last six months in this field and people will tell you that this or that is an industry best practice or "a standard" but in my experience none of that is true. Everyone is still trying to figure out how best to protect the software supply chain security and things are still very much in flux.
I always thought SBoMs _can_ link to all the dependencies a build used, which users can later securely resolve to all the transitive binaries/dependencies, like log4j version a.b.c.
In the SolarWinds case, the attack was carried out on the build system itself, so a hypothetical SBoM published by the compromised build system would be fairly useless to their customers.
And how are small solo-founder companies making devices going to carry and pay for this burden?
I'm pretty annoyed that stuff like these requirements just makes it into law without small companies or startups having a say in whether they want this or not.
As a co-founder of a tiny company dabbling in hardware as a side-project, yes please!
Half the things we are struggling with are due to upstream hardware vendors throwing a hot mess over the fence. Linux&friends being GPL doesn't help us as much as we hoped, because many important and low-level pieces are quite opaque, closed, under-documented, etc.
Anything to force the vendors to be more transparent is good for us. We already automate our builds, keep extensive notes on what works and what doesn't, etc. Our own SBOM is additive on top of the vendor's stack, so all we're really concerned with is the pieces we wrote ourselves. I don't see it as a burden any more than ensuring proper test coverage, or maintaining a lockfile of dependencies.
you don’t understand. if they make it a law, it will be worse for you, not better, since as a hardware vendor you carry the burden.
you point out the problem we face today: people in open source typically work at big tech and regularly push ideas that end up costing small hw companies a lot of money. for example rust in the kernel. or over complicated frameworks for the sake of security while lots of hardware does not need security. or features that are important for servers and make those default while most hardware is not a server. and so on.
if open source was primarily managed by small company contributors it would look very different.
If it really goes through, it just adds an extra standard step or tool to the CI/CD chain. The problem right now is that the tools aren't good enough, but that's not an unsolvable problem once enough people get involved.
As opposed to what, getting a free pass to ignore security because you're small? (And really, it's a "burden" to track what you're shipping? That's all a BoM is; a standardized format to stamp a binary with a list of ingredients.) Do you think mom 'n pop diners should be exempt from health inspectors?
If it has a network connection, it can be used in a botnet. If you mean hardware that is not networked, then yes, the concern seems meaningfully smaller.
Yeah, even if it doesn't become a legal requirement, I could imagine companies asking for FSBoMs from their suppliers -- so it ends up being a requirement anyway if you want to compete with the big companies.
Maybe a bit off topic, but has anyone had success using LLMs to reverse engineer firmware assembly code? If they aren't going to release it open source, maybe that's the next best thing.
Thanks for this. I've been using Binary Ninja, but may have to try out Ghidra. I was surprised how little I found online about it, none of the major vendors seem to have first class support yet.
I think "[automated] disassembly" has a different implication than reverse-engineering; the latter usually involves more depth in the analysis of the binary, usually including more semantic-level considerations (i.e. this block is meant to do this, or this function is used from these different callsites). The best examples of this type of analysis seem to exist in the security community when going into the detail of zero-days, exploits, etc. I think LLMs either already can or will soon enter that space.
I am extremely interested in this. The utter silence in the literature around this is a bit odd, frankly.
Not LLMs specifically, but transformers more generally, ought to be extremely good at this task.
Another one I wonder about a lot is execution traces. Something that isn't mentioned often enough is that transformer models can do previous-token completion just as easily as next-token completion. So you can train the model on paired program/exectrace (training data is trivial to generate here -- just execute it on a CPU!) and then ask the model to work backwards from a desired machine-state you want to reach.
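To make "trivial to generate" concrete, here's a toy version with a made-up three-instruction machine (everything below is hypothetical, just to show the shape of the training data):

    import random

    OPS = ("inc", "dec", "dbl")  # toy instruction set

    def run(program, x=0):
        # execute a straight-line toy program, recording the register
        # value after every instruction -- the execution trace
        trace = []
        for op in program:
            x = {"inc": x + 1, "dec": x - 1, "dbl": x * 2}[op]
            trace.append(x)
        return trace

    # (program, trace) pairs for training, generated by just running code
    pairs = [(prog, run(prog))
             for prog in ([random.choice(OPS) for _ in range(8)]
                          for _ in range(10_000))]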
Can we just do it like Oxide does and make all the code that executes to make an appliance function -- including firmware -- open source? This should be the minimum bar to even start the SBoM conversation.
You'd still want a way to trace exactly what versions of what components went into the binary that actually got flashed to your device. I 100% support FOSS firmware, just I think it's orthogonal to this.
I don’t think SBoM is solved by Open Source necessarily. It makes the problem easier maybe, but it doesn’t really solve it.
If you want to use Open Source code in an environment, it is necessary to validate all of the code (or know who validated it), including all the dependencies. The maintainers don't have any obligation to help you, and it is a tedious job that they might not be interested in (or maybe you are in the policing or defense sectors and the maintainers find your application objectionable).
Okay? Neither hardware nor license type has any relevance to an SBOM. On an SBOM, you list all of your software dependencies, regardless of license.
> but where does the software end and the hardware begin anyways?
Hardware is the physically tangible part of the computer. Software is the instructions that run on it. For those compiling an SBOM, I think that's probably an easily answered question.
So let me rephrase my original comment using more text.
I see them "bending over backwards" to protect their right to keep an advantage to use, if they deem it necessary, against anyone else.
This is why they go to such lengths to avoid publishing what they consider to be "the secret sauce".
As I see things, the fact that they even came up with the notion of a "software bill of materials" is a bit disingenuous from my perspective. Already the concept is obscure and lends itself (in my opinion) to shading, hiding, and occulting the software source code and (or) the hardware designs (as appropriate).
Finally, I consider that the concept of "SBOM" (which TIL existed) is designed with the intentions I already mentioned: to occult information (anti-open) for the sake of keeping a perceived advantage (pro-centralization).
I'm honestly having trouble understanding your comment. SBOMs are just a list of dependencies. Creating such a list is not a new concept, but it is one that is gaining traction recently. It is useful for people who use software: for example, if they learn that a particular dependency has a vulnerability in it, they can quickly determine whether they are affected.
I'll give you an example of what they're intended to be useful for:
Let's say you're an organization and you use, say, 500 different pieces of software made by 100 different companies. If you learn that there's a vulnerability in a particular dependency, what do you do? In the past, there was no standard way that vendors communicated this information, so the answer is that you would go through your list of 500 software programs, and email 100 different vendors asking about each one. This is not a good process.
If, instead, everyone provided an SBOM with their software, all you need to do is run a query against whatever inventory management system you're using and you have the answer in seconds.
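And that query can be very simple. A sketch, assuming CycloneDX-style JSON SBOMs on disk (the file name and version set are made up):

    import json

    def affected(sbom_path: str, name: str, bad_versions: set) -> bool:
        # scan an SBOM's component list for a known-vulnerable package
        with open(sbom_path) as f:
            sbom = json.load(f)
        return any(c.get("name") == name and c.get("version") in bad_versions
                   for c in sbom.get("components", []))

    # e.g. affected("vendor-app.sbom.json", "log4j-core", {"2.14.0", "2.14.1"})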
It seems to me that you're arguing back that SBOMs are just a list, so how can it be a big deal?
My point is about the whole reality of needing SBOMs. Clearly they have something to do with supply-chain trust, but I think your perspective is focused too closely on the engineering aspects of the problem.
I find the technical realities (all of which I understand to be, in the end, some kind of engineering decision) that motivate having to share vulnerabilities of all components without revealing their designs or sources (of either software, hardware, both, or neither) to be morally dubious.
So, keeping in mind that hardware, by this point, is stored as code (so it's code), and software also has a source code, I find it necessary by this point to open source all the things!!1!
Open sourcing doesn't alleviate the need for an SBOM. Regardless of source availability or license type, it shouldn't be a lengthy engineering exercise to determine what software components an organization is running every time a CVE is published. The people doing patch management at large organizations are typically not software engineers, and even if they are, they don't have time to read source code. The questions they are trying to answer should be able to be answered programmatically in seconds. This isn't an engineering problem, it's an operational problem.
The technical problem is really, essentially, a matter of trust; blockchains solved this problem in the technical sense.
The operations of a large organization are typically private property... this is a touchy issue, because technology corporations are essentially the government by this point.
I guess I should be glad this isn't really my problem; I'm just worried about the public and political consequences of what I see as potentially dangerous mistakes being made on the ideological level.
You're misunderstanding the factors driving this initiative.
This isn't an attempt to address the problem of trust. Software can be found to have vulnerabilities even if its developers are highly trustworthy, and almost all vulnerabilities in commercially used software are accidental. The organizations asking for SBOMs (this is currently receiving a lot of attention due to changes the US government has made to require them for their internal use) have other mechanisms to establish trust with their software vendors.
The straw that broke the camel's back on this issue was CVE-2021-44228 -- a vulnerability in open source software. If you missed that debacle: the problem wasn't that people distrust Apache or software that uses Log4j. The problem was that people didn't know everywhere it was installed.
This was because it isn't currently a standard for software developers to provide a list of all of their dependencies, regardless of whether they're open source or not. This isn't because they are untrustworthy. It just simply isn't standard practice. SBOMs are an attempt to standardize such a list.
A blockchain isn't necessary here -- nobody is trying to lie about what version of Log4j they're depending on in a piece of software they're selling.
You're ignoring my point. Well... not exactly "ignoring"; more like explaining how and why the issue I was criticizing is completely irrelevant from where you stand.
However, thanks to all this dialogue, I do see where my own criticism goes amiss; and for that, thanks a lot.
If you sell software to the government you do. Which happens to be the biggest purchaser of software and touches others by proxy. E.g., you might provide software to Boeing, who sells to the government, therefore Boeing has a legal obligation and requires all its suppliers to provide SBoMs. Or Dell, or HPE, or Cisco, or literally every software / service provider.
Funny. So do I. Military specs are governed through something called MIL-STD, not through NIST, unless it directly references it. For instance, BSSC 2005 gives guidelines on coding standards for Java, but no one follows them.
If you were a contractor, then you would know all specifications are waived unless specifically designated by your MOL and documented in the GSA/SOW.
So here is one of those internet know-it-alls who doesn't know anything.
They're notifying companies that do regulated work to check if their compliance obligations have changed. It's not an address to the general public telling us that our gadgets will be different now.