Deterministic, bit-identical and/or verifiable Linux builds (bugzilla.mozilla.org)
106 points by zz1 on July 11, 2014 | 51 comments



I filed the linked bug and am the technical owner of Firefox's build system.

There were efforts made and discussions outside of the linked bug. To say "nothing" was done is just not true.

It would be more accurate to say that we just can't justify working on this right now because the timing isn't right and it's high cost for perceived low reward. The time of everyone involved to implement this would be better spent on improvements that benefit the general Firefox population.

Some of those improvements include overhauling Firefox's build automation to better support things like building with Docker. That lays the groundwork for (easier) deterministic builds in the future. Even then, I'm not sure if this will happen.

Brendan's post called on the larger community to make requests of Mozilla. That front has been surprisingly quiet. If you really want this, I would suggest making noise on the mozilla.org domain. Even better, contribute some patches, like the Tor Project has done: I will happily review them! #build on irc.mozilla.org.


Sorry if I wrote that nothing was done. What I really meant was that no change has happened, and even the discussion in the bug, at a first glance, doesn't make it clear whether there is a path toward deterministic builds.

What would "making noise on the mozilla.org domain" look like? I must say, though, I am kind of saddened to see that this discussion was upvoted 80 times while the bug is still at 6 votes.


The path towards deterministic builds is definitely not clear. As many in this thread have pointed out, it's a difficult technical problem. The difficulties are multiplied by a project at Firefox's scale.

Further complicating matters is our platform breakdown. The majority of Firefox users are on Windows. Deterministic builds on Windows are very painful. And that's before you figure PGO into the mix. Tor works around this by compiling Firefox with an open source toolchain and doesn't use PGO. But that's a non-starter for us because choosing an open source toolchain over Microsoft's would result in performance degradations for our users. Believe me, if we could ship a Windows and Mac Firefox built with 100% open source to no detriment to our users, we would. There's work to get Firefox building with Clang on Windows (but only for doing ASAN and static analysis, not for shipping to users). That gets us one step closer.

All that being said, there has been exploratory talk lately of serving segments of our user base with specialized Firefox builds. e.g. a build with developer tools front and center that caters to the web development community. If that ever happens, I imagine a deterministically-built Firefox with things like Tor built in could be on the table.

The way you can make that happen is to direct noise directly at the Mozilla community. Send a well-crafted email to firefox-dev (https://mail.mozilla.org/listinfo/firefox-dev) explaining your position. Anticipate that people will likely reply by asking you to prioritize this against existing goals, such as shipping 64-bit Firefox on Windows and shipping multi-process Firefox. We don't have nearly unlimited resources like some of the other browser vendors, so we can't just do everything. Again, I implore people to directly contribute to Mozilla any way they can. https://www.mozilla.org/contribute/


I went to the Debian talk on reproducible builds at FOSDEM this year, which was interesting. There is still a fair amount of work to do to make this more routine, but the pieces are being worked on and it will get easier as it becomes more normal.


I'm in casino gaming. We have to send our source and tools to regulatory test labs so they can (hopefully) independently generate the same binary as what we are delivering. Given our tools (C++ and Windows), 'binary reproducibility'[1] is impossible, but we've got a workaround. We do our release builds in a VirtualBox VM that's all tooled up. When it comes time to deliver to the lab, we export the entire box (with the source already synced) as an .ova. Part of our build pipeline is a tool that strips things like timestamps and paths from the PE files. Some people don't go to all this trouble and instead use tools like Zynamics BinDiff to explain away the diffs.

[1]https://www.google.com/?gws_rd=ssl#q=binary+reproducibility


What are the companies that provide this service (reproducing builds)? I haven't heard of this, but sounds interesting.

Depending on how much effort you're willing to put in, even if you use C++ and Windows, you can still write a program to parse the executable and zero out timestamps and other non-deterministic data. That is actually being done in a Bitcoin-related program for Windows, I believe.
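A minimal sketch of what such a tool might look like, assuming only the COFF file header's TimeDateStamp needs zeroing (hypothetical code, not the Bitcoin project's actual tool; real PE files also carry timestamps in the debug directory and the export table):

    # zero_pe_timestamp.py - minimal sketch, not a production tool
    import struct
    import sys

    def zero_pe_timestamp(path):
        with open(path, "r+b") as f:
            dos_header = f.read(0x40)
            if dos_header[:2] != b"MZ":
                raise ValueError("not a PE file (missing MZ header)")
            # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
            pe_offset = struct.unpack_from("<I", dos_header, 0x3C)[0]
            f.seek(pe_offset)
            if f.read(4) != b"PE\0\0":
                raise ValueError("missing PE signature")
            # COFF header: Machine (2 bytes) + NumberOfSections (2 bytes),
            # then the 4-byte TimeDateStamp we want to zero.
            f.seek(pe_offset + 8)
            f.write(b"\x00\x00\x00\x00")

    if __name__ == "__main__":
        zero_pe_timestamp(sys.argv[1])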

How do you generate and verify the VirtualBox image? If you send the image over to the test lab, then the obvious thing to do is for someone to attack your VirtualBox image, and you have the same problem all over again, just at a different level.


For jurisdictions that don't have their own state-run labs (so not NV, NJ, PA, etc.), everybody uses one or a mix of GLI[1], BMM[2], and Eclipse[3]. Note: I'm only familiar with US gaming.

We do have a tool to zero these parts of the executable files out, but in our testing we still had unexplainable differences unless we were on the same machine working from the same sync.

The VirtualBox was generated once (installed Windows, Visual Studio, .NET, some others) and we just continue to use the same base .ova.

The package has to be sent to the lab on physical media where it gets loaded onto an offline machine that we've supplied.

[1]http://www.gaminglabs.com/ [2]http://www.bmm.com/ [3]http://www.eclipsetesting.com/


This works for your goal (being able to reproduce the binary build), but Mozilla's case is slightly different. Since Firefox is FLOSS, Mozilla's goal is that end users can completely reproduce the builds from source. This includes dependencies, toolchains, AND the build environment. In this scenario, accepting a pre-built binary VM would not be acceptable, since it defeats the spirit of FLOSS.


I used to work in the same industry. We used Linux and GCC, so we could, and did, produce fully deterministic builds. Actually, the output was fully deterministic disk images.

I did one iteration of the build system, mostly making it such that any host could build it deterministically. This was years ago, so it was just a chroot that started with a skeleton + GCC and procedurally built the things it needed to build the outputs. It was fairly straightforward: just an extremely short patch here and there, and a 1000-line Makefile for staging Xorg builds. If I was doing it again I'd consider reusing a package manager, but each component's Makefile was pretty concise. My trusty sidekick was a script that xxd'd two files into pipes that it opened using vimdiff.

Build took an hour or so, however.


So the regulators have to use the provided virtual machine and tools to build the source, and verify that the resulting binary is the same as provided by your company?

How do they confirm that the toolchain has not been messed with? Surely they can't binary-check the whole OS/compiler/linker/other software in the VM?


Exporting the virtual build machine as part of a release is good practice anyway.


In January 2014, Brendan Eich [1] called for organizations to build up a system to verify Firefox builds, in order to ensure the browser can't be used as an attack vector by being distributed with some malicious feature added beyond what's in the source code.

Six months later nothing has been done, and that is because Firefox builds are not deterministic yet. If you think this is an important issue, please vote for this bug.

Edit: [1] https://brendaneich.com/2014/01/trust-but-verify/


I'm curious about the theoretical basis of this effort. I'm reminded of "Reflections on Trusting Trust." Simply put, it's not at all an easy problem to tackle.

No, I'm not against trying. Just going from your post, I'm not sure what is being aimed at. Specifically, would a "deterministic" build really help much?

edit: I am perusing https://blog.torproject.org/blog/deterministic-builds-part-o... and https://blog.torproject.org/blog/deterministic-builds-part-t... Good reads so far.


Also this is an interesting link, Debian's ReproducibleBuilds effort:

https://wiki.debian.org/ReproducibleBuilds#Why_do_we_want_re...


Please have a look at David A. Wheeler’s page on Trusting trust [1], including his 2009 PhD dissertation [2], where he clearly demonstrates that it is possible to have trusted (not in the MS sense...) computers (I think).

You may also be interested in 'Countering "Trusting Trust"' on Schneier's website [3], which discusses a 2006 paper, also by Wheeler.

[1] http://www.dwheeler.com/trusting-trust/ [2] http://www.dwheeler.com/trusting-trust/dissertation/html/whe... [3] https://www.schneier.com/blog/archives/2006/01/countering_tr...


A C compiler is not a single executable. If you take GCC, compiling C code involves a preprocessor, a compiler, an assembler and a linker (and that's a simplified view). To make matters worse, the assembler and the linker are not even part of the GCC source code; they're a separate project. That would likely make the process of DDC (diverse double-compiling) significantly more difficult than if the compiler were actually a single executable. Also, there are other things involved in the whole process, like the CRT static objects, dynamic libraries and the dynamic loader. That's many more items to trust. You could even add the kernel to the list.
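For readers unfamiliar with DDC, the core check can be sketched roughly as follows (the make invocation and file names are hypothetical; Wheeler's actual procedure has to account for every tool listed above, which is exactly the difficulty described here):

    # ddc_sketch.py - rough illustration of diverse double-compiling
    import hashlib
    import subprocess

    def build_compiler(with_compiler, source_dir, output):
        # Hypothetical helper: rebuild the compiler's own source using a
        # given compiler binary. In reality this runs the whole toolchain
        # (preprocessor, assembler, linker, ...), not one executable.
        subprocess.run(
            ["make", "-C", source_dir, f"CC={with_compiler}", f"OUTPUT={output}"],
            check=True,
        )

    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # Rebuild the suspect compiler A from its source with an independent,
    # trusted compiler T, then use that result to rebuild the source again.
    # Given deterministic builds, stage2 must be bit-identical to A if A
    # really corresponds to its published source.
    build_compiler("trusted-cc", "compiler-source/", "stage1-cc")
    build_compiler("stage1-cc", "compiler-source/", "stage2-cc")
    assert digest("stage2-cc") == digest("suspect-cc"), "binary does not match source"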


My memory on that was that it let you know whether you could trust your compiler. I couldn't remember if it extended to the rest of the toolchain. Nor did I remember if it really hinged on deterministic builds. I'll have to revisit it.


This is a good talk regarding "Trusting Trust" for binary toolchains: http://youtu.be/QogdeTy7cDc


Strictly speaking, it lets you know that your compiler binary matches its source. You can then read the source to decide if you trust the compiler (and others can audit it, can audit binaries generated from it, &c, &c). At which point, as raving-richard says, you can start to trust that your other utilities match their source as well. Which source also should be audited, &c, &c.


Right, my point is that deterministic builds of Firefox aren't even really needed for this. If you trust your compiler, you trust your compiler. What does it matter if you have a non-deterministic build of a utility? You trust whatever is non-deterministically building it.

As this stands, if you deterministically build Firefox, you just know that if your toolchain is corrupted, the corruption is at least consistent. :)

Right?


If you trust your compiler, you can verify that your build is safely based on the source that you have. If the build is deterministic, then you could verify that the binary being distributed to the masses isn't compromised by building the same file yourself and seeing that it is the same.


Right, and my question is essentially whether this is "putting the cart before the horse." Does Mozilla have efforts in place to establish trust in their compilers? (I expanded on my response below. I really wish I knew the correct way to "merge" conversation trees here. Is there a good protocol for that?)


esrauch has it, but it's a point worth stressing: it gives many more people the opportunity to notice any discrepancy in common tool chains, as well as adding assurance for those who aren't building their own.


I can see how it helps. I'm still curious by how much. Consider, if everyone's common tool chain is untrustworthy, then this solves nothing.

This is why the "docker" idea worries me. It is basically counterproductive. It just moves the "trust" to something that's much harder to verify.

And the reason I was focusing on the compiler point, is to my knowledge nobody has established that the common compilers are trustworthy. At least not the ones in use at large. Until that happens, we're back to my first point. Which is to say that we may not be trustworthy.

Again, to be clear, I see there is benefit to knowing that we are all of the same trustworthiness. Having "reproducible" builds that don't match is an indication that something is definitely wrong. Definitely a worthy effort. But having reproducible builds that do match doesn't really tell you much about the trustworthiness of the application. Specifically, it only tells you that it is as trustworthy as another build. (Similar to the boolean logic that trips folks up all the time: False implies True is true, as is False implies False.)

Unless, of course, I'm still misunderstanding something.


"Consider, if everyone's common tool chain is untrustworthy, then this solves nothing."

I don't quite agree - it more clearly establishes what tool chain people should be auditing!


I had to think about this for more than a second. :) I think I see what you mean. Specifically, if N = 2, this doesn't do much. However, as soon as you have more folks that agree, then the first time someone disagrees, you have a good spot for auditing. Right?

I still can't see how the docker idea helps this purpose right off.


If you trust your compiler, then you can read the source code and then trust the compiled executable. If you are worried, then you can build the rest of your tool chain from source...


My point is that that is the heart of the trust. You have to establish that before you can use any "deterministic" builds of your utilities to establish their trust.

Right?


On deterministic builds you may also find interesting this one: http://www.chromium.org/developers/testing/isolated-testing/...


That is about getting faster, not more secure. Right?

I mean, I get that it is kind of nice to be able to verify that multiple sets of folks can build the same thing and compare results. I'm curious if there are any theoretical thoughts on how much this helps. For instance, it does not guarantee that there are not malicious changes in the codebase. Which is far more likely to be a problem, I would think.

To that end, the entire browser war is ultimately counter to security. As the vendors add more and more features, there are more and more places for malicious changes to hide. Not just in "mistakes," but in features that could be potentially misused. I feel that "deterministic builds" don't do much to stem this. I'd love to be shown how/why I'm wrong.


Depending on the browser component model, could you hash individual binary components? Over time, "base" components could stabilize in the same manner as long-term-support Linux kernels, with surgical patches for security fixes. I don't know how practical this would be with the component models for Firefox and Chromium.
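A per-component hash manifest along those lines is easy to sketch, though this assumes a hypothetical install layout and says nothing about either browser's actual component model:

    # component_manifest.py - sketch: hash every file under an install dir
    import hashlib
    import os
    import sys

    def component_manifest(install_dir):
        manifest = {}
        for root, _, files in os.walk(install_dir):
            for name in files:
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                manifest[os.path.relpath(path, install_dir)] = digest
        return manifest

    if __name__ == "__main__":
        # Print a stable, sorted manifest so "base" components that haven't
        # changed between releases produce identical lines.
        for component, digest in sorted(component_manifest(sys.argv[1]).items()):
            print(f"{digest}  {component}")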


Sounds somewhat reasonable, though some long-term-support Linux installations were vulnerable to Heartbleed.

And, especially with how far reaching some of the features of modern browsers are, the surface area for attacks is growing rather large.


Even if you hate Bitcoin for other reasons, this is one reason to appreciate it. Gitian, the software used by Tor for deterministic builds (especially of their build of Firefox), was originally written by Bitcoin developers. Which makes sense: you want to make sure your money is secure.

Good things come out of things you might hate.


Deterministic builds are pretty neat. I think the second, equally important piece is a Web of Trust full of people willing to reproduce the build and sign off on the hashes.

I was able to reproduce the sha512sum of the Bitcoin binaries back when 0.9.0 came out without too much trouble, but it definitely took a couple of hours to get it all working.

I feel a bit bad I didn't take the next step and attach my digital signature signifying that I could reproduce it. There are only a few people other than Gavin who go to the trouble of signing off on the hashes.

I wonder if Docker could be used to speed up the overall process and make builds more accessible. As I recall, the current scripts set up a single-core KVM, which definitely slowed things down.
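For what it's worth, the final comparison step is trivial once a build has been reproduced; a minimal sketch (the file name and published digest are placeholders):

    # verify_reproduced_build.py - compare a local build against a published hash
    import hashlib
    import sys

    def sha512sum(path, bufsize=1 << 20):
        h = hashlib.sha512()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(bufsize), b""):
                h.update(chunk)
        return h.hexdigest()

    if __name__ == "__main__":
        local_build, published_digest = sys.argv[1], sys.argv[2]
        if sha512sum(local_build) == published_digest.lower():
            print("MATCH: local build reproduces the published binary")
        else:
            print("MISMATCH: investigate the toolchain or the published binary")
            sys.exit(1)

The hard part is getting a bit-identical binary out of the build environment in the first place.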


Docker could speed up parts. But, unless I misunderstand what you mean it wouldn't really help the trust aspect. You'd just be shifting your trust to the docker pieces. (That is, then the goal shifts to "can anyone reproduce the docker container?") Right?


Work has already started on supporting Docker images in Gitian Builder. https://github.com/devrandom/gitian-builder/issues/53


There are tons of reasons to appreciate Bitcoin, especially technical ones.

That doesn't by itself make it a good idea to actually use, of course, but you'd have to be blind to miss the wonder of what Bitcoin brought to technology.


Wait, Bitcoin is something to be hated?


Didn't you know? It's a thing on HN.


Before everyone gets up in arms about Mozilla not working on this: as I wrote the last time this came up, deterministic builds are a nice thing, but they're only a small piece of the puzzle of protecting users from state-sponsored malicious actors. Indeed, it seems to me that messing with builds would be one of the more difficult ways for the NSA to pwn Firefox users.

https://news.ycombinator.com/item?id=7045605


Or use Gentoo, that's what I do. You can verify hashes/signatures on the Firefox source archive and audit the source code if necessary before compiling.

That was only half serious - I know there are valid use cases for people to prefer binary distros. However, I think this particular issue is a good example of why, IMO, even binary distros need to provide a convenient option to locally build any package for security-conscious users.


That sounds tangential. The point is if two people build the same thing, they should be able to compare their builds to see if they are truly the same. If not, the argument is that one of them has a "tampered" environment.

In other words, if you don't know your compiled binary is the same as the distributed binary, you have no reason to think yours does not have a vulnerability added by the toolchain.

Unless I'm the one that is misunderstanding, of course. :)


Well it's a solution to the same underlying problem - that by running binaries compiled by a 3rd party you trust that they aren't adding in code to compromise your privacy (voluntarily or not). If you compile the application from source yourself you don't need that leap of faith - no need to compare identical binaries or have deterministic builds (which is not trivial as the bug report demonstrates).


I'm not sure your solution solves that. If Firefox has vulnerabilities in the source right now, you do little to protect yourself by compiling on your own. Even if you can verify that you and someone else produce the same binary, they could just both be vulnerable.

In fact, if you compile it yourself, unless you can verify the compile against a "known good" one, then you can't even be sure that your local toolchain hasn't been compromised. (I mean, sure, if you were a perfect auditor of your entire toolchain, then you could have some confidence here. You have to be perfect, though.)

Consider, you do a compile of Firefox and it is different than the one for download. Why? As things stand now, you don't know. And that is the problem.


> If Firefox has vulnerabilities in the source right now, you do little to protect yourself by compiling on your own.

You do more to protect yourself than taking the same vulnerable source and compiling it with Mozilla's "reproducible build chain".

If the source itself is corrupt then having a verified build of malicious source is completely useless.

With Gentoo you can verify the source itself matches the "trusted" upstream source and then build it with your own trustworthy build chain.

And before you go "what if your build chain isn't trustworthy huh????" think about it a little further... if your own local build chain can't be trusted you're already screwed even before you download anything from mozilla.org, just as you'd be if you downloaded a "bit verified" binary from mozilla to run on your already-pwned local operating system.


No you don't. You do nothing to protect yourself from vulnerabilities in the code by compiling it yourself. Literally nothing.

You do protect yourself from vulnerabilities in their toolchain. And this is where the effort makes sense. If there are differences in the builds, then you can at least suspect one of you has a tampered environment. Right now, you have no way of knowing that one way or the other. You just have the joy of having done your own build.

My main question is still just one of magnitude. Consider, I have not had a wreck or other car mishap in 20 years. I could conclude that seatbelts, then, have not increased my safety really. I am not trying to make that claim, as I feel it is false. So, my question here is essentially, how much safer would this really make things? (Or trustworthy, if you'd rather that term.)


Fellow gentoo user here. Gentoo does not protect you from Trusting Trust attacks (mentioned above). But then again neither do reproducible builds, because you still have to trust the original compiler. Reducing the variance (to zero in this case) by using a deterministic build system DOES protect against compromise of everything except for the original build environment. Yes, that makes the original build env a target for attacks, but if we honestly believe we have a "trustable" reference build environment then those attacks are also exceptionally hard to pull off.


The question is how to provide those same benefits to most people in the world. Most of them are not in a position to compile their own software, for various reasons ranging from (reasonably!) not wanting to spend that much time on it to using an OS where it's even more of a pain than on Linux (e.g. Windows, or Android, or iOS).

The only sane way to help these people trust their software is to enable meaningful third-party audits of said software. And that requires that the auditor be auditing exactly the same thing as the user is using.


Deterministic doesn't really mean trust per se to me. It means reliable.

If you don't know what you're getting every time, it's hard to be reliable at all.

Of course, if you're reliable, it's more trustable.


Repeatable, "reliable" only in the sense that you can rely on the outcome.


Here's the difference:

- Make the build system reproducible, so that every build is exactly the same binary, no matter who runs it. That's "easy". But you don't know why you get that exact binary.

- Make the build system verifiable, or the resulting binary verifiable, so that you know exactly why you get that binary. This is hard.

The first one is repeatable, reliable.

The second one is trustworthy, verifiable.



