Actually, this xz-utils was a sort of reproducible build issue. But the twist it is wasn't the binary not built reproducibly. It was the tar ball. The natural assumption is it just reflected the public git repository. It didn't.
Debian's response is looking to be mandating the source must come from the git repository, not a tar ball. And it will be done using a script. And the script must produce the same output every time, ie be reproducible. Currently Debian's idea of "reproducible" means it reflects the source Debian distributes. This change means it will reproducibly reflect the upstream sources. That doesn't mean it will be the same - just that it's derived in a reproducible way.
As for trusting the person who gave you the hash: that's not what it hinges on. It hinges on the majority installing a binary from a distro. That binary can be produced reproducibly, by an automated build system. Even now, Debian's binaries aren't created just once by Debian. There are Debian builders all around the planet creating the binary, and verifying it matches what's being distributed.
Thus the odds of the binary you are running not being derived from the source are vanishingly low.
So if something is hidden in the source, a reproducible build will give you the confidence to believe that the source is fully vetted and clean.
I see what you’re saying, but I don’t buy that reproducible builds actually solve anything, especially long term. As this whole xz thing has shown us, lots of things fly under the radar if the circumstances are right. This kind of thing will absolutely happen again. In the future it may not even require someone usurping an existing library, it could be a useful library created entirely by the hacker for the express purpose of infiltration a decade later.
Reproducible builds are a placebo. You must still assume there are no bad actors anywhere in the supply chain, or that none are capable of hiding anything that can be reproducibly built, and we can no longer afford to make that assumption.
A reproducible vulnerability is still a vulnerability.
You're focusing on the wrong issue. Just because reproducible builds don't solve all avenues of attack doesn't mean they're worthless. No, a reproducible build does not give you any confidence about the quality of the source. It gives you confidence that you're actually looking at the correct source when your build hash matches the published hash, nothing more.
A window that can be smashed in is still a vulnerability, so there is no value in people locking their front doors.
> I see what you’re saying, but I don’t buy that reproducible builds actually solve anything,
You're right. They don't solve anything, if you restrict "anything" to mean stops exploits entering the system. What reproducible builds do is increase visibility.
They do that by ensuring everybody gets the same build, and by ensuring everybody has the source used to do the build, and by providing a reliable link in the audit trail that leads to when the vulnerability was created and who did it. They make it possible to automatically verify all that happened without anybody having to lift a finger, and it's done in a way that can't be subverted as Jia Tan did to the xz utils source.
Ensuring everyone has the same build means you can't just target a few people, everyone is going to get your code. That in turn means there will be many eyes like Andres looking at your work. Once one of them notice something, one of them is likely to trace it back to the identity that added it. Thus all these things combine to have one overall effect - they ensure the problem and it's cause are visible to many eyes. Those eyes can cooperate and combine their efforts to track down the issue, because they are all are absolutely guaranteed they are looking at the same thing.
If you don't think increasing visibility is a powerful technique for addressing problems in software then you have a lot to learn about software engineering. There are software engineers out their who have devoted their entire professional lives to increasing visibility. Without them creating things that merely report and preserve information like telemetry and logging systems software, would be nowhere as reliable as it is now. Nonetheless, if the point you are making is when we shine a little sunlight into a dark corner, the sunlight itself does nothing to whatever we find there, you're right. Fixing whatever the sunlight / log / telemetry revealed is a separate problem.
You aren't correct in thinking visibility doesn't reduce these sorts of attacks. By making the attack visible to more people, it increases the chance it will be notice on any given day, thus reducing it's lifetime and making it less useful. Worse, if the vulnerability was maliciously added the best outcome for the identity is what happened here - the identity was burned, all its contributions where also burned and so a few years of work went down the drain. At worst someone is going to end up in jail, or dead if they live in authoritarian state. The effect is not too different from a adding surveillance cameras in self-checkouts. Their mere presence makes honesty look like a much better policy.
Debian's response is looking to be mandating the source must come from the git repository, not a tar ball. And it will be done using a script. And the script must produce the same output every time, ie be reproducible. Currently Debian's idea of "reproducible" means it reflects the source Debian distributes. This change means it will reproducibly reflect the upstream sources. That doesn't mean it will be the same - just that it's derived in a reproducible way.
As for trusting the person who gave you the hash: that's not what it hinges on. It hinges on the majority installing a binary from a distro. That binary can be produced reproducibly, by an automated build system. Even now, Debian's binaries aren't created just once by Debian. There are Debian builders all around the planet creating the binary, and verifying it matches what's being distributed.
Thus the odds of the binary you are running not being derived from the source are vanishingly low.