Zizmor would have caught the Ultralytics workflow vulnerability (yossarian.net)
81 points by campuscodi 5 months ago | 23 comments



This post has left me wondering: what is zizmor? What is ultralytics? Are these words actually real or is someone having a stroke?

Not all nerds know all projects, so I decided to educate myself and followed OP’s links to learn about Ultralytics:

> Ultralytics YOLO11 is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility.

Ultralytics’ readme doesn’t explain what ultralytics is or does. Thankfully Zizmor’s readme describes itself clearly:

> zizmor is a static analysis tool for GitHub Actions. It can find many common security issues in typical GitHub Actions CI/CD setups.

This isn’t a critique of OP: I enjoyed reading about the vulnerability(ies!) you found and I learned a lot. I’m just generally frustrated that so many readme files on GitHub fail to describe what the project actually does, Ultralytics being just one example.

Have fun and keep hacking


I wonder if Zizmor has anything to do with this NYC local notable: https://en.wikipedia.org/wiki/Jonathan_Zizmor


I named it explicitly for Dr. Zizmor :-)

https://github.com/woodruffw/zizmor#the-name


I was legitimately wondering if Dr Zizmor made a pivot into cybersecurity


YOLO is an ML architecture used for object detection and recognition, and Ultralytics develops a version of it.


Ultralytics' README gives me a headache to read lol, for similar reasons you gave. But then the package is called YOLO and the author abbreviated state-of-the-art to SOTA (wat…). It's exactly the kind of "modern" GitHub repository that I like to stay away from lol. My only critique of this README was that it didn't have enough emojis. If you want to truly YOLO, may as well fill half your text with emojis.

This hilariously tech-bro-optimistic auto-response (made by a bot) on the linked issue (https://github.com/ultralytics/ultralytics/issues/18027#issu...) also gave me a laugh at how out of touch it was with the actual issue.


(Author of this post.)

If you’re interested in how this went down, the timeline section[1] in particular is worth jumping to: my key takeaway is that this vulnerability was reintroduced, and that there’s only limited evidence that the Ultralytics team have done a full revocation and rotation of all accounts and credentials that the attacker may have had access to.

Given that, it’s not inconceivable that a third round of backdoored packages will occur. I would recommend that people exercise extreme caution when installing the current versions; most users would probably be best served by pinning to an older version from before any indicators of compromise.

[1]: https://blog.yossarian.net/2024/12/06/zizmor-ultralytics-inj...


One quite annoying element is that as a third party you cannot access the attestations of the deleted releases any more. I really wanted to see if the attestations would help here to figure out what happened. But maybe I’m just not informed enough about where to look.

Another element here is that the releases seemingly were deleted and re-created? I thought that was prevented by PyPI?


The attestations are checked into the public transparency log, so they’re still accessible — that’s how I did a decent amount of the triage in the write up. You can find them in the write up by searching for “Sigstore” (I would direct link them, but I’m on mobile).

> Another element here is that the releases seemingly were deleted and re-created? I thought that was prevented by PyPI?

Hmm, where do you see this? The release history on PyPI doesn’t show any recreations[1].

[1]: https://pypi.org/project/ultralytics/


> You can find them in the write up by searching for “Sigstore” (I would direct link them, but I’m on mobile).

Yeah, I know they’re in Sigstore; I just didn’t know how to find them. Is there an interface for this that I missed?

> Hmm, where do you see this? The release history on PyPI doesn’t show any recreations[1].

Then I completely misunderstood what happened. Were these in fact completely made-up releases that were never intended to be triggered? E.g., a bot released .41 without there being any intent to make an actual .41 release? I thought that UltralyticsAssistant was the developer, not the attacker. Do they also control that thing?


> Is there an interface for this I missed?

That would be search.sigstore.dev, unless I'm misunderstanding what you mean.

> Was this in fact completely made up releases that were not even intended to be triggered? Eg: a bot released .41 without there being an intent of being an actual .41 release? I thought that UltralyticsAssistant was the developer, not the attacker. Do they also control that thing?

.41 and .42 were triggered directly from the repository. One was triggered by the UltralyticsAssistant account and included a human bypass, which strongly suggests that the attacker controlled (and maybe still controls) that bot account.

The last two compromised releases were published directly via API token, not via the source repo, which strongly suggests that the attacker either exfil’d an old API token from CI/CD or that they’re in control of the developer’s account on PyPI. Those ones don’t have attestations, while the first two releases do (two each, one per dist per release).


> .41 and .42 were triggered directly from the repository. One was triggered by the UltralyticsAssistant account and included a human bypass, which strongly suggests that the attacker controlled (and maybe still controls) that bot account.

Ah, but if they controlled the bot then didn't they have other problems too? If that is the case, then disregard my comment. I was under the impression that this was not the attacker.

> That would be search.sigstore.dev, unless I'm misunderstanding what you mean.

No, that's it in theory, I suppose. I did try this, but when I used the commit that I thought triggered the release (cb260c243ffa3e0cc84820095cd88be2f5db86ca), it did not show up.


> Ah, but if they controlled the bot then didn't they have other problems too?

Yep — my theory is that this all starts with the insecure trigger + template injection, and that the attacker exfil’d the bot’s PAT and stale PyPI API token at that point. The first round of attacks used the bot PAT and cache poisoning, and then the attacker pivoted to the PyPI token once the first vector was closed off.
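
For anyone who hasn’t seen the pattern before, here’s a minimal illustrative sketch of that kind of template injection (not the actual Ultralytics workflow): attacker-controlled data, like a fork’s branch name, gets expanded directly into a `run:` script inside a privileged workflow.

    on: pull_request_target   # privileged trigger: the job gets repo secrets

    jobs:
      format:
        runs-on: ubuntu-latest
        steps:
          # The ${{ ... }} expression is expanded into the script *before* any
          # shell runs, so a crafted branch name can break out of the quoting
          # and inject arbitrary commands that run with the workflow's credentials.
          - run: echo "Formatting branch ${{ github.event.pull_request.head.ref }}"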

> I used the commit I thought triggered the release (cb260c243ffa3e0cc84820095cd88be2f5db86ca) I did not see it show up

I think I know what’s happening there: that search UI only indexes by commit for attestations produced by gitsign, not every attestation containing a commit. I used it by finding the entry IDs from the release action logs on GitHub Actions, but if/when those are flushed someone who doesn’t already know them will need to seek the log in order to find the attestations.

That’s not ideal; search.sigstore.dev really ought to have more indices, like crt.sh does.


Why has CI for open-source projects become so difficult to secure? Where did we, collectively, go wrong?

I suppose it's probably some combination of: CI is configured in-band in the repo; PRs are potentially untrusted; CI uses the latest state of the config on a potentially untrusted branch; we still want CI on untrusted branches; CI needs to run arbitrary code; and CI has access to secrets and privileged operations.

Maybe it's too many degrees-of-freedom creating too much surface area. Maybe we could get by with a much more limited subset, at least by default.

I've been doing CI stuff at my last two day jobs. In contrast to the open-source case, we worked only on private repos with private collaborators, and we explicitly designated CI as trusted.


> Maybe it's too many degrees-of-freedom creating too much surface area.

I think this is essentially it: there's extraordinary demand for "publicly dispatchable and yet safe" CI/CD, despite those requirements being fundamentally in tension with each other.

All things considered, I don't think GitHub has done the worst job here: the security model for GitHub Actions is mostly intuitive, so long as you stick to triggers like `push`, `pull_request`, etc. The problems only really begin when people start using triggers that (IMO) GitHub should never have added in the first place, like `pull_request_target` -- those triggers break the basic "in repo privileged, out of repo unprivileged" security assumption and cause the kinds of problems we're seeing here.
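
A rough side-by-side sketch of the distinction (simplified, not a complete workflow):

    # `pull_request`: for PRs from forks, the job runs with a read-only token
    # and no access to repository secrets -- "out of repo unprivileged".
    on: pull_request

    # `pull_request_target`: the job runs in the context of the *base* repo,
    # with secrets and a read/write token, yet can still be dispatched by any
    # fork's PR. Add a checkout of the PR head and you have a confused deputy.
    on: pull_request_target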


I wonder about an alternative history where the default feature set is much smaller and much safer, and everything else is opt-in behind a flag, and the flags are prefixed with "unsafe_" or something. That would hopefully encourage people to look up the "unsafe_allow_foobar" docs before using it.


It's a web of danger for sure. Configuring CI in-repo is popular (especially in the GitLab world), and it's admittedly a low-friction way to at least get people to put their CI config under version control (or to use CI for builds at all). I think the number of degrees of freedom is really a footgun.

I remember early GitLab Runner use, when I had a (seemingly) standard build for a Docker image. There wasn't any obvious standard way to do that. There were recommendations for dind, just giving shell access, etc. There's so much customization that it's hard to decide what's safe for a protected/main branch vs. user branches.

I don't have a solution. But I think it would be better if, by default, CI engines were a lot less configurable and forced users to adjust their repo and build to match some standard configurations, like:

- Run `make` in a Debian docker image and extract this binary file/.deb after installing some apt packages

- Run `docker build .` and push the image somewhere

- Run `go build` in a standard golang container

And really made you dance a little more to do things like "just run this bash script in the repo". Restrict those kinds of builds to protected branches/special setups.

Having the CI config in the same source control tree is dangerous and hard to secure. It would probably be better to have some kind of headless branch, like GitHub Pages uses, that is just for CI config.


I think people usually like to blame GitHub Actions' design, but this repository seems not to have done the bare minimum to secure itself, focusing instead on producing a "state-of-the-art (SOTA)" "YOLO" model.

There are a lot of things wrong with format.yml itself. It honestly seems kind of weird that it needs commit access to push a new commit under the PR author's name/email just to format their code. I would personally find this kind of rude as a PR author: I sign all my Git commits, and a bot masquerading as me in a commit is not appreciated, even for something like code formatting. And of course the author of format.yml didn't seem to know the difference between `pull_request` and `pull_request_target` and just threw both in.

I also think that these days people go way overboard with CI/CD, because things that are automated are obviously better, right? I personally do not like any CI pipeline that can commit directly to the main Git branch without review/signoff, which is exactly what this commit (https://github.com/ultralytics/ultralytics/commit/cb260c243f...) did when it removed the author check. Things like deploying to PyPI should take more than a single commit and should involve a human. Yes, it introduces friction, but if you are maintaining a big piece of open source software, a release you make is going to be deployed to lots of people's computers, so a little annoyance on the maintainer's side is a small price to pay to make sure you get everything right.
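
For what it's worth, GitHub Actions can give you that human sign-off without abandoning automation entirely. A sketch, assuming a deployment environment (here called `pypi`, my name for it) configured in the repo settings to require a reviewer's approval:

    jobs:
      publish:
        runs-on: ubuntu-latest
        environment: pypi        # job pauses until a designated reviewer approves
        permissions:
          id-token: write        # PyPI trusted publishing instead of a long-lived token
        steps:
          # (building/downloading the dists is omitted from this sketch)
          - uses: pypa/gh-action-pypi-publish@release/v1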

I guess I'm weird. I maintain an OSS macOS app and I see other similar apps just upload their private signing keys to GitHub and just let the CI sign everything for them but I still sign my releases offline and never upload my keys to a public service.

What I'm saying is that I don't think we want CI to do everything for us, especially powerful actions (e.g. making a release) with no human approval. If you do, you should think really hard about whether that's actually desirable, and whether the extra mental energy spent on all the security ramifications just offsets the little bit of time you saved.


GitHub doesn't really seem to prioritise security. I just reported a nasty way to smuggle code[0] into Actions pipelines to them and got a classic "expected behaviour WONTFIX" response. It's exactly the kind of sneaky behaviour that the Jia Tans out there would use in an attack.

[0] (see end of) https://cedwards.xyz/github-actions-are-an-impending-securit...


Wow, I had no clue how many ways there were to get burned with Actions. As an ME nerd, I've set up a few CI/CD workflows, and if I recall correctly, when I was reading through the GitHub Actions documentation (circa 2022) there wasn't any mention of security best practices in the general docs. Is that omission generally considered best practice, or at least acceptable?

I'm not a programmer by trade; I generally write one-off or two-off code, but that's changing as I get deeper into simulation land. For me, reading the entirety of the docs is something that generally happens only when I'm troubleshooting something, or when an LLM has dragged me significantly further than my understanding and I have to go learn how a library or API works.


Thank you Doctor Zizmor!


Spotted the straphanger


[flagged]


The vulnerability has nothing to do with bash. It’s a template injection in GitHub Actions, which bypasses the quoting and interpolation rules of any shell or interpreter entirely.
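
In other words, the runner substitutes the `${{ ... }}` expression into the script text before bash (or any other shell) is ever invoked, so quoting inside the script can't save you. The standard mitigation is to pass untrusted values through the environment instead; a generic sketch (not Ultralytics' actual fix):

    steps:
      - env:
          HEAD_REF: ${{ github.event.pull_request.head.ref }}
        # The untrusted value now reaches the shell as ordinary data in an
        # environment variable, not as part of the script text itself.
        run: echo "Building $HEAD_REF"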





