For example, you can find a LOT of copyrighted font files that were committed to GitHub repositories and then removed in a later commit, once the authors realized they'd accidentally uploaded a copyrighted file.
But they're still there in the history, effortless to download.
I'm not sure what to make of that. I don't think it would really count as removal in court... but it seems rare and complex enough that it's not worth bringing up?
If GitHub received a DMCA takedown notice, they would be obligated to take down the copies of the fonts listed in the notice, including old copies if those were listed. I'm unsure whether they could say "all releases before X" or would need to link each one.
If the copyright owners tried to sue the project for copyright infringement, IANAL, but I would assume that removing the files from HEAD would show an attempt to correct the mistake and limit liability.
If the copyright holder sued an individual, I imagine it would matter whether they were mirroring the repo or intentionally downloaded the copyrighted files for personal use.
On that note, does git as a protocol even have a clean mechanism for redacting history like this? If someone were to press this to the logical extreme, how could a developer most cleanly excise violating history from a repo using current tooling?
There is a way, though it's not very clean: git filter-branch. You will also have to force-push all branches, which is fun with large teams.
Unfortunately, in larger repos with long histories it's extremely slow and uses a lot of IO. I used it previously to clean up large binaries that were included early in a repo's history and made it take up way more space than needed.
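A minimal Python sketch of that filter-branch workflow, driving git through subprocess. The helper names (`filter_branch_cmd`, `run`, `purge_path`) and the example paths are hypothetical; the git flags are the usual filter-branch recipe for deleting one file from every commit on every ref, followed by the cleanup steps from the git-filter-branch man page's checklist for shrinking a repository. Note that the man page itself now recommends the separate git filter-repo tool instead, and that force-pushing only rewrites the refs you push: existing clones and forks keep the old commits.

```python
import os
import shlex
import subprocess


def filter_branch_cmd(path: str) -> list[str]:
    """Build a filter-branch command that drops `path` from every commit on every ref."""
    # git runs the index filter through a shell, so the path must be quoted.
    index_filter = f"git rm --cached --ignore-unmatch -- {shlex.quote(path)}"
    return [
        "git", "filter-branch", "--force",
        "--index-filter", index_filter,
        "--prune-empty",             # drop commits that only touched `path`
        "--tag-name-filter", "cat",  # move tags onto the rewritten commits
        "--", "--all",               # every branch and tag, not just HEAD
    ]


def run(cmd: list[str], repo: str, **kwargs) -> str:
    """Run a git command inside `repo`, raising if it fails."""
    return subprocess.run(cmd, cwd=repo, check=True,
                          capture_output=True, text=True, **kwargs).stdout


def purge_path(repo: str, path: str, remote: str = "origin", push: bool = False) -> None:
    # Silence filter-branch's "consider filter-repo" warning and its pause.
    env = {**os.environ, "FILTER_BRANCH_SQUELCH_WARNING": "1"}
    run(filter_branch_cmd(path), repo, env=env)

    # filter-branch keeps backups under refs/original/; while they exist,
    # the old commits (and the file) are still reachable.
    backups = run(["git", "for-each-ref", "--format=%(refname)", "refs/original/"], repo)
    for ref in backups.split():
        run(["git", "update-ref", "-d", ref], repo)

    # Expire reflogs, then let gc delete the now-unreachable objects locally.
    run(["git", "reflog", "expire", "--expire=now", "--all"], repo)
    run(["git", "gc", "--prune=now"], repo)

    if push:
        # Rewrites the remote's refs; other clones and forks are untouched.
        run(["git", "push", remote, "--force", "--all"], repo)
        run(["git", "push", remote, "--force", "--tags"], repo)
```

Usage would look like `purge_path("/path/to/repo", "fonts/SomeFont.ttf", push=True)`. Building the command separately from running it makes the quoting easy to check before touching a real repository.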
I assume you would have to go back to the parent of the offending commit, cherry-pick the non-offending changes, commit, then rebase the rest of the master branch onto that new commit.
Then you'd have to repeat the process for all forks and branches. It'd be a huge pain, but I think it's doable.
I've never tried something like this, though, so there might be some complications.
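Whichever tool does the rewriting, it spreads to every branch and fork because a commit's ID is a hash over its content and its parent's ID. The toy model below uses hypothetical `chain_ids`/`excise` helpers and a heavily simplified stand-in for git's real commit format (real commits also hash a tree object, author, committer and message). It shows both points from this thread: deleting the file at HEAD leaves it in older snapshots, and removing it from history changes the ID of the first offending commit and of everything after it.

```python
import hashlib

Snapshot = dict[str, str]  # path -> file content for one commit (toy model)


def chain_ids(history: list[Snapshot]) -> list[str]:
    """Toy commit IDs: SHA-1 over the parent's ID plus the full snapshot."""
    ids: list[str] = []
    parent = ""
    for snap in history:
        h = hashlib.sha1(parent.encode())
        for path in sorted(snap):
            h.update(f"{path}\0{snap[path]}\0".encode())
        parent = h.hexdigest()
        ids.append(parent)
    return ids


def excise(history: list[Snapshot], bad_path: str) -> list[Snapshot]:
    """Rewrite history so `bad_path` never appears in any snapshot."""
    return [{p: c for p, c in snap.items() if p != bad_path} for snap in history]


history = [
    {"README": "v1"},
    {"README": "v1", "font.ttf": "<copyrighted>"},  # accidental commit
    {"README": "v2", "font.ttf": "<copyrighted>"},
    {"README": "v2"},                               # removed at HEAD
]

# Deleting at HEAD doesn't help: older snapshots still hold the file.
still_in_history = any("font.ttf" in snap for snap in history)

old_ids = chain_ids(history)
new_ids = chain_ids(excise(history, "font.ttf"))
# Commits before the offending one keep their IDs; it and every descendant
# get new IDs, so anything built on the old IDs has to be rewritten too.
changed = [old != new for old, new in zip(old_ids, new_ids)]
# → [False, True, True, True]
```

The last commit's snapshot is identical before and after, yet its ID still changes, purely because its parent's ID did. That is why every branch and fork has to redo the rewrite.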