Note: this makes sense on CI for a throwaway build, but not for a local dev clone. Blobless clones break many local git operations, or make them painfully slow and expensive.
GitHub can also just serve you a tarball of a snapshot, which is faster and smaller than a shallow clone (which is why it's the preferred option for a lot of source package managers, like Nix, Homebrew, etc.).
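For reference, that endpoint looks like this ("OWNER/REPO/REF" being placeholders), and unpacking it gives you the files with no history at all:

    # Download and unpack a snapshot of a single ref; no .git, no history
    curl -L https://github.com/OWNER/REPO/archive/REF.tar.gz | tar -xzf -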
It’s frustrating that tarball URLs are a proprietary thing and not something that was ever standardized in the Git protocol.
Yeah, that's what I try to push for. If the user (CI, whichever) just wants the files, "git archive --remote=" is the fastest way to get them.
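Something like this, assuming a host with git-upload-archive enabled (GitHub notably doesn't offer it; "example.com:group/project" is a placeholder):

    # Stream just the files at HEAD, with no .git directory at all
    git archive --remote=git@example.com:group/project.git HEAD | tar -xf -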
However, a lot of CI / build processes rely on the SHA of HEAD as well, although I'm sure that's also cheap and easy to get without cloning the whole repository.
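It is: "git ls-remote" asks the server for just the ref, without creating a clone ("OWNER/REPO" again a placeholder):

    # Print the SHA of the remote HEAD (or any other ref) without cloning
    git ls-remote https://github.com/OWNER/REPO.git HEAD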
That falls apart, though, when you want to make a build / release and generate a changelog based on the commits. But that's not something that happens all that often in the greater scheme of things.
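That's the part that genuinely needs history; a typical changelog one-liner along these lines has nothing to work from in a tarball:

    # List commits since the most recent tag; requires real history
    git log --oneline "$(git describe --tags --abbrev=0)"..HEAD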
As long as there are some env vars with the SHA, branch name, remote, etc., all of that should be handleable by a wrapper (or by git itself) falling back on them when it's invoked in a tarball of a repo rather than a real repo.
EDIT: Or alternatively (and probably better), the forges could include a dummy .git directory in the tarball that declares it an "archive"-type clone (vs shallow or full), and the git client would read that and offer the same unshallow/fetch/etc options that are available to a regular shallow clone.
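A minimal sketch of the env var fallback idea, using GitHub Actions' real GITHUB_SHA variable (the wrapper itself is hypothetical):

    #!/bin/sh
    # Hypothetical wrapper: answer "which commit is this?" even in a tarball
    if git rev-parse --git-dir >/dev/null 2>&1; then
        git rev-parse HEAD    # real repo: ask git directly
    else
        # tarball of a repo: fall back to what the CI environment knows
        echo "${GITHUB_SHA:?not a repo and no SHA in the environment}"
    fi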
> It’s frustrating that tarball URLs are a proprietary thing and not something that was ever standardized in the Git protocol.
I think there’s a lot of stuff which is common to the major Git hosts (GitHub, GitLab, etc.) - PRs/MRs, issues, status checks, and so on - which I wish we had a common interoperable protocol for. Every forge has its own REST API that provides many of the same operations and fields, just in an incompatible way. There really should be standardisation in this area, but I suppose that isn't really in the interests of the major incumbents (especially GitHub), since it would reduce the lock-in due to switching costs.
Yeah, the motivation question is definitely a tricky one. A common REST story also feels like a piece of eventually getting to federated PRs between forges, though it may well be that that's just impossible, particularly given that GitLab has been thinking about it for a decade and hasn't even got a story for federation between instances of itself, much less with GitHub or Bitbucket.
I have a vague recollection that GitHub is optimized for whole-repo cloning, and that they were asking projects not to do shallow fetching automatically, for performance reasons:
> Apparently, most of the initial clones are shallow, meaning that not the whole history is fetched, but just the top commit. But then subsequent fetches don't use the --depth=1 option. Ironically, this practice can be much more expensive than full fetches/clones, especially over the long term. It is usually preferable to pay the price of a full clone once, then incrementally fetch into the repository, because then Git is better able to negotiate the minimum set of changes that have to be transferred to bring the clone up to date.
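In other words: pay the full price once, then fetch incrementally. An existing shallow clone can also be converted after the fact:

    # Turn a shallow clone into a full one, then fetch incrementally
    git fetch --unshallow
    git fetch    # later fetches only transfer new objects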
GH Actions generally need a throwaway clone. The issue with shallow clones is that subsequent fetches can be expensive, but in CI you usually don't need to fetch after the clone.
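The typical pattern is a one-shot shallow clone that's never fetched into again ("OWNER/REPO" as a placeholder):

    # Throwaway CI clone: one commit, one branch (--depth implies --single-branch)
    git clone --depth 1 https://github.com/OWNER/REPO.git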
I believe there is a bit of a footgun here, because if you don't do a full git clone you don't fetch all branches, just the default one. That can be very confusing and annoying if you know a branch exists on the remote but don't have it locally (the first time you hit it, at least).
According to SO, newer versions of git can do something along these lines ("feature-x" standing in for the missing branch):
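    # Widen the single-branch fetch refspec, then fetch and switch
    # ("feature-x" is a placeholder branch name)
    git remote set-branches --add origin feature-x
    git fetch origin feature-x
    git switch feature-x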