Note: this makes sense on CI for a throwaway build, but not for a local dev clone. Blobless clones break many local git operations, or make them painfully slow and expensive.
GitHub can also just serve you a tarball of a snapshot, which is faster and smaller than a shallow clone (which is why it's the preferred option for a lot of source package managers, like Nix, Homebrew, etc.).
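For reference, that endpoint looks like this ("OWNER/REPO/REF" being placeholders), and unpacking it gives you the files with no history at all:

    # Download and unpack a snapshot of a single ref; no .git, no history
    curl -L https://github.com/OWNER/REPO/archive/REF.tar.gz | tar -xzf -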
It’s frustrating that tarball URLs are a proprietary thing and not something that was ever standardized in the Git protocol.
Yeah, that's what I try to push for. If the user (CI, whichever) just wants the files, "git archive --remote=" is the fastest way to get them.
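Something like this, assuming a host with git-upload-archive enabled (GitHub notably doesn't offer it; "example.com:group/project" is a placeholder):

    # Stream just the files at HEAD, with no .git directory at all
    git archive --remote=git@example.com:group/project.git HEAD | tar -xf -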
However, a lot of CI / build processes rely on the SHA of HEAD as well, although I'm sure that's also cheap and easy to get without cloning the whole repository.
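It is: "git ls-remote" asks the server for just the ref, without creating a clone ("OWNER/REPO" again a placeholder):

    # Print the SHA of the remote HEAD (or any other ref) without cloning
    git ls-remote https://github.com/OWNER/REPO.git HEAD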
That falls apart, though, when you want to make a build / release and generate a changelog based on the commits. But that's not something that happens all that often in the greater scheme of things.
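That's the part that genuinely needs history; a typical changelog one-liner along these lines has nothing to work from in a tarball:

    # List commits since the most recent tag; requires real history
    git log --oneline "$(git describe --tags --abbrev=0)"..HEAD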
As long as there are some env vars with the SHA, branch name, remote, etc., all of that should be handleable by a wrapper (or by git itself) falling back on them when it's invoked in a tarball of a repo rather than a real repo.
EDIT: Or alternatively (and probably better), the forges could include a dummy .git directory in the tarball that declares it an "archive"-type clone (vs shallow or full), and the git client would read that and offer the same unshallow/fetch/etc options that are available to a regular shallow clone.
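A minimal sketch of the env var fallback idea, using GitHub Actions' real GITHUB_SHA variable (the wrapper itself is hypothetical):

    #!/bin/sh
    # Hypothetical wrapper: answer "which commit is this?" even in a tarball
    if git rev-parse --git-dir >/dev/null 2>&1; then
        git rev-parse HEAD    # real repo: ask git directly
    else
        # tarball of a repo: fall back to what the CI environment knows
        echo "${GITHUB_SHA:?not a repo and no SHA in the environment}"
    fi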
> It’s frustrating that tarball URLs are a proprietary thing and not something that was ever standardized in the Git protocol.
I think there’s a lot of stuff which is common to the major Git hosts (GitHub, GitLab, etc.) - PRs/MRs, issues, status checks, and so on - which I wish we had a common interoperable protocol for. Every forge has its own REST API that provides many of the same operations and fields, just in an incompatible way. There really should be standardisation in this area, but I suppose that isn't really in the interests of the major incumbents (especially GitHub), since it would reduce the lock-in due to switching costs.
Yeah, the motivation question is definitely a tricky one. A common REST story also feels like a piece of eventually getting to federated PRs between forges, though it may well be that that's just impossible, particularly given that GitLab has been thinking about it for a decade and hasn't even got a story for federation between instances of itself, much less with GitHub or Bitbucket.
I have a vague recollection that GitHub is optimized for whole-repo cloning, and that they were asking projects not to do shallow fetching automatically, for performance reasons:
> Apparently, most of the initial clones are shallow, meaning that not the whole history is fetched, but just the top commit. But then subsequent fetches don't use the --depth=1 option. Ironically, this practice can be much more expensive than full fetches/clones, especially over the long term. It is usually preferable to pay the price of a full clone once, then incrementally fetch into the repository, because then Git is better able to negotiate the minimum set of changes that have to be transferred to bring the clone up to date.
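In other words: pay the full price once, then fetch incrementally. An existing shallow clone can also be converted after the fact:

    # Turn a shallow clone into a full one, then fetch incrementally
    git fetch --unshallow
    git fetch    # later fetches only transfer new objects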
GH Actions generally need a throwaway clone. The issue with shallow clones is that subsequent fetches can be expensive, but in CI you usually don't need to fetch after the clone.
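The typical pattern is a one-shot shallow clone that's never fetched into again ("OWNER/REPO" as a placeholder):

    # Throwaway CI clone: one commit, one branch (--depth implies --single-branch)
    git clone --depth 1 https://github.com/OWNER/REPO.git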
I believe there is a bit of a footgun here, because if you don't do a full git clone you don't fetch all branches, just the default one. That can be very confusing and annoying if you know a branch exists on the remote but don't have it locally (the first time you hit it, at least).
According to SO, newer versions of git can do something along these lines ("feature-x" standing in for the missing branch):
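    # Widen the single-branch fetch refspec, then fetch and switch
    # ("feature-x" is a placeholder branch name)
    git remote set-branches --add origin feature-x
    git fetch origin feature-x
    git switch feature-x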