I can't imagine you haven't looked at this, but I'm curious: do shallow clones help at all, or, if not, what was the problem with them? I'm willing to believe that there are use cases that actually need 1M commits of history, but I'd be interested to hear what they are.
Maybe I was doing something wrong, but I had a very bad experience with one of these - tbh I don't remember whether it was a blobless or a treeless clone - when I evaluated it on a huge, fast-moving monorepo (150k files, hundreds of merges per day).
I cloned the repo, then did an occasional `git fetch origin main` to keep main fresh - so far so good. At some point I wanted to `git rebase origin/main` a very outdated branch, and this made git fetch all the missing objects serially, one by one, which took extremely long compared to `git fetch` on a normal repo.
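Roughly, the workflow was this (the URL, branch name, and exact filter below are illustrative - as I said, I don't remember which filter I used):

```sh
# Partial clone of the monorepo (the treeless variant would be --filter=tree:0)
git clone --filter=blob:none https://git.example.com/monorepo.git
cd monorepo

# Keeping main fresh stays cheap: only new commits and trees come down
git fetch origin main

# Rebasing a long-stale branch needs the contents of many old commits,
# so git starts fetching the missing objects on demand, one by one
git switch my-old-branch
git rebase origin/main
```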
I did not find a way to convert the repo back to a "normal" full clone and get all the missing objects reasonably fast. The only thing I observed was git enumerating / checking / fetching the missing objects one by one, which with thousands of missing objects takes so long that it becomes impractical.
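(For reference, the recipe I've seen suggested for converting a partial clone back to a full one is roughly the sketch below, assuming Git 2.36+ for `--refetch`. I haven't timed it on a repo that size, so I can't say whether it avoids the one-by-one enumeration.)

```sh
# Drop the partial-clone filter so future fetches are unfiltered
git config --unset remote.origin.partialclonefilter

# Fetch all reachable objects in bulk, as a fresh clone would,
# instead of relying on on-demand per-object fetches
git fetch --refetch origin

# Optional: repack/prune the now-redundant old packs afterwards
git gc
```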
For rebasing, `--reapply-cherry-picks` will avoid the annoying fetching you saw. `git backfill` is great for fetching the history of a file before running `git blame` on that file. I'm not sure how much it will help with detecting upstream cherry-picks.
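Concretely, the two things I mean (just a sketch; note that `git backfill` is new in Git 2.49):

```sh
# Skip upstream cherry-pick detection: by default rebase reads the contents of
# all upstream commits to drop already-applied ones, which in a partial clone
# means fetching the missing objects one by one
git rebase --reapply-cherry-picks origin/main

# Download the blobs missing from the local commit history in batches,
# e.g. before running git blame on a file
git backfill
```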
Oh, interesting! Tbh I don't fully understand what `--reapply-cherry-picks` really does, because the docs are very concise and hand-wavy - and _why_ doesn't it need the fetches? Why isn't it the default?