So it looks like you're ready for a change ... What's next on the horizon?

Zacharias030 · on July 23, 2023

Generally git's support for a stacked PR workflow is poor [0], but imho that is the future of team collab (git is great for very asynchronously built projects, like the linux kernel). I also wonder, how much better git could be if it was based on DAGs not trees (I may want to use a changeset that is still developing in more than one branch without maintaining copies of it) and corollarily I'd like to rebase subtrees (sub-DAGs) instead of single branches.

I introduced a stacked PR workflow to our team a while ago, and a few months later half of the team had migrated to some kind of stacked PR workflow tool (on top of GitHub and git, even though support is sub-optimal). It seems like this is an idea that is really sticky.

[0] https://github.com/ezyang/ghstack

seba_dos1 · on July 23, 2023

> (I may want to use a changeset that is still developing in more than one branch without maintaining copies of it)

Not sure if you realize, but a commit is a state of all files in the repository, not a patch. Patches are calculated for you at display time (and can be calculated against any other commit, not just a parent). Sounds like you may be confused because of trying to apply a wrong mental model of how the repository represents things.
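
For anyone who wants to check this in a scratch repository, here is a minimal sketch. The file name and commit messages are made up for illustration; only stock git commands are used.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one   > file.txt; git add file.txt; git commit -q -m "first"
echo two   > file.txt; git add file.txt; git commit -q -m "second"
echo three > file.txt; git add file.txt; git commit -q -m "third"

# The commit object names a tree (the full snapshot) and its parent; it stores no diff.
git cat-file -p HEAD

# The snapshot can be read back directly, without replaying any patches.
snapshot=$(git show HEAD:file.txt)

# A patch is computed on demand, here against the grandparent instead of the parent.
skip_diff=$(git diff HEAD~2 HEAD -- file.txt)
echo "$skip_diff"
```

The last diff goes straight from "one" to "three"; the intermediate "second" commit plays no part in it.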

I'd say that git actually supports stacked workflows quite well. It's GitHub's PR model that makes it hard.

tempay · on July 23, 2023

> I'd say that git actually supports stacked workflows quite well. It's GitHub's PR model that makes it hard.

I agree that the model does, but I'm not aware of any good way of using such a workflow with the CLI either. Is there a reasonable way to effectively keep rebasing on top of multiple upstream branches?

pravus · on July 23, 2023

> Is there a reasonable way to effectively keep rebasing on top of multiple upstream branches?

This is the problem. You cannot rebase with a stacked PR flow. The correct way to do this is to use the merge command as intended.

One of the most powerful features of git is the ability to understand a common history between multiple parties and everyone throws this away entirely with a rebase and causes non-stop conflicts. I simply do not understand the preference for it, especially in shared repos.

Do not use a rebase workflow with any work that is shared with others unless you are communicating regularly and understand how obliterating your commit history will change how git views what is changed between two repositories. Rebase only works well if you are in a leaf branch you control and even then I prefer a single squash merge back into upstream rather than multiple rebases if possible.

I can't even tell you how much of my life has been lost correcting merge conflicts caused by bad rebasing of my commits by others.

kps · on July 23, 2023

> Not sure if you realize, but a commit is a state of all files in the repository, not a patch.

I think that's a core problem. It's not just that git calculates a patch to show you, it's that — in every git-using project I've seen — a developer writes a patch, and writes a commit message describing that patch. It's not just github. And then developers make the incorrect assumption that git's later presentation of the commit as a patch matches the original patch and is accurately described by the commit message.

seba_dos1 · on July 23, 2023

The developer never writes a patch when using git. The developer creates a new repository state, links it to a parent state (or multiple parent states) and describes the difference between these states in the commit message. A patch form is just a handy way to visualize these changes.

You can dump commits into patches and then apply them onto different repositories, but in order to do that you still have to convert such patch into a new repository state first.
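
A rough sketch of that round trip, assuming two throwaway repositories (the `src` and `dst` directory names and the file contents are invented for the example):

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$work/src"
git init -q "$work/dst"

cd "$work/src"
echo base > file.txt; git add file.txt; git commit -q -m "base"
echo change >> file.txt; git commit -q -am "append a line"
# Dump the last commit as a patch file.
git format-patch -1 -o "$work/patches" HEAD > /dev/null
src_commit=$(git rev-parse HEAD)

cd "$work/dst"
echo base > file.txt; echo other > unrelated.txt
git add .; git commit -q -m "different base"
# git am turns the patch back into a commit, i.e. a new snapshot on top of dst's HEAD.
git am -q "$work/patches"/*.patch
dst_commit=$(git rev-parse HEAD)

src_tree=$(git -C "$work/src" rev-parse 'HEAD^{tree}')
dst_tree=$(git rev-parse 'HEAD^{tree}')
```

The message and the change carry over, but the resulting commit and tree in `dst` are different objects, since the snapshot there also contains `unrelated.txt`.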

Many people "learn" git by learning which commands to use to do some things and in turn don't understand what's going on at all. It's like learning how to write a letter by reading Word's manual.

eru · on July 23, 2023

> I also wonder, how much better git could be if it was based on DAGs not trees [...]

Git generally supports DAGs.

> [...] corollarily I'd like to rebase subtrees (sub-DAGs) instead of single branches.

Rebasing is something you do to the commit graph, which is a DAG. Branches only come in incidentally. Branches in git are really just mutable pointers to immutable commits.

What you are describing is probably some useful workflow, I guess?

lloeki · on July 23, 2023

I seem to understand GP wants to move a whole DAG potentially having multiple leaves, not just a DAG ending at a single leaf.

IOW

    git rebase --onto shaX shaY shaZ

ends at shaZ: git walks backwards from it until it reaches the commit whose parent is shaY, which produces the list of commits to cherry-pick onto shaX.

So presumably this would be useful:

    git rebase --onto shaX shaY [shaZ1 shaZ2 shaZ3 ...]

with shaZn being optional and consisting of all leaves down from shaY

This is achievable with git but it's not just doing n rebases like so:

    git rebase --onto shaX shaY shaZ1
    git rebase --onto shaX shaY shaZ2
    git rebase --onto shaX shaY shaZ3
    ...

because each rebase would produce different commits for the parts that are common to the ancestry of shaZn1 and shaZn2, so one would have to first find all the branching points and do partial rebases onto the rebased parent commits in order.

It definitely can be done (manually or automatically) but is not as trivial as one might think.
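
To make the duplication concrete, here is a small sketch with two leaves sharing one commit. The branch names `z1` and `z2` and the file names are hypothetical. The committer dates are pinned to two different values to stand in for rebases run at different times; without that, two rebases finishing within the same second could end up producing identical commits.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit base                                        # plays the role of shaY
git checkout -q -b z1; commit shared; commit leaf1
git checkout -q -b z2 z1~1; commit leaf2           # z2 also sits on "shared"
git checkout -q main; commit upstream              # plays the role of shaX

# Naive approach: one rebase per leaf.
GIT_COMMITTER_DATE="2023-07-23 10:00:00 +0000" git rebase -q --onto main main~1 z1
GIT_COMMITTER_DATE="2023-07-23 11:00:00 +0000" git rebase -q --onto main main~1 z2

shared_in_z1=$(git rev-parse z1~1)
shared_in_z2=$(git rev-parse z2~1)
echo "shared via z1: $shared_in_z1"
echo "shared via z2: $shared_in_z2"
```

Both leaves now carry their own copy of "shared", and the only commit they still have in common is the new base.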

eru · on July 23, 2023

OK, that makes sense.

I think if you wanted to do this, it would probably be easiest to produce an artificial leaf that points to all the leaves you want to rebase.
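
A sketch of how that could look with `git rebase --rebase-merges`, under the assumption that the leaves merge cleanly. The `tmp` branch and the other names are invented for the example, and the branch pointers are moved by hand afterwards:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit base
git checkout -q -b z1; commit shared; commit leaf1
git checkout -q -b z2 z1~1; commit leaf2
git checkout -q main; commit upstream

# Artificial leaf: a throwaway merge commit whose parents are the real leaves.
git checkout -q -b tmp z1
git merge -q --no-ff -m "temporary leaf" z2

# Rebase the whole sub-DAG in one go; the shared commit is replayed only once.
git rebase -q --rebase-merges --onto main main~1 tmp

# Move the real branches to the rebuilt parents, then drop the artificial leaf.
git branch -f z1 'tmp^1'
git branch -f z2 'tmp^2'
git checkout -q main
git branch -q -D tmp
```

After this, `z1` and `z2` still share a single rebased copy of "shared" sitting on top of the new base.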

sanderjd · on July 23, 2023

Yes, and the original commenter's point is that git does not support this well, at the data model layer even. Because commits have exactly one parent commit, which is immutable, and because rebasing creates a new commit, rebasing an entire subtree with N nodes under it requires N operations, rather than just 1.

Personally I think it's a pretty small price to pay for the advantage of the single-immutable-parent model, but I do think it's surprising that there aren't better tools for this workflow. I do this all the time, but manually and painstakingly.

lloeki · on July 23, 2023

> Because commits have exactly one parent commit

This is not true, merge commits are commits with > 1 parents (octopus merges are merely commits with > 2 parents)
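
This is easy to see by counting the `parent` lines of a commit object. A minimal sketch, with made-up branch names `a` and `b`:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit base
git checkout -q -b a; commit a
git checkout -q -b b main; commit b
git checkout -q main; commit m

# Merging two branches at once gives an octopus merge: HEAD plus a plus b.
git merge -q -m "octopus merge" a b

# Each parent is recorded as its own line in the commit object.
git cat-file -p HEAD | grep '^parent'
parent_count=$(git cat-file -p HEAD | grep -c '^parent')
```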

But indeed, since the parent commits are part of the computation of a commit's sha, by design changing the parents means changing the commit's sha, which is a very nice property to have.

That said, the model has what we said as a design consequence, but it's perfectly workable to have tooling that walks history and finds branching points and whatnot. Git does it all the time with existing porcelain commands, and I've done it in different contexts (e.g. to produce diffs between tip and branching point, analyse which files have changed, and take action pertaining only to these file changes and their dependents), but the gist of it is the same.

It's merely a matter of such porcelain commands not being implemented and not being part of upstream git, so everyone either does it manually or invents their own tooling when they're sick of the pain point.

sanderjd · on July 23, 2023

I can't edit anymore, but hopefully people can read my comment with s/commits have exactly one parent/non-merge commits have exactly one parent/ :)

But this did make me realize that the problem with this workflow isn't the "one parent" part at all, it's really just the "immutable parent" part.

But to the main thrust of your comment: Yes, this "porcelain" is what I meant by my surprise that there aren't better tools for this. I think the reason there aren't is that it is really hard to do this well with porcelain, because of requiring a bundle of modifications to the commit tree, where there is no good way to make the whole bundle atomic. So it works fine (and I have a script for it) in the happy path, but it becomes a bit of a nightmare if something doesn't apply quite right.

And that's what I mean by this being unsupported at the data model level, that there is no way to do atomic operations on subtrees as a whole.

eru · on July 24, 2023

> [...] where there is no good way to make the whole bundle atomic.

You could just do all the commit manipulations you need to do, and only update the branches at the very end?

Updating the individual branch 'pointers' ain't atomic, but if there's nothing else going on in the repo at the time, it can't really fail if you've already created the new commits.
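
A sketch of that two-phase idea, with hypothetical branches `feature-a` and `feature-b` stacked on each other. The rewriting happens on a detached HEAD so no branch moves, and the final step feeds all pointer updates to `git update-ref --stdin`, which as far as I can tell from its documentation applies either all of the listed updates or none (a concurrent reader may still observe them one by one):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit base
git checkout -q -b feature-a; commit a
git checkout -q -b feature-b; commit b
git checkout -q main; commit upstream
old_a=$(git rev-parse feature-a)
old_b=$(git rev-parse feature-b)

# Phase 1: build the new commits on a detached HEAD; the branches stay where they are.
git checkout -q --detach feature-b
git rebase -q main
new_b=$(git rev-parse HEAD)
new_a=$(git rev-parse HEAD~1)

# Phase 2: move both branches in one batch, each guarded by its expected old value.
printf 'update refs/heads/feature-a %s %s\nupdate refs/heads/feature-b %s %s\n' \
  "$new_a" "$old_a" "$new_b" "$old_b" | git update-ref --stdin
git checkout -q main
```

If phase 1 goes wrong halfway, the branches have not been touched and the half-built commits are just unreachable objects.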

sanderjd · on July 24, 2023

When things like this usually go wrong with an automated tool, it's not that it "fails", it's that it succeeds part of the way, leaving things in an inconsistent state, which is then hard to reconstruct in either direction.

But you're right that it should be possible to build such things in a way that works well and very rarely screws up. But the fact that I don't really know of any tools that try to do anything more complicated than `rebase -i` seems to suggest that it is not easy to do, or there would be more people doing this.

eru · on July 24, 2023

> When things like this usually go wrong with an automated tool, it's not that it "fails", it's that it succeeds part of the way, leaving things in an inconsistent state, which is then hard to reconstruct in either direction.

I can see that things can go wrong when you are half-way through constructing the new commits. But that's fine: you just leave them as they are, and let git's gc clean them up eventually automatically. As long as you don't touch the user's branches (remember, which are just mutable pointers to immutable commits), the user doesn't need to care that your tool screwed up half-way through.

sanderjd · on July 24, 2023

Yeah I'm with you, this can be done, but again, I'm wracking my brain for an example tool that does this without being frustrating sometimes. Even just rebase itself, which is much simpler than what I'm talking about here, has footguns.

But I think this thread has convinced me that someone could almost certainly make better tools for this stuff, and now I'm just wondering why they haven't.

zrail · on July 23, 2023

You're right that the tooling doesn't support it well but commits have more than one parent all the time, they're just called merge commits.

They can actually have as many parent branches as you please but conflicts get harder to deal with the more parents you have.[1]

[1]: https://www.freblogg.com/git-octopus-merge

sanderjd · on July 23, 2023

I realized the multiple parents thing was a total red herring. It really is just the immutability.

What I was thinking of was having multiple parents over time rather than at the same time like a merge commit. But that would just be one (not immutable) parent but with a history of what that parent was over time, not "multiple parents".

seba_dos1 · on July 23, 2023

Not only is it useful, it's as easy to do in git as typing `git rebase -r`. Recently it even gained support for rewriting branch pointers in the process.

eru · on July 23, 2023

In general, it's almost always better to use long options. So that would be `git rebase --rebase-merges` in this case.

Almost always means: it's better e.g. when communicating with other humans, whether that's on a forum like HN or in code or scripts. Long options are easier for humans to understand and to 'google'. They also provide some redundancy against typos.

The sole exception, where short options can be useful, is when you are actually using a command line interactively. Use short options to your heart's content there.

marcandre · on July 23, 2023

Do you mean it can rewrite branch pointers that pointed to intermediate commits? How?

seba_dos1 · on July 23, 2023

--update-refs
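
A minimal sketch on a linear stack, assuming a Git new enough to have the option (2.38 or later, as far as I know). The `part-1` to `part-3` branch names are made up:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit base
git checkout -q -b part-1; commit one
git checkout -q -b part-2; commit two
git checkout -q -b part-3; commit three
git checkout -q main; commit upstream

# Rebase only the top of the stack; branches pointing at the intermediate
# commits (part-1, part-2) are moved along with it.
git checkout -q part-3
git rebase -q --update-refs main

git log --oneline --decorate main~1..part-3
```

Without `--update-refs`, only `part-3` would move and `part-1` and `part-2` would keep pointing at the old commits.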