Git Koans (2013)

patrec · on Dec 23, 2020

Sadly the inconsistent and baroque command line interface of git for basic stuff is the least of its problems -- learning it is just a one time sunk cost every developer has to pay these days.

What's more troubling is that git can't do some very basic stuff well, like merging branches. I mean you can do the merge, but good luck trying to revert it[+] or trying to get git to deal intelligently with non-linear history in general.

Another fun thing is that every git repo starts its life with an off-by-one error: there is no canonical (and identical) first "root" commit, a NIL commit if you will. This is unfortunate for two reasons:

1. A lot of common scripting idioms will break on the first actual commit (e.g. finding changed files via git diff --name-only A A^ will not work if A is the first commit; yes it's possible to work around that, just like you can write shell scripts that deal with filenames with spaces). Also, it would be convenient in many cases to have an canonical initial empty NIL commit as symbolic reference, similar to how having HEAD is handy.

2. More subtle: the fact that there is no common shared root commit between arbitrary repos makes a couple of things more involved and confusing than they would otherwise be, for example rewriting several different repos into one after the fact.

In theory it would be possible to work around that by everyone agreeing on a canonical first commit to start each repo with, but in practice that's of course unlikely to happen.

Lastly, and not entirely unrelated: many, but no good solutions for sub-repositories.

[+] https://github.com/git/git/blob/master/Documentation/howto/r...

scrollaway · on Dec 23, 2020

I always initialize my repos with git commit --allow-empty -m "Initial commit", for that exact reason. I agree with you that git should provide a root commit on init.

patrec · on Dec 23, 2020

See my reply to sibling on how to do it in a reproducible fashion. Maybe we should just all agree on a canoncial initial commit after all? :)

scrollaway · on Dec 23, 2020

I don't think there's a real point to do this "in a standardized way"; imo we should just push for git initializing its own empty initial commit.

patrec · on Dec 23, 2020

I agree git init should do it, but a) it still needs to be the same commit every time to work properly b) a particular "canonical" standard initial commit becoming a defacto standard in the wild presumably increases the chances that this makes it into git itself.

lmm · on Dec 23, 2020

> What's more troubling is that git can't do some very basic stuff well, like merging branches. I mean you can do the merge, but good luck trying to revert it[+] or trying to get git to deal intelligently with non-linear history in general.

IME Git handles non-linear history pretty well. AFAICT any DVCS will inherently have the same problem; you can't necessarily memory-hole the fact that the merge has happened because other people may already have your merge. What are you claiming is the "right" way to handle that case?

> In theory it would be possible to work around that by everyone agreeing on a canonical first commit to start each repo with, but in practice that's of course unlikely to happen.

I believe there is a known commit hash that's there in the datamodel. So it would be possible to treat this as NIL with just changes an the UI level. I agree it's a deficiency, though I don't think it's the most important problem with git (the huge inconsistencies around the staging area are much more important IMO).

> Lastly, and not entirely unrelated: many, but no good solutions for sub-repositories.

I don't think a good solution is possible, personally. The repository is the unit of history, branching, tagging and so on, and that makes for a nice model that I don't want to change. If you have multiple repositories, it's better to deal with them as such.

user-the-name · on Dec 23, 2020

Mercurial subrepos use basically the exact same model as git submodules, but unlike git submodules, they actually work and are useful.

The differences aren't huge, either. It would not take much effort to implement the final 20% of functionality to make git submodules useful. But it's just been left as a half-finished, half-broken feature that is not good for anything.

patrec · on Dec 23, 2020

This matches my (very hazy) memory ; I'm almost wondering at this point whether the way of least friction to herd a bunch of git repositories I have is to stuff them all into a mercurial super-repository (from what I remember, mercurial subrepos can be svn or git repos as well). Are you aware of a good up-to-date write-up of the differences of git submodules and mercurial subrepos by any chance?

Whilst I can live with mercurial having lost the VC war (and prefer git on the whole) I am a bit annoyed that few of the good features that hg has over git seem to have been adopted by the latter, even where conceptually compatible (I believe git bundle is hg inspired, but it's pretty fringe to start with).

patrec · on Dec 23, 2020

The issue is not one of memory holing: non-merge commit reverts work as people expect (and leave a historical record). I think the right way handle this in git, in most cases, is to just avoid merging altogether. Do work on feature branches and rebase those into master (possibly adding some meta-info like Feature: 123 to the individual commits as you do the rebase). Apart from not having to explain the semantics of merge reverts to everyone in your team this saves a lot of other conceptual overhead as well.

I disagree that git deals well with non linear history, everything about dealing with non-linear history in git is pretty painful, from bisecting to querying (compare to revsets in mercurial, for example) or logging.

Multiple independent repositories across a single org (working on one product) don't work well, in my experience. Unless your org is large enough to hit scalability problems monorepo seems the way to go with git -- it makes it easy for everyone to find stuff and use a consistent shared "timeline". However, other than submodules and co in git being a tire fire, I see no reason why the same purpose could not be served by subrepos in many cases; what makes you think they are inherently problematic?

lmm · on Dec 23, 2020

> The issue is not one of memory holing: non-merge commit reverts work as people expect (and leave a historical record). I think the right way handle this in git, in most cases, is to just avoid merging altogether. Do work on feature branches and rebase those into master (possibly adding some meta-info like Feature: 123 to the individual commits as you do the rebase). Apart from not having to explain the semantics of merge reverts to everyone in your team this saves a lot of other conceptual overhead as well.

That's a non-answer and sacrifices most of the benefits of using a DVCS at all; if you're going to do that you might as well just use SVN. What are the semantics you would want/expect reverting a merge to have? As far as I'm concerned, other than having to pointlessly pass "-m 1" every time, reverting a merge does exactly what I'd expect it to.

> I disagree that git deals well with non linear history, everything about dealing with non-linear history in git is pretty painful, from bisecting to querying (compare to revsets in mercurial, for example) or logging.

Bisect works fine. Could you be more specific?

> Unless your org is large enough to hit scalability problems monorepo seems the way to go with git -- it makes it easy for everyone to find stuff and use a consistent shared "timeline".

IMO the repo is the unit of versioning; things that are released together should go in the same repo, while things with separate lifecycles should have their own repositories. That way your tags and history work the way you'd expect, but you avoid showing a misleading global timeline if that doesn't actually exist (e.g. if a project depends on a previous release of an internal library, it's misleading to have that internal library in the same repository as that project, because you'd see the "current" version rather than the code you were actually using).

> However, other than submodules and co in git being a tire fire, I see no reason why the same purpose could not be served by subrepos in many cases; what makes you think they are inherently problematic?

I see them as inherently problematic because they hugely complicate the model. I don't want to think about different parts of the checkout being on different revisions, branches, or tags.

cowsandmilk · on Dec 23, 2020

Merges are definitely not the point of dvcs...

To avoid writing a treatise, I’ll just point to the advantages of distributed over centralized on Wikipedia, which is a list of stuff completely unrelated to merges.

https://en.m.wikipedia.org/wiki/Distributed_version_control

lmm · on Dec 24, 2020

"Allows various development models to be used" and "much easier to create a project fork" very much depend on merges. If you're rebasing you're necessarily following a model with a single central branch, so your model would work in a centralised system as well.

patrec · on Dec 23, 2020

> you might as well just use SVN

I've got a feeling you are not being fully serious, so on that assumption instead of me explaining why this really not at all the case, how about you provide an example of a workflow that you think crucially depends on merging rather than rebasing, and we can discuss that?

> Bisect works fine. Could you be more specific?

Sure: given a test command, show me a simple git bisect invocation that finds the merge commit that broke master.

> things that are released together should go in the same repo, while things with separate lifecycles should have their own repositories.

I think this is a useful criterion, but one that tends to be only clear-cut for things like shrinkwrapped software (and it's not the only thing that matters). If you run a service of any complexity and with any sort of uptime requirements, you will not ship everything together, even if its part of a single feature and often you will have different versions of the same service in production in parallel as well. Al

> if a project depends on a previous release of an internal library, it's misleading to have that internal library in the same repository as that project, because you'd see the "current" version rather than the code you were actually using

A strange objection. Surely the point of having a super-repo would be that the subrepos at any one commit in the super repo would form a consistent state of the world, rather than you pinning inconsistent versions of different repos in the same commit of your super-repo?

> I don't want to think about different parts of the checkout being on different revisions, branches, or tags

At the most basic level you could think about the sup-repos like a pinned (yarn, poetry, budler, ...) dependencies and the super-repo as a lockfile with extra benefits (such as 'git diff HEAD^' presumably showing you all the source changes in the sub repos since the last time you committed their versions in the super-repo). If you are not directly working on code in any of the sub repos, you simply don't have to care about their different revisions, branches or tags. If you do want to make a change in one of the subrepos with a view to landing this change in the super-repo as well, it is true that you will probably create a new branch in first the subrepo, have that merged and then create a branch in the super-repo that switches it to the new version. But presumably the reason you don't just have a mono-repo to start with is because there is some degree of indepdendence (such as your proprietary app depending on an open source lib you also maintain). One realistic example from the open source world might be something like OpenBSD which is (if I remember right, apologies if I got something wrong) developed in one big (csv?) mono repo and that also includes all the bits that have made it into the wider unix eco system like OpenSSH or libreSSL. The cross platform versions of these are developed out of stream by people periodically pulling the whole OpenBSD repo as canonical upstream, copying the relevant subset of stuff across, updating their compatibility layers and pushing it to separate repos that contain just OpenSSH (or whatever). Presumably depending on goals and priorities, having OpenSSH be a sub-repo instead would offer benefits to at least some of the people involved (assuming a well designed sub-repo mechanism).

user-the-name · on Dec 23, 2020

> I've got a feeling you are not being fully serious, so on that assumption instead of me explaining why this really not at all the case, how about you provide an example of a workflow that you think crucially depends on merging rather than rebasing, and we can discuss that?

Well, any workflow that wants to actually keep history, and that doesn't want to be littered with strange, possibly broken commits?

patrec · on Dec 23, 2020

If you are genuinely worried that rebasing a feature branch into master will somehow corrupt the precious bodily fluids of your commits, reading a book on git will probably be the most immediately effective means to avoid littering everything with strange, possibly broken commits.

We all want to actually keep history, but unless you have set your IDE up to do a commit on every keypress there probably is also some subset of history you do not in fact wish to keep. So the question becomes what history should one preserve, and here opinions can legitimately differ. What I would suggest, at minimum, is not allowing force pushes to master or any other branches that you consider as "published" (internally or externally) -- easy to enforce on basically any git hosting solution or via a server hook. However this is a completely orthogonal question to choosing a rebase or merge based workflow. If master moved from my local branch, and I do a merge of a feature branch and force push, upstream history will get lost (baring a copy in the reflog or elsewhere).

user-the-name · on Dec 23, 2020

What I believe is that rebase creates entirely new and untested code states, that have zero guarantees of working or building, and that can in fact quite easily be entirely invalid.

patrec · on Dec 23, 2020

You may believe so, but it's not true. Let me explain. There are two somewhat different use cases for rebasing:

1. You want to do the equivalent of a merge without creating a merge commit, which is what I was mostly talking about when I was contrasting a merge based with a rebase based workflow. There is precisely zero difference in guarantees of things working or building or whatever between you rebasing or merging a branch into master. In fact, funnily enough, if you actually do testing/CI correctly, issuing a "git merge feature-branch" command to land things in master will, with high likelihood do a rebase rather than creating a merge commit. This is because you will have pre-merged (or equivalently, as far as testing the right thins is concerned, rebased) master into your feature branch first. Because that's the only way to actually test the right thing (the state of the code as it will be once it lands it master).

2. You can use rebase to (potentially wildly) rewrite individual commits (splitting them up, re-arranging or melding or editing them etc.) on your feature branch. This is not what I'm talking about, but, I have the feeling, the thing you are actually objecting to. And yes, you can indeed mess up previously working commits this way. If you feel uncomfortable doing this, just don't -- you can still use (and profit from) a purely rebase based workflow.[+]

[+] As an aside: there are ways of doing this type of rewriting that give you guarantees of working and building -- e.g. just run git rebase --exec "make clean && make test" ... (each individual commit in the rebase is tested before landing and the rebase stops on anything that errors out)[++]. But as I mentioned, you can still get all the benefits of a rebase-based workflow without ever engaging in this type of rebasing, and similarly you can have a fully merge-based workflow (for landing things in master) and people might still wildly rebase-to-rewrite their commits on their feature branches (and if they are not even pushed, who's gonna stop them?). These things are really orthogonal. Also, whilst I agree that undisciplined rewriting of commits is a bad idea, and can mess up previously working non-HEAD commits, it should never mess up the final commit on a feature branch undetected. Because you're running CI, right?. And at the end of the day almost no workflow will run full tests/CI on every commit (too expensive for anything but small projects), so broken "intermediate" commits can even happen without rebasing and if you do in fact run CI on every commit, well even undisciplined rebasing is perfectly safe.

[++] Also: I'd recommend preferring git commit --fixup to wildy messing about with git rebase -i on the fly.

lmm · on Dec 24, 2020

> There is precisely zero difference in guarantees of things working or building or whatever between you rebasing or merging a branch into master. In fact, funnily enough, if you actually do testing/CI correctly, issuing a "git merge feature-branch" command to land things in master will, with high likelihood do a rebase rather than creating a merge commit. This is because you will have pre-merged (or equivalently, as far as testing the right thins is concerned, rebased) master into your feature branch first. Because that's the only way to actually test the right thing (the state of the code as it will be once it lands it master).

The head commit on master is the same, but the state of previous commits from your branch becomes different and untested, and quite possibly broken. And presumably you valued that commit history on your branch, or why keep them at all? If you want master's history to only reflect the sequence of features landing master, squash-merging is a much better way to achieve that than rebasing - it doesn't create a fictional commit history, just the mega-commits you wanted. Future bisects will land you on a mega-commit, but at least you don't have to handle a bunch of non-compiling states on the way there.

patrec · on Dec 24, 2020

> The head commit on master is the same, but the state of previous commits from your branch becomes different and untested, and quite possibly broken.

Try to break it down, and you will see that the scenario you're positing doesn't make much sense and your concerns are unfounded:

Let's posit the following:

1. We have a feature branch B, with carefully crafted and individually tested commits.

2. We have a minimally working CI setup (which means we always test the code state of the new head of master before landing stuff in master "for real").

On that assumption one of two things can happen when you try to land B whether via rebase or merge:

a) CI fails, because changes in master in the meantime make it so your branch now triggers a test failure or merge conflict whereas before it didn't. We can ignore this, scenario, because nothing lands till

b) CI passes. So we know the HEAD commit on master is fine. But, you say, whilst I know the individual commits on B are also good (I tested them all) and the merge commit is good (CI!), with rebase I only know the head commit works -- what if the rebasing broke some intermediate commits?

But for that to happen, all the following need to be true:

1. You don't didn't use rebase -x (why not, if you carefully test all individual commits on the feature branch?)

2. Your individual commits all work correctly on the feature branch and would pass CI.

3. The sum of your individual commits plus master works correctly and definitely does pass CI.

4. Because of changes in master branch since the common ancestor of present master and B, a) at least one of the commits of B no longer would pass CI when tested, although it does on the branch (not an unlikely scenario so far) AND b) a latter commit on the feature branch miraculously fixes this incompatibility (because we know the HEAD passes). This is not theoretically impossible, but exceedingly unlikely. And, you can easily guard against it happening as described above.

lmm · on Dec 25, 2020

> a) CI fails, because changes in master in the meantime make it so your branch now triggers a test failure or merge conflict whereas before it didn't. We can ignore this, scenario, because nothing lands till

No we can't, because what do people do in this scenario? They make a new commit that fixes the tests, but leaves a long chain of previous commits broken.

(Also in a big enough codebase with enough collaborators it becomes impractical to require CI to pass against the absolute tip of your shared development branch before merging, because in the time it takes to run CI there will always have been more changes landing in that shared branch).

> 1. You don't didn't use rebase -x (why not, if you carefully test all individual commits on the feature branch?)

I've never known anyone to actually use -x. Maybe a very dedicated individual could, but there's no way you'd be able to enforce it across a team (unless that team is in the habit of making feature branches that are just a handful of big commits - but in that case you might as well just squash-merge). Pulling changes from master and resolving any conflicts is already a frustrating interruption; making it take 20x longer is a non-starter.

People don't carefully test all individual commits, but that's ok. Even the most casual testing will catch when a commit simply doesn't compile, and introducing a compilation error in manual code changes is much less common than introducing a compilation error when pulling changes from master. And in the worst case, an isolated non-compiling commit in history is not too much of a problem to skip when bisecting. The problem only comes when you have a long chain of non-compiling commits - which is exactly what rebasing tends to produce.

dragonwriter · on Dec 25, 2020

> Maybe a very dedicated individual could, but there's no way you'd be able to enforce it across a team

Adopt standard workflow tooling that enforces it, like the gitflow tool but for your actual team workflow.

lmm · on Dec 25, 2020

This is vaporware; meanwhile a merge-based workflow is working fine in many organisations every day.

patrec · on Dec 26, 2020

> No we can't, because what do people do in this scenario? They make a new commit that fixes the tests, but leaves a long chain of previous commits broken.

Same thing will happen with merges -- one can try to argue (as you have done) that with merges it will be less bad, because you may on average end up with fewer broken commits -- and I don't think that is implausible. But that is basically just saying your existing way to bisect is not fully reliable and you fear using rebase would make it quantitatively noticeably worse. What I am saying is it can and should be fully reliable, and this is not theoretical either.

I enjoy discussing different workflows with people with a different outlook and experience and who have put a decent amount of thought into it, as you clearly have. But I still have to note that there is a certain irony here: you have something which by your own admission is not completely reliable; i.e. not all your merges in master have passed CI and you will have some (hopefully small) proportion of commits that you'll need to manually bisect skip because they are bad for one reason or other and you won't know in advance. You are adamant that rebase-based workflows are bad (partly) because you fear that they will greatly exacerbate such problems and you don't seem to think it's practically possible to completely avoid them to start with.

But I am arguing from plenty of real-life experience with a workflow, which is incidentally rebase-based (but the same guarantees would hold just as well if I moved it to a purely merge based approach) where I a) know that every "merge" to master has been fully tested before it lands b) can reliably bisect over ~years of commits with a simple git alias, without having to ever manually bisect skip bad commits[+]. And by construction, not because I hope that devs will generally have tested their commits sufficiently manually or are diligent about fixing merge conflicts in a way that does not result in intermittent broken commits. So whilst several of your other reasons are perfectly valid (long-lived branches), you can maybe see why this one is a bit amusing to me.

> (Also in a big enough codebase with enough collaborators it becomes impractical to require CI to pass against the absolute tip of your shared development branch before merging, because in the time it takes to run CI there will always have been more changes landing in that shared branch).

It's not impractical at all, in fact if your company pays me for it I'm more than happy to set it up for you :)

Alternatively shoot me an email and I'll be happy to explain the gist of it (including how to completely avoid the problem you mention above).

[+] Of course assuming nothing was broken at the meta level, e.g. no one accidentally temporarily misconfigured CI, so some stuff that should have been didn't get tested.

lmm · on Dec 26, 2020

> But that is basically just saying your existing way to bisect is not fully reliable and you fear using rebase would make it quantitatively noticeably worse. What I am saying is it can and should be fully reliable, and this is not theoretical either.

What you're proposing doesn't actually improve matters; you don't gain anything (other than saving a bit of machine time) by only testing known-compiling commits. At the end of the day the bisect lands you on either a single commit or a string of commits (in the case where some commits don't compile), and that's the diff that you have to go through manually; the whole point of bisect is to make that diff as small as possible. If you limit your bisect to testing changes from the history of master (one way or another), you guarantee that you'll land a diff that's a complete feature branch; if you allow the bisect to go through every commits then you have a decent chance of landing on a much smaller diff.

> But I am arguing from plenty of real-life experience with a workflow, which is incidentally rebase-based (but the same guarantees would hold just as well if I moved it to a purely merge based approach) where I a) know that every "merge" to master has been fully tested before it lands b) can reliably bisect over ~years of commits with a simple git alias, without having to ever manually bisect skip bad commits[+]. And by construction, not because I hope that devs will generally have tested their commits sufficiently manually or are diligent about fixing merge conflicts in a way that does not result in intermittent broken commits. So whilst several of your other reasons are perfectly valid (long-lived branches), you can maybe see why this one is a bit amusing to me.

I've got plenty of real-life experience with plenty of different workflows, thank you very much. If you decide you want a history-of-master history, there are plenty of ways to get that (I'd argue squash-merging is the lowest-overhead way to do it). But the result of that is you get a much less useful bisect than a workflow where you have small commits on feature branches and use merges.

> It's not impractical at all, in fact if your company pays me for it I'm more than happy to set it up for you :)

I don't work for that company any more, but bear in mind this was for a codebase with 500 developers where builds took around 1.5 hours. We looked at building speculative batches of PRs together but decided the costs were higher than the benefits; master occasionally got broken and we fixed it when it happened.

Fundamentally a broken master is always a problem you can have, because flaky tests happen (and, as you mention, meta-level problems can happen). It's good to minimise the times when master is broken, but it's not realistic to assume you can avoid it entirely, so your workflow should be able to handle having the occasional isolated broken commit in the history of master.

patrec · on Dec 27, 2020

> What you're proposing doesn't actually improve matters; you don't gain anything (other than saving a bit of machine time) by only testing known-compiling commits.

You're not being quite honest here: of course just knowing if something compiles at all, whilst nice, is not that massive a difference (because you can fairly easily script an automated bisect that will skip non-compiling commits by itself). But that's much weaker than what I actually said: knowing that the commit has passed your (1.5 h) CI, and making it so that all your merge commits on master will have this property. Since you apparently haven't tried it, maybe you should be a bit more careful about dismissing the utility out of hand? In the absence of an automatic way to avoid bad commits, I have in the past given up on using bisect to track something down on multiple instances because it was just too much overhead to work out what commits needed to be skipped; if you hit a commit that's broken in a more subtle fashion than "does not compile" but that CI would have found it's not always cheap to work out if you hit the regression or some unrelated temporary breakage. And of course there are plenty of benefits unrelated to bisecting.

You are also making a several implicit assumptions which are unlikely to hold: it is not true that typically none of the utility of bisect manifests before you have identified the precise lines of code that caused an issue and understood them. If you have some acute problem in production, being able to reliably and quickly locate the feature branch that introduced it is very valuable (for getting the right people to look at the problem or even shipping some work around before the problem is fully diagnosed). Often certainly a recent deployment is to blame. But sometimes a deployment can exercise something that regressed much earlier and trigger a data corruption that is not noticed immediately. Also, of course once you found some problematic merge there is no reason you wouldn't then continue to use bisect to find the precise commit in the feature branch! Since the cost of dealing with potentially bad commits inside the relevant (ex-)feature branch is much smaller than the cost of dealing with potentially bad commits everywhere on the tree before you have found the right branch (for several obvious reasons), that's still a big win. And yeah it also beats just squash committing and dealing with a single monster diff.

> I've got plenty of real-life experience with plenty of different workflows, thank you very much.

I'm not denigrating your experience. What I'm stating that I have good reasons to be sceptical of part of your rationale for disliking rebasing ("it will mess up bisecting") because whatever other things you might have done better and more ergonomically in your previous development workflows than I have in mine, I get the pretty clear impression I have experienced some affordances around bisecting in particular that I value which you haven't.

> I don't work for that company any more, but bear in mind this was for a codebase with 500 developers where builds took around 1.5 hours.

I don't think this is beyond what's doable, but you can tell me if you attempted/considered and rejected what I propose below (and if so why it was not workable). You basically need two things: The most vital one being 1. a merge queue that's processed in order (so CI doesn't race against a moving master as you described). This you will need even with a dozen developers and 10 minute CI runs. 2. at that scale, speculative batching as you mentioned. It'll probably need to be reasonably intelligent as well so you can always merge a sizeable batch every 1.5h with high likelihood, even if you assume that (say) on average 2% of your open PRs will break on merge into master/being combined into a batch. For example, run several alternative batches in parallel, run tests in an intelligent and code change dependent manner to maximize the chances of early failure detection and so on and so forth. Assuming each developer lands something on master every two days on average, you'd have to deal with 250 merges a day. Say you can run about 5 sequential CI runs per work day, you'd need to test pretty large batches of ~50PRs per batch, which is more than I have experience with. Since you will only batch stuff that passed branch CI, and disregarding flakes for the moment, a batch should only fail because of an incompatibility of PRs within the same batch or against some very recent addition to master. So you can probably get the failure rate into the low single digit percentages, at which you'd need to do run enough alternative batches in parallel to cope with one or two bad PRs per batch. That still seems feasible if you put batches together intelligently, although of course things get exponentially worse as failure rate increases (already over 1k ways to omit two PRs from a batch of 50; whereas running 50 variations of a batch with 1 PR omitted in parallel is still cost-neutral compared to not batching).

> Fundamentally a broken master is always a problem you can have, because flaky tests happen (and, as you mention, meta-level problems can happen). It's good to minimise the times when master is broken, but it's not realistic to assume you can avoid it entirely, so your workflow should be able to handle having the occasional isolated broken commit in the history of master.

Sure. But minimizing those bad commits (and probably even marking them after the discovery, e.g. via git-notes) pays off.

lmm · on Dec 27, 2020

> You're not being quite honest here: of course just knowing if something compiles at all, whilst nice, is not that massive a difference (because you can fairly easily script an automated bisect that will skip non-compiling commits by itself). But that's much weaker than what I actually said: knowing that the commit has passed your (1.5 h) CI, and making it so that all your merge commits on master will have this property.

In the (somewhat exceptional) case I'm talking about it was 1.5h for a standard build; IIRC straight compilation was more than half of it. In any case, there's no real difference between compilation and "CI" here; you make your bisect script run whatever your CI test is, and skip if it fails, before running the part you're actually testing.

> Since you apparently haven't tried it, maybe you should be a bit more careful about dismissing the utility out of hand?

I've tried workflows that made it easy to do a bisect that tests only commits from the history of master; unless you can explain how what you're suggesting achieves something better than that, I don't think not having used your precise script invalidates my views.

In fact it seems to me that your workflow is distinctly worsened by using rebase; if you used merges then some intermediate commits from feature branches would have been successfully built by CI (pre-merge builds, builds of "early review" PRs that were reworked before merging to master, builds that the developer deliberately ran on CI for whatever reason), whereas by using rebase you guarantee that only the post-merge states of master are available to you for bisection.

> if you hit a commit that's broken in a more subtle fashion than "does not compile" but that CI would have found it's not always cheap to work out if you hit the regression or some unrelated temporary breakage.

What are you doing on CI that's so different to what you're doing during local development? If breakages don't show up until you make the PR that you want to merge that's bad for everyone; running the unit tests that pertain to the code you're working on, if not the whole suite, before committing is just common sense. Of course it's possible for something to work locally and break on CI, but that's a very rare case (much rarer than generally-flaky tests, IME).

> You are also making a several implicit assumptions which are unlikely to hold: it is not true that typically none of the utility of bisect manifests before you have identified the precise lines of code that caused an issue and understood them. If you have some acute problem in production, being able to reliably and quickly locate the feature branch that introduced it is very valuable (for getting the right people to look at the problem or even shipping some work around before the problem is fully diagnosed). Often certainly a recent deployment is to blame. But sometimes a deployment can exercise something that regressed much earlier and trigger a data corruption that is not noticed immediately.

Narrowing it down to a branch really isn't much quicker than narrowing it down to a commit - you've already come up with the test case/script, so it's just a case of letting it run for maybe 5 more cases (if we assume maybe 30 commits on the feature branch). If you have a very small team then I guess narrowing it down to a specific branch might be much quicker than narrowing it down to a commit within that branch - but in that case the bisect is going to find the same thing, you can see when it's got to the stage of testing commits from the same branch and start investigating there. And if you really want to bisect just via the history of master, you can always do that (admittedly with a little scripting, but you don't seem to be shy of that).

> Also, of course once you found some problematic merge there is no reason you wouldn't then continue to use bisect to find the precise commit in the feature branch!

But you can't do that if you've rebased the branch, because most of the branch history is (often) broken. If you were squash-merging you could dig out the "original" version of the branch (assuming it's not been gced) and bisect there (assuming the problem is solely due to a change on that branch and not an interaction between that branch and a concurrent change on master), but if you're rebasing you can't even do that, because if developers are in the habit of rebasing then the "original" branch was probably rebased and force-pushed as well, so is likely to have old commits that don't compile.

> Assuming each developer lands something on master every two days on average, you'd have to deal with 250 merges a day. Say you can run about 5 sequential CI runs per work day, you'd need to test pretty large batches of ~50PRs per batch, which is more than I have experience with. Since you will only batch stuff that passed branch CI, and disregarding flakes for the moment, a batch should only fail because of an incompatibility of PRs within the same batch or against some very recent addition to master. So you can probably get the failure rate into the low single digit percentages, at which you'd need to do run enough alternative batches in parallel to cope with one or two bad PRs per batch. That still seems feasible if you put batches together intelligently

Yeah, that kind of approach definitely seems viable - we had a design like that sketched out, and may even have prototyped it at one point. It was decided that it wasn't worthwhile, because we would still have needed our process for catching when flaky tests were introduced, and that process caught conflicts like this "for free". So you are probably right that it is achievable to avoid ever merging non-compiling code to master except when meta problems happen, but I don't think that actually takes you significantly closer to "master is never broken".

> Sure. But minimizing those bad commits (and probably even marking them after the discovery, e.g. via git-notes) pays off.

Yes and no; as long as they're isolated, I don't think pushing the rate of bad commits from 1% down to 0.5% or even 0.1% is really a game-changer. Obviously there's a threshold where you have enough bad commits that it significantly disrupts being able to bisect at all, but unless you can manage to completely eliminate the possibility of a broken commit, there's a wide spread of bad-commit rates where your workflow has to be pretty much the same. Just like how buying higher quality hardware to reduce defect rates is generally not worthwhile unless you go all the way to buying a mainframe with fully redundant everything and guaranteed uptime; you still need to be fault-tolerant if you buy the pro component with 99% uptime, so you might as well buy the consumer version with 98% uptime if it's even 5% cheaper.

patrec · on Dec 31, 2020

> Yes and no; as long as they're isolated, I don't think pushing the rate of bad commits from 1% down to 0.5% or even 0.1% is really a game-changer.

But how are they going to be isolated if your CI runs for 1.5h and you allow bad commits to master? Unless your CI magically stops very early on a logical merge conflict that does not manifest as patch-level merge conflict, you will make bad merges to master for the 1.5h it will take you to figure out master is broken. Based on the above numbers (250 merges a day) that would be 50 bad merges, assuming ~4 commits per merge you get 1000 broken commits on master on a single broken build; and I can't see how this is not going to hurt your ability to bisect in practice.

There are plenty of nice things that become practical and easy if you combine very low rate of bad merges and if you can get previously compiled artifacts for every commits in seconds to sub-seconds (easy to do with nix) and have an optimized spin-up. E.g. you can very quickly find performance regressions via bisection that happened months ago (useful if you have a lot of seasonal variation in performance profiles (as in search or lots of retail) a perf regression might go unnoticed for considerable time).

> Narrowing it down to a branch really isn't much quicker than narrowing it down to a commit - you've already come up with the test case/script, so it's just a case of letting it run for maybe 5 more cases

I'm failing to get my point across. It's not just 5 more cases, one is reliable the other isn't. I can find the branch completely reliably. And even if I have to deal with potentially broken commits in the branch at that point, this is much less hassle then doing so before I have found the right branch, because a) they are going to be far fewer, so I have a lower chance of hitting one than when bisecting of 10k commits or whatever b) I have some meaningful context. I know I'm looking at feature X and I can probably pretty quickly figure out if some failure inside is the thing I'm looking for or some spurious bad commit. I.e. once I've narrowed it down it's much cheaper to figure out then if I stopped somewhere random in those 10k commits.

> But you can't do that if you've rebased the branch, because most of the branch history is (often) broken.

Well, except it isn't :) Why would it be? Since I avoid long lived branches and tend to coordinate with co-workers when modifying a particular part of the code base, it's not like I would have to deal with merge conflicts all the time, and even in cases where I am, I'd generally try to fix the right commit in the rebase, rather than introducing a chain of broken commits. Seriously, there are plenty of reasons commits on a feature branch will be broken some of the time (e.g. you do a larger change that involves some intermediate step were things are broken) and I don't think rebase materially affects the total adversely in my typical workflows (it certainly also reduces bad commits, because if I notice something is broken on my branch I'll try to fix the relevant commit, which would not be allowed in your workflow).

> What are you doing on CI that's so different to what you're doing during local development?

I try to maximize my development speed with local testing. If I can catch 90% of errors for 5% of test time or less, I'll do that ever time and rely on CI to inform me about more subtle errors (e.g. either integration test failures or higher numbers of samples for property based tests). Why would I not farm out long-running tests to a dedicated CI box and continue working in the meantime?

lmm · on Jan 6, 2021

> assuming ~4 commits per merge you get 1000 broken commits on master on a single broken build

No, because none of the commits from the feature branches are broken - only the merge commits themselves are broken. Since a merge commit by definition has multiple parents i.e. multiple paths through it, they're the least damaging place for breaks to occur; when you hit a broken merge commit it's usually easy for the bisect to find a different path around.

> I'm failing to get my point across. It's not just 5 more cases, one is reliable the other isn't. I can find the branch completely reliably. And even if I have to deal with potentially broken commits in the branch at that point, this is much less hassle then doing so before I have found the right branch, because a) they are going to be far fewer, so I have a lower chance of hitting one than when bisecting of 10k commits or whatever b) I have some meaningful context. I know I'm looking at feature X and I can probably pretty quickly figure out if some failure inside is the thing I'm looking for or some spurious bad commit. I.e. once I've narrowed it down it's much cheaper to figure out then if I stopped somewhere random in those 10k commits.

Narrowing it down within the branch by thinking through it in your head is rarely going to be quicker than just letting the automated bisect run for a few more cases, and never going to be less effort. At that stage you've already done all the work. Let the computer do what it's good at.

> Well, except it isn't :) Why would it be? Since I avoid long lived branches and tend to coordinate with co-workers when modifying a particular part of the code base, it's not like I would have to deal with merge conflicts all the time, and even in cases where I am, I'd generally try to fix the right commit in the rebase, rather than introducing a chain of broken commits.

If you get a git-level conflict, sure. The bigger problems are the semantic conflicts that don't show up as git conflicts e.g. something that was used in your changes gets renamed - I've never seen anyone who goes back and fixes that properly after a rebase. In a large codebase with a lot of developers who believe in aggressive refactoring (which I think is 100% worthwhile, though I suppose that's a separate discussion) this will happen a lot.

> I try to maximize my development speed with local testing. If I can catch 90% of errors for 5% of test time or less, I'll do that ever time and rely on CI to inform me about more subtle errors (e.g. either integration test failures or higher numbers of samples for property based tests). Why would I not farm out long-running tests to a dedicated CI box and continue working in the meantime?

Longer-running regression tests, sure, but those tests don't block bisection either. To me "good enough to continue doing development work on" and "good enough to bisect with" are the same standard; if it's completely broken (e.g. non-compiling) then I can't get anywhere, if there's a slight flakiness that shows up in some occasional edge cases then it doesn't really matter. I don't understand being happy to continue work during regular development, but throwing out anything that's not 1000% reliable when you're bisecting.

lmm · on Dec 24, 2020

> I've got a feeling you are not being fully serious, so on that assumption instead of me explaining why this really not at all the case, how about you provide an example of a workflow that you think crucially depends on merging rather than rebasing, and we can discuss that?

I'm completely serious. Fundamentally any workflow where you have multiple long-lived branches and want to cross-merge between them (e.g. branches for several mostly-independent long-term features, per-client branches) is impossible with rebase; rebase necessarily only works if you have a single central master branch, at which point you're doing something more or less equivalent to SVN (I didn't mean "you might as well just use SVN" as flippantly as it sounds; I genuinely do think SVN is a reasonable choice if your use pattern fits within its limitations and you place a very strong value on the simpler history model). More specifically if you can ever have a scenario where two different people resolve the same conflict (e.g. developer C merges from feature-A and feature-B into feature-C, developer D merges from feature-B and feature-A into feature-D), you end up in a lot of trouble if you're rebasing, IME.

Now of course it's generally good to avoid long-lived branches as much as possible. But even if your branches are short-lived, being able to freely cross-merge between feature branches makes development much nicer - you can anticipate and resolve or avoid potential conflicts earlier, while they're smaller, rather than hitting them only after a feature is code-complete.

Merging rather than rebasing also makes bisect (especially automated bisect) much more effective - if you rebase a branch that has a semantic conflict with master then it's very common for the rewritten commits to not even compile, so you end up with lots of non-compiling commits scattered through your history. Worse, those non-compiling commits tend to be in long chains, so bisect can only tell you that the commit you're looking for lies somewhere within that chain.

> Sure: given a test command, show me a simple git bisect invocation that finds the merge commit that broke master.

Why would that ever be what you want, and how does rebase make it any easier? In a merge workflow you still have the history to use something like https://gist.github.com/ayust/2040290 if you really want to (and I'm sure a builtin command could be added if there was a convincing use case for one), whereas it's completely impossible in a rebase workflow.

> I think this is a useful criterion, but one that tends to be only clear-cut for things like shrinkwrapped software (and it's not the only thing that matters). If you run a service of any complexity and with any sort of uptime requirements, you will not ship everything together, even if its part of a single feature and often you will have different versions of the same service in production in parallel as well.

I'm not worried about having multiple versions running for a few minutes while a deploy is in process (though I favour having a rule that there can only be one deployment of a given system ongoing at the same time, every deployment must have an owner, and that owner is responsible for leaving the system in a consistent state when they're finished, whether that means finishing the deploy, rolling back, or something else). But if you're happy with having foo deployed on version X and bar deployed on version Y then I'd say that foo and bar are separate systems and the VCS model should reflect that: if anything, if your changes to foo are going to be deployed independently from your changes to bar then I'd rather you had to make two separate commits and think about what the code looks like when one change has been applied and the other hasn't, because that's exactly what's going to be happening at runtime.

> A strange objection. Surely the point of having a super-repo would be that the subrepos at any one commit in the super repo would form a consistent state of the world, rather than you pinning inconsistent versions of different repos in the same commit of your super-repo?

In the case of an internal library there most likely isn't a single consistent version, because different components will depend on different versions of it. More subtle versions of the same thing happen as soon as you have two services that talk to each other but are deployed independently - you can see the code for A and B, but not the code for B as A depends on it. Consistent states exist for things that are deployed in lockstep, but are, I think, misleading for things that aren't.

> At the most basic level you could think about the sup-repos like a pinned (yarn, poetry, budler, ...) dependencies and the super-repo as a lockfile with extra benefits (such as 'git diff HEAD^' presumably showing you all the source changes in the sub repos since the last time you committed their versions in the super-repo).

That sounds like something with quite different characteristics from a git repository. I'm not necessarily against having tools that can deal with multiple repositories at the same time, but I don't think presenting them as a single repository is helpful for that.

patrec · on Dec 24, 2020

> Fundamentally any workflow where you have multiple long-lived branches and want to cross-merge between them (e.g. branches for several mostly-independent long-term features, per-client branches) is impossible with rebase

I absolutely agree with you, if you have long lived branches (for example if you have several supported versions of a product and you need to apply security backports etc) you should use merges. We both agree that long lived branches are to be avoided if possible, so although I further agree that merge is better if you want to integrate across branches, I place no value on that in the scenarios I normally have to deal with (no inherent necessity for long lived branches). I want feature branches to be short lived and integration to happen (predominantly) via master, because that offers significant benefits. For the cases where I really needed something from A in B, rebasing or cherry-picking has not been much of a hassle for me in practice. My feeling is that the majority of open source or commercial projects should not have long lived branches (although some undoubtedly need them).

> Merging rather than rebasing also makes bisect (especially automated bisect) much more effective

Not really: there is exactly one method I have found that works well for bisecting robustly, and this applies equally to merge and rebase based commits: you need to to record which commits have passed CI in some way that's trivial to use for bisect skip (e.g. by CI rewriting the head commit's message to indicate it was tested and passed). This is pretty easy to set up and very useful. If you have a merge based workflow you can also use merges commits to a blessed branch (like master) indicate that there are no broken commits (hence my previous question, how do you bisect just on master merges? What I was getting at is that git annoyingly makes it somewhat awkward to say "skip bisect everything that's not a merge into master"). The only other alternative to get robust bisect is to test all commits via CI, but that's typically not practical for anything but small projects.

> Why would that ever be what you want, and how does rebase make it any easier?

See above: in my experience feature branches that get merged (rather than rebased) will be still be full of intermediate broken commits. If you have set up CI sensibly, your merge commits to master should all be non-broken, so this is a good way to skip unreliable commits. Rebase does not make bisect harder or easier (well, other than that it offers a convenient way to tag CI passes in the commit message), this was just to back up my claim that bisect in git is unnecessarily awkward to use.

> if you rebase a branch that has a semantic conflict with master then it's very common for the rewritten commits to not even compile, so you end up with lots of non-compiling commits scattered through your history.

I'd really like to understand better how you or (more likely?) your co-workers end up in this situation and why you think it's related to a merge based vs rebase based workflow. Linu[sx] use a merge flow workflow but insists on a lot of rebasing to clean up local history, for example. I assume what's happening is something like this: someone has done work on a branch that lived long enough to accumulate a larg-ish number and also diverge enough from master to cause conflicts. When they then rebase master into their branch, they don't both to fix the problems at the commit they occurred, and instead just make the head of the branch work again. So you have a bunch of intermittent broken commits and they pollute history. Is that it? But you can have the exact same thing happen with a merge based workflow, firstly because you can never prevent someone from (badly) rebasing master into their private local branch and secondly because it is also quite likely that someone who couldn't be asked to rebased cleanly but uses merges exclusively would merge master multiplie times into their branch without making the merge commits necessarily pass tests, instead they'd probably also concentrate their efforts on making the head of the feature branch pass.

> But if you're happy with having foo deployed on version X and bar deployed on version Y then I'd say that foo and bar are separate systems and the VCS model should reflect that

Kinda. I think the correct way to handle this is make it painful and requite people to duplicate the code in repo. You sometimes need two versions of X in production in parallel for a longer amount of time (e.g. if you are transitioning to a new architecture, and you need to run it as a shadow system for some time first to gather confidence), but it should not be a common thing and I find having both in the working tree works well for these cases.

> In the case of an internal library there most likely isn't a single consistent version, because different components will depend on different versions of it

That's a big no-no in my book. Maybe there is some scale at which this is the lesser evil because otherwise you make it too painful to refactor stuff, but in general I think you should make it painful for people to do this: all versions should be in master and there should be strong pressure to avoid having more than one version of anything in master, certainly for anything but a short transition periods. There are massive downsides to allowing people to use random versions of internal libraries: security concerns, people ending up on some completely outdated version of a lib that then suddenly breaks completely for them and they don't have the time budget to rewrite all their crap, the massive cost of not being able to look into a single repo and know that if you see "import foolib" it's the foolib in the same source tree. I've seen this being accepted practice and it being not accepted practice, and I found the latter brought big benefits.

I don't think it's necessary for master to correspond 100% to production, but it should be quite close. If you land something in master and it's not shipped by the end of the day, I'd say that's generally a bad thing and you should consider reverting.

> I don't think presenting them as a single repository is helpful for that.

It is helpful because all the normal tooling works. You found a problem in production and would like to see where it came from: if production is either a monorepo or a "super-repo" composed of subrepos, you can just bisect. You can just git log the whole thing to understand recent history etc. You can git grep and it will show you all the uses of something, as opposed you have to figure out how to navigate dozens of different things done by different teams. I have worked on projects with monorepos from the start, multi-repos and multi-repo transitioning to mono-repo. Based on that I think monorepos are great, and wherever possible I would strongly encourage their use: everything being in one place and under a single tool is super-super useful. I think a good way to deal with subrepos should preserve a lot of the advantages of monorepos where monorepos are less applicable, so it's very annoying git has terrible support for this.

lmm · on Dec 25, 2020

> I want feature branches to be short lived and integration to happen (predominantly) via master, because that offers significant benefits. For the cases where I really needed something from A in B, rebasing or cherry-picking has not been much of a hassle for me in practice.

I think having everything close to master offers benefits, but when it means people start doing things like feature flags then that's a higher cost than the benefits, so feature branches need to live long enough to implement a feature enough to tell whether it works - which is probably a week or two even in the best cases. Going from "everything other than master is private, you ignore what other people are working on until it hits master" to "all pushed feature branches are public unless otherwise stated, you pay attention to what other people are doing and pull their branches whenever you think that would be helpful" is a significant change in mentality, but I found it to be the biggest benefit of moving to DVCS - perhaps because resolving conflicts is disproportionately frustrating work, IME.

> you need to to record which commits have passed CI in some way that's trivial to use for bisect skip (e.g. by CI rewriting the head commit's message to indicate it was tested and passed). This is pretty easy to set up and very useful.

In my experience you don't need full CI, because people mostly test the code they've just written at least a little bit. Even if they broke something in one commit, they'll usually fix it in the next.

> But you can have the exact same thing happen with a merge based workflow, firstly because you can never prevent someone from (badly) rebasing master into their private local branch

You can't prevent someone who's determined to put broken commits in their history from doing so, sure. But you can ask everyone to not use rebase and they'll do it, and you can ask them to only commit code that compiles (or passes tests) and they'll do that most of the time (and if you have the occasional isolated non-compiling commit, that's not such a big problem).

> secondly because it is also quite likely that someone who couldn't be asked to rebased cleanly but uses merges exclusively would merge master multiplie times into their branch without making the merge commits necessarily pass tests, instead they'd probably also concentrate their efforts on making the head of the feature branch pass.

The head of the feature branch is the merge commit, that's the whole point. They can't do any more work on their feature until they've at least made things compile - the first time they try to go through the edit-test cycle, they have to fix anything that they broke when merging. So sometimes a lazy person will make a non-compiling merge and then fix it in the following commit, but that's not too bad because it only leaves a single isolated non-compiling commit in the history.

> Kinda. I think the correct way to handle this is make it painful and requite people to duplicate the code in repo. You sometimes need two versions of X in production in parallel for a longer amount of time (e.g. if you are transitioning to a new architecture, and you need to run it as a shadow system for some time first to gather confidence), but it should not be a common thing and I find having both in the working tree works well for these cases.

I'm talking about the case where X and Y are separate systems; if your repository has "the current version of X" and "the current version of Y" then that can end up being pretty misleading because what you don't have in your repo is "what X thinks is the current version of Y" i.e. the version of the interface/client code to Y that X is using.

> That's a big no-no in my book. Maybe there is some scale at which this is the lesser evil because otherwise you make it too painful to refactor stuff, but in general I think you should make it painful for people to do this: all versions should be in master and there should be strong pressure to avoid having more than one version of anything in master, certainly for anything but a short transition periods.

If the library is shared between two systems that are versioned/released/deployed separately then it's necessarily normal to have multiple different versions of the library in production at the same time. IMO at that point it's best for the library to have its own proper release cycle in its own repository and the two systems to have their own repositories, and then each repo's history is an accurate reflection of the thing it's a repository for, but you're prevented from trying to view a combined history of the library and the systems that use it in a naive way, because such a history would always be misleading.

If the library's users are all the same team (i.e. part of the same standup etc.) - then it's usually better to treat everything that uses it as a single system and deploy it all at the same time, and then you can keep them all in the same repository and have a single consistent history. But if the library is shared by independent teams then you can't enforce that versioning and deployment are done together.

> There are massive downsides to allowing people to use random versions of internal libraries: security concerns, people ending up on some completely outdated version of a lib that then suddenly breaks completely for them and they don't have the time budget to rewrite all their crap, the massive cost of not being able to look into a single repo and know that if you see "import foolib" it's the foolib in the same source tree.

> I don't think it's necessary for master to correspond 100% to production, but it should be quite close. If you land something in master and it's not shipped by the end of the day, I'd say that's generally a bad thing and you should consider reverting.

All that's true; in the scenario where you're sharing an internal library between multiple systems you actually need to treat that library as a first-class project with semver, release notes, a security update/LTS policy and all the rest of it. But as soon as your organisation gets too big to deploy everything every time someone wants to release a feature, what do you do? It's now inevitable that you will have different versions of foolib running in production, so IMO the best thing is to have a repo structure that reflects that. I don't think you're actually contradicting what I've said: one repo for each system that's versioned and deployed together, if that system is your whole organisation and your whole codebase then great.

> You found a problem in production and would like to see where it came from: if production is either a monorepo or a "super-repo" composed of subrepos, you can just bisect. You can just git log the whole thing to understand recent history etc. You can git grep and it will show you all the uses of something, as opposed you have to figure out how to navigate dozens of different things done by different teams.

At the point where you decouple the versioning/releasing of different components, the things you want your tools to do become different. You can't check if a method is unused by grepping, because there may be an older version of an intermediate library that's using it. You don't want to bisect in the details of another team's changes, so bisect landing you on either the point where your commit broke or the point where you commit an upgrade of the library that you're using and that broke things is fine, probably better than landing on an internal commit in that library. Even if you do want to do the fix in that library yourself, coming up with a test case for it that's specific to the library rather than using the rest of your system is something you need to do anyway, because that library's test suite should be self-contained. The way of working is different enough that I don't want to use the same interface to it.

patrec · on Dec 23, 2020

What things about the staging area do you dislike at a fundamental level (i.e. going beyond a one-off cost of having to memorize a quirk)?

lmm · on Dec 23, 2020

I dislike that in general, if I've started working on the wrong branch, I can't losslessly switch to a different branch. If I try to check out another branch directly git might (or might not) tell me I need to stash instead. But if I do stash then I lose the distinction between staged and unstaged changes.

patrec · on Dec 23, 2020

I think this should work fine (modulo typical git ui infelicities; but you can certainly make an alias that does it).

How about something along the lines of:

    git stash --keep-index         
    git stash
    git checkout right-branch
    git stash pop --index
    git stash pop

? Untested, so I might have messed up a detail.

Another alternative is creating a tmp branch (again untested) and cherry-picking (with --no-commit) across.

    git checkout -b tmp
    git commit -m 'staged'
    git commit -am 'unstaged'
    git checkout right-branch
    git cherry-pick -n tmp
    git reset HEAD
    git cherry-pick -n tmp^
    git branch -D tmp

andreareina · on Dec 23, 2020

The commit hashes will be different, but I always start a fresh repo with `git commit --allow-empty -m "initial commit"`

patrec · on Dec 23, 2020

Why not got the whole mile, and do something like this:

    GIT_AUTHOR_NAME='nil' GIT_AUTHOR_EMAIL='nil@example.com' \
    GIT_COMMITTER_NAME='nil' GIT_COMMITTER_EMAIL='nil@example.com' \
    GIT_AUTHOR_DATE='1970-01-01 00:00:01' GIT_COMMITTER_DATE='1970-01-01 00:00:01' \
    git commit --no-gpg-sign --allow-empty -m 'NIL'

This will give you an identical commit hash every time. [Edit: added --no-gpg-sign as suggested in reply below]

JNRowe · on Dec 23, 2020

You can ensure the hashes are consistent by setting GIT_AUTHOR_DATE and GIT_COMMITTER_DATE in your initial commit alias.

They still won't be consistent if you've enabled commit.gpgSign though. So, you'd want --no-gpg-sign on that initial commit or someone to reply with a better idea.

patrec · on Dec 23, 2020

Thanks, incorporated your --no-gpg-sign suggestion. I think it's best to use a canonical author/committer as well. Ideally everyone would agree on something so tooling like commit linters (which check org email addresses or commit message formatting) can ignore it and different repos across at least a single org benefit from having the same root commit.

JNRowe · on Dec 23, 2020

The other benefit to your full solution is that you won't get a 50 year sidebar in your GitHub contribution profile too.

Until seeing your suggestion it hadn't occurred to me to use a fake user. However, using UNIX epoch was something I had been doing and I do have the 50 year sidebar ;)

https://github.com/JNRowe#year-list-container

tcoff91 · on Dec 23, 2020

What VCS does a significantly better job on branch merges? All the other ones that I've experienced were horrifying. Merging subversion branches was a nightmare compared to git. Perforce was better than subversion but still pretty bad compared to git.

I've also reverted merges many times in git with no issue. You just choose which parent you're reverting to and then it just works.

The only time I've seen branch merges in git really fuck up is one time when a novice git user at my company unstaged a file from the index before committing the merge. That was a disaster and took me hours to figure out what the hell was going on with that file.

patrec · on Dec 23, 2020

The problem with reverts of merge commits in git is that it doesn't behave like most people intuitively expect. If you merge B into master, find a problem, revert the merge and push a fix to B and try merging B again most devs would expect the result in master to be equal to having just merged the fixed version of B to begin with. But it ain't so.

I agree of course that SVN is much worse than git for most purposes, although it's still used frequently in preference to git in design artifact heavy settings because it won't choke as quickly as git on binary assets from artists or designers. The question of what tool handles merge commits better is a bit academic to me, so I do not know the answer. The two reasons for this are: 1. there is currently no practical benefit in learning a better VCS given the dominance of git 2. I gave up on using merge with git.

bmn__ · on Dec 23, 2020

> by everyone agreeing on a canonical first commit

Make git-insert-empty-root a standard.

https://stackoverflow.com/a/14729121

divbzero · on Dec 23, 2020

Not a full solution to the no-common-root challenge but rebase [1] does provide the option to:

  git rebase -i --root

[1]: https://git-scm.com/docs/git-rebase#Documentation/git-rebase...

amingilani · on Dec 23, 2020

I was recently told by someone that since 2.23 `git switch` lets you switch branches. It was made to address the fact that `checkout` does too many things.

Here: https://git-scm.com/docs/git-switch

_ikke_ · on Dec 23, 2020

And the counterpart to switch is restore:

https://git-scm.com/docs/git-restore

divbzero · on Dec 23, 2020

Thank you, both switch and restore are new to me.

That said, checkout behaves quite predictably once you learn it, so I’ll probably wait for the “THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.” warnings to disappear before adopting the newer commands.

uses · on Dec 23, 2020

To me, restore is one of the most frightening commands in git. It has an innocuous name but it can delete arbitrary amounts of uncommited work without any further interaction, warnings, or confirmation.

dang · on Dec 23, 2020

If curious see also

2017 https://news.ycombinator.com/item?id=13379572

2015 (a bit) https://news.ycombinator.com/item?id=9926330

Discussed at the time: https://news.ycombinator.com/item?id=5511863

geluso · on Dec 23, 2020

I like the one called "The Hobgoblin." The student asks "why don't all these commands follow the same pattern?"

It doesn't make sense to my why the three "view" commands have different nomenclatures, but it /sort of/ makes sense that the two destructive ones have different syntaxes. It would hurt to think you're deleting a remote reference and accidentally deleting a local branch.

What's the point of the Master throwing himself off the railing when the novice runs `git -h branch`?

I'm running it and git responds "unknown option: -h" then displays a short version of the help menu. Maybe it is a joke on showing the short help menu even when using incorrect syntax?

aelred · on Dec 23, 2020

I think it's just another inconsistency: `git --help branch` gives the long help for `git branch`, but `git -h branch` says "unknown option: -h".

boublepop · on Dec 23, 2020

Hopefully someone can correct me if wrong, but I assumed that the joke was that 7 years ago in 2013 if you ran “got - branch” the cli would crash in some way. Though I would love to hear an explanation for someone who knows for sure what the intention of the joke was.

windowojji · on Dec 23, 2020