> most git users picture a branch like a diverging chain of commits and not a mo...

jerf · on Dec 19, 2019

A git repository is best thought of as a sort of tree, where the nodes can potentially have multiple parents. This technically makes them a graph ("directed acyclic" to show I do know the technical terms), but our human intuition will mostly work thinking of them as a tree. To understand multiple parents use your intuition about human beings, since we all have multiple parents.

A "branch" doesn't really have any special status in git. All it is is a pointer to a particular commit. Literally, that is all it is, even in the data structures. It has no additional ontological existence beyond that. When you make a commit to a "branch", all that happens is that git moves the branch marker along to the next commit. All of the "specialness" of a branch is what is in that behavior.
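A minimal sketch of that behavior, assuming a throwaway repository in a temp directory (the identity values and the branch name "main" are placeholders, not anything from the thread). It only shows that committing moves the branch pointer to the new commit, whose parent is the old tip:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main" whatever the local default is
git config user.name "Example"             # placeholder identity for the throwaway repo
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "first"
before=$(git rev-parse main)               # where the branch pointer sits now
git commit -q --allow-empty -m "second"
after=$(git rev-parse main)                # the pointer moved to the new commit
parent=$(git rev-parse "main^")            # the new commit's parent is the old tip

echo "main was    $before"
echo "main is now $after (parent: $parent)"
```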

You can do anything you want with that branch marker, such as yanking it to a completely different part of the tree via "git reset --hard $NEW_COMMIT", and while you may confuse humans, you haven't confused git in the slightest. On the flip side, since branches don't actually "exist", it is perfectly sensible to "merge" with a direct commit hash, because that's all a branch is: a pointer to a hash. You can pick up that pointer and drop it down wherever you want.
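Here is a small sketch of both moves, again in a scratch repository with placeholder identity values. It yanks "main" back to an earlier commit, commits something else, and then merges the now-unlabeled commit purely by its hash:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main" whatever the local default is
git config user.name "Example"             # placeholder identity for the throwaway repo
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "base"
base=$(git rev-parse HEAD)
git commit -q --allow-empty -m "work on main"
tip=$(git rev-parse HEAD)

# Yank the branch marker back to "base"; the "tip" commit still exists, it just has no label.
git reset -q --hard "$base"
git commit -q --allow-empty -m "other work"

# Merge the unlabeled commit by hash; no branch name is involved.
git merge -q --no-edit "$tip"
git log --oneline --graph
```

The graph should show a merge commit whose second parent is the commit that "main" used to point at.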

(A tag is the same thing as a branch, except that when you're "on" a tag and make a commit, git does not advance the tag. It is almost entirely just a pointer to a commit. Technically it can carry a bit more data, like a signature, but as far as the core structure is concerned it's just a pointer to a commit.)
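To see the difference, a sketch along these lines should do (scratch repo, placeholder identity, and "v1" is a made-up tag name). Checking out the tag leaves HEAD detached, and a commit made there moves HEAD but not the tag:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main" whatever the local default is
git config user.name "Example"             # placeholder identity for the throwaway repo
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "release"
git tag v1                                 # hypothetical tag name
git checkout -q v1                         # checking out a tag detaches HEAD
git commit -q --allow-empty -m "made while on the tag"

echo "v1   -> $(git rev-parse v1)"         # still the "release" commit
echo "HEAD -> $(git rev-parse HEAD)"       # the new commit, with no branch or tag on it
```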

I am not the original poster, but I give a git training at work, and it is heavily based on making sure this understanding is presented. (The other reason I wrote my own training rather than reusing one of the many existing ones is that mine makes sure to walk the trainees through the confusing states you can get into, then we explain what happened, why, and how to correctly get out of them. They follow along as we all manipulate a repository, and the training actually spends quite a bit of time in a "detached head" state, checking out, merging, and resetting to commits directly.)

blacksmythe · on Dec 19, 2019

  >> makes sure to walk the trainees through the confusing states you can get into, 
  >> then we explain what happened, why, and how to correctly get out of them

Is this something you could share?

BiteCode_dev · on Dec 19, 2019

- detached HEAD: "git checkout existing_branch" or "git checkout -b new_branch"

- you were somewhere, you moved, and you can't find your way back: "git reflog"

- local repo and remote have different histories (e.g. you rebased a published branch): have the whole team sync with the remote except you, then hold off. Export your remaining changes as a patch. Reclone. Apply the patch.

- remote has a different history than the rest of the team (e.g. you force-pushed a different history): delete the remote, recreate it, repush from one of your teammates' repos, then apply the previous solution.

- you messed up your merge and wish you had never done it: "git reset --merge"

- the last commit is not published and you messed it up: "git commit --amend"

- the last commit is published and you messed it up: "git revert HEAD"
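The last two items can be sketched like this, assuming a scratch repository with placeholder identity values and a made-up file name ("notes.txt"). The amend rewrites the last commit in place, while the revert adds a new commit that undoes it:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main" whatever the local default is
git config user.name "Example"             # placeholder identity for the throwaway repo
git config user.email "example@example.invalid"

echo "hello" > notes.txt                   # hypothetical file
git add notes.txt
git commit -q -m "add notes"
echo "helo wrold" > notes.txt
git commit -q -am "update notes (with a typo)"

# Not published yet: fix the file and rewrite the last commit in place.
echo "hello world" > notes.txt
git commit -q -a --amend -m "update notes"

# Already published: leave history alone and add a commit that undoes the last one.
git revert --no-edit HEAD
git log --oneline
cat notes.txt
```

After the amend there are still two commits; the revert adds a third and puts the file back to its first version.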

But rather than solving problems, it's better not to get into them in the first place. Always run "git status" before anything, always get a clean working copy before checkout/pull, create a fat gitignore, etc.

jerf · on Dec 19, 2019

Not meaningfully. It isn't written as a blog post document; it's a series of commands and presentation notes, designed to be delivered live by me. You can basically obtain what I have in the document by combining A: what I wrote above, B: a good git tutorial (see the internet), and C: some screwing around with git checkout, reset, and merge using commit hashes directly on a little repo you keep locally.

BiteCode_dev · on Dec 19, 2019

Of course. Someone explained it to me, after all.

In git, a branch is NOT like a wooden branch on the trunk of a tree, although it ends up being at the top of one, which makes the analogy ok.

A git branch is the same thing as a git tag, except it moves automatically when you commit from it.

You can see it as a lightweight label attached to a commit. If your HEAD is itself attached to a branch (HEAD is also a lightweight label, but this one is like a red sticker on a map saying "you are here"), when you do "git commit", the branch label is moved to the newly created commit.

Hence, you can have several branches on a single commit, and you can move branches around: they are just labels. You can also have branches with different names locally and in a remote, or have a branch with the same name, but on different commits locally and in a remote.
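A quick way to convince yourself, sketched in a scratch repo (placeholder identity; "feature-a" and "feature-b" are made-up names). Three labels start on the same commit, and only the one HEAD is attached to moves when you commit:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main" whatever the local default is
git config user.name "Example"             # placeholder identity for the throwaway repo
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "only commit"
git branch feature-a                       # hypothetical branch names
git branch feature-b
git branch --points-at HEAD                # lists all three labels on the same commit

git checkout -q feature-a                  # attach HEAD to one of the labels
git commit -q --allow-empty -m "work on feature-a"
git for-each-ref --format='%(refname:short) %(objectname:short)' refs/heads
```

In the final listing, "main" and "feature-b" should share one hash while "feature-a" has moved on.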

A branch is like a reference, it's useful to tell git what place in the history you are talking about, without referring to a commit in particular. It is a moving reference because the place in history you talk about is always changing: it's the top of a chain of commits, the ever changing last thing you worked on for this particular part of the history.

We want branches because they are a convenient way to say "I'm going to do something on the latest commit of this part of the history":

- "git checkout branch_name" => I'm now working on this part of the history, gimme all the latest stuff you got and consider everything I do is now related to it.

- "git checkout branch_name -- path/to/file" => Get the latest version of the file from this part of the history and bring it onto my hard drive

Of course, you can put a branch at a commit that is not yet the top of a chain of commits. But soon this commit will become the top of a new chain of commits, because when you commit from this branch, a new commit will be attached to the previous one, diverging from the original chain, and the branch label will be moved to it. You now have a fork of two chains of commits, both of them having a different branch label at their top:

"git checkout commit_hash && git checkout -b new_branch_name" => I'm going to consider this commit as a new starting point and work from that.

In fact you can move a branch pretty much anywhere, create one or delete one at any time, including "master", which is not a special case, just one created automatically on the very first commit.
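The fork described above can be sketched as follows, in a scratch repo with placeholder identity values ("experiment" is a made-up branch name). A label is planted on an older commit, and the next commit starts a second chain:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main" whatever the local default is
git config user.name "Example"             # placeholder identity for the throwaway repo
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "one"
old=$(git rev-parse HEAD)
git commit -q --allow-empty -m "two"

git checkout -q "$old"                     # detached HEAD at the older commit
git checkout -q -b experiment              # plant a new label there and attach HEAD to it
git commit -q --allow-empty -m "three, on experiment"
git log --oneline --graph --all            # two chains diverging from "one"
```

The two checkout steps can also be collapsed into a single `git checkout -b experiment "$old"`.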

This is also why if you "git checkout commit_hash" instead of "git checkout branch_name", git warns you that you are in a "detached HEAD" state (the "you are here" label is attached to a commit directly, not a branch label). Indeed, a chain of commits (the stuff that looks like a wooden branch on a trunk in graphics) can exist without a branch label. But it won't be convenient to reference: looking up the hash of the latest commit every time you want to mention it is tedious. Besides, git runs "git gc" automatically once in a while, and it eventually deletes parts of the history that no branch or tag (or unexpired reflog entry) can reach, so you may lose this work.
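A sketch of what that looks like in practice, using a scratch repo with placeholder identity values ("rescued" is a made-up branch name). `git symbolic-ref` reports which branch HEAD is attached to and fails when it is detached; a commit made in that state belongs to no branch until you put a label on it:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main" whatever the local default is
git config user.name "Example"             # placeholder identity for the throwaway repo
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "one"
git symbolic-ref HEAD                      # prints refs/heads/main: HEAD is attached to a branch
git checkout -q "$(git rev-parse HEAD)"    # same commit, but checked out by hash
git symbolic-ref -q HEAD || echo "HEAD is detached"

git commit -q --allow-empty -m "unlabeled work"
orphan=$(git rev-parse HEAD)
git checkout -q main
git branch --contains "$orphan"            # prints nothing: no branch reaches this commit
git branch rescued "$orphan"               # hypothetical name; the commit now has a label
```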

This makes it clear that tags are labels like branches, only they don't move. They serve the purpose of having a static reference to a fixed point in time: both allowing you to easily talk about it, and letting git know you want to keep this around.

All that stuff is way clearer with a visual explanation. For my Git training, I bought a fantastic toy for 10-year-olds with magnets that lets me build a 3D history tree and use post-its to represent branches and tags. It's instantly obvious. It's really fun, you can move stuff around and show what each command does to the tree. After that, git seems much more accessible, and you just grunt at the terrible UI ergonomics.

thaumasiotes · on Dec 19, 2019

> In git, a branch is NOT like a wooden branch on the trunk of a tree, although it ends up being at the top of one, which makes the analogy ok.

So, there's an isomorphism between (1) a commit node; and (2) the chain(s) of commits ending in that node.

Why then is it important to think of "branch" as referring to one rather than the other? As evidenced by the isomorphism, they're the same thing.

BiteCode_dev · on Dec 19, 2019

It helps to NOT think of a branch as a commit node, nor a chain of commits ending in that node.

A branch is a separate concept: it's not even stored in the history DB but in a different dir. See it as a label attached to a commit, since a commit can have several branches attached to it, or none. A branch will be moved across the history, while a commit will keep its place in the history.

A branch is designed to be easily created, deleted, and moved around, freely, from anywhere to anywhere.

A commit is designed to feel immutable, and unmovable, and although this is not strictly true, this is how you will mostly use it.

A chain of commits is like a rail track, it goes in one direction, each commit composing it never changing (again, conceptually), and never moving. You stack commits, you grow the track, piece by piece.

The branch is more like a flag you put somewhere on the rail track (most often at the end), to let the workers know where to put the next piece.

Picturing it that way lets you make the best use of its properties:

- branches cost nothing. They are very lightweight, unlike in SVN, you should create as many as you want.

- moving to a branch is cheap. Apart from moving the files to the working copy, it's just a matter of changing point of view on the history. Switch to branches often, it's fast.

- you can put a branch anywhere you want. You like an old commit and wanna try something new from it? Plant your flag here and start working.

- deleted a branch by mistake ? No worry, it's just a label. You can recreate it in a blink.

- this branch is so good it should be master? Sure you can. Just swap the labels. But let everyone know :)

Etc.
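As an illustration of the label swap, here is a sketch in a scratch repo (placeholder identity; the branch names "main", "better", and "old-main" are made up, with "main" standing in for master). Only names change; no commit is touched:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main" whatever the local default is
git config user.name "Example"             # placeholder identity for the throwaway repo
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "base"
git checkout -q -b better                  # hypothetical branch name
git commit -q --allow-empty -m "the good stuff"
good=$(git rev-parse HEAD)

# Swap the labels: keep the old line under another name, promote the new one.
git branch -m main old-main
git branch -m better main
git branch -v
```

This only renames things locally; anyone else sharing the repository still has to be told, as the last bullet says.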

thaumasiotes · on Dec 19, 2019

> branches cost nothing. They are very lightweight, unlike in SVN, you should create as many as you want.

Ahhhh. So the distinction matters if you bring in a bad assumption from SVN. It had never occurred to me that a labeled branch would be noticeably more expensive than the weight of the commits making up the branch.

> you can put a branch anywhere you want. You like an old commit and wanna try something new from it? Plant your flag here and start working.

Well... yeah. That's how you create branches. You go to the commit you want to branch from, and you run `git branch`. The fact that you can do this is immediately implied by the way you have to do it. I don't understand how someone could believe something different.

> deleted a branch by mistake ? No worry, it's just a label. You can recreate it in a blink.

This one's a little weirder; if you delete a branch, you _can_ recreate it, but it's only blink-of-an-eye easy to do if there's another branch containing the head of the old branch. Otherwise you're mucking around in the internal data.

> this branch is so good it should be master? Sure you can. Just swap the labels.

This is another one where I don't see how you would believe something different. Even if branches were huge, heavyweight objects, they have names, and changing the name of something is generally not so hard to do. There's a command solely for the purpose of renaming branches. (`git branch -m`)

> moving to a branch is cheap. Apart from moving the files to the working copy, it's just a matter of changing point of view on the history. Switch to branches often, it's fast.

And in my mental model, a checkout is indeed a (potentially mass) update to file contents (and, if applicable, file existence). Saying "apart from moving the files to the working copy" sounds -- to my ears -- kind of like the chestnut that "controlled for height, taller men earn no more than short men do". Setting up the working copy is the thing I'm looking to accomplish in a checkout.

I can imagine two types of people who want git training:

- Never used version control.

- Used subversion heavily; trying to update to the new new thing.

In your opinion, how many of these points are "git gotchas" that the first group need to be trained in, and how many are "subversion gotchas" that only really come up for the second group?

BiteCode_dev · on Dec 20, 2019

> In your opinion, how many of these points are "git gotchas" that the first group need to be trained in, and how many are "subversion gotchas" that only really come up for the second group?

Many years ago, SVN users coming to git were indeed a huge source of head scratching.

Not anymore.

Now we just have a lot of people that just don't think in a way that will lead them to say "So, there's an isomorphism between (1) a commit node; and (2) the chain(s) of commits ending in that node.". In fact, they don't know what isomorphism means, nor that the word exists.

Personally, I like to picture git as applied graph theory. One command = a bunch of operations on the graph.

But there is an old git joke that says:

"Git gets easier once you understand branches are homeomorphic endofunctors mapping submanifolds of a Hilbert space".

It is nonsensical (it's a meta reference to a Monad joke), but I think it gets the point across perfectly: we need to bring git to the people who need it, not ask the people to come to git. I don't care what level of abstraction they are comfortable with, I want them to feel like they can trust their ability to be productive with the tool and feel their project is safe.

And I find that this way of explaining branches is the most universal way for people from all sorts of backgrounds to have a decent model of how git works under the hood, so that when they fall into a trap, they can use this model and find a pragmatic way out by themselves.

Being right used to be my objective when I was younger. Now I just want to be helpful.

steveklabnik · on Dec 19, 2019

> This one's a little weirder; if you delete a branch, you _can_ recreate it, but it's only blink-of-an-eye easy to do if there's another branch containing the head of the old branch. Otherwise you're mucking around in the internal data.

git reflog is very useful for this. Also, when you delete a branch, git helpfully prints

Deleted branch foo (was da2bb5d)

so you already have that information, right there.
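A sketch of that recovery, assuming a scratch repo with placeholder identity values. It captures the message git prints on deletion, pulls the abbreviated hash out of it with sed, and recreates the label there:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main" whatever the local default is
git config user.name "Example"             # placeholder identity for the throwaway repo
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "base"
git checkout -q -b foo
git commit -q --allow-empty -m "work on foo"
git checkout -q main

# -D deletes the branch even though it is unmerged; LC_ALL=C keeps the message in English for sed.
msg=$(LC_ALL=C git branch -D foo)
echo "$msg"
hash=$(echo "$msg" | sed 's/.*(was \([0-9a-f]*\)).*/\1/')

git branch foo "$hash"                     # recreate the label at the same commit
git log --oneline foo
```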

BiteCode_dev · on Dec 20, 2019

Was going to say that. reflog should be in every tutorial.

When I started using git, I checked out back in time, then ran git log and was baffled not to see my most recent commits in the listing. I panicked, thinking I had lost my work.

Git's terrible UI didn't help there: who thought it was a good idea to just hide everything with no hint of how to get back to it?

Of course, I could have just run git checkout master to get back to where I was, I just didn't know it. But that's the point of reflog: if you are not sure what you did or where you are, it's always there for you, it has your back.
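That scenario is easy to reproduce in a scratch repo (placeholder identity; "main" stands in for master here). After checking out an older commit, plain `git log` hides the newer one, while `git reflog` still lists it along with every move HEAD made:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main" whatever the local default is
git config user.name "Example"             # placeholder identity for the throwaway repo
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "one"
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m "two"
newest=$(git rev-parse HEAD)

git checkout -q "$first"                   # go back in time
git log --oneline                          # lists only "one"
git reflog                                 # still lists the commit for "two" and each HEAD move
git checkout -q main                       # back to where the branch label is
```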