Git from the bottom up [pdf]

10ren · on Feb 4, 2010

Linus is a hacker, not an academic, so there is no clean high-level abstraction of git; no algebra of git. OTOH, it is ridiculously fast and ridiculously useful (for Linus, and for anyone who goes to the trouble to understand its execution model).

I think there'd be a paper or two in attempting to infer an algebra of git. There wouldn't be a clean one, so you could propose modifications of git to facilitate a clean algebra, in the way that physicists and biologists hypothesize laws to explain the observed universe.

etherealG · on Feb 4, 2010

I disagree; this paper outlines the algebra quite clearly: a series of commits laid out as a directed acyclic graph, with the ability to modify this graph however you please.

10ren · on Feb 5, 2010

Yes, that's the underlying execution model. The approach of the article "git from the bottom up" is that one needs to understand this in order to use it, rather than being able to understand it in terms of the UI. Taking a lateral step towards the ease-of-adoption and learning of a tool, there's an intriguing aspect of git that I want to articulate further:

To understand part of git, you must first understand all of it.

Several aspects of git are interconnected; some design decisions don't make sense in isolation; and you need to know the detailed and apparently incidental behaviour of some commands in order to be able to use them. It's an expert tool.

As an ideal (perhaps unrealizable), a tool has a learning curve, with a series of "closed" subsets of functionality, such that each subset is complete in itself. Learn a subset, and you are a master of that aspect. (I'm using "closed" not in the strict mathematical sense of the results of operations being in the domain of their operands, but to mean that you don't need to go outside that subset). It's a similar concept to minimizing coupling between modules.

e.g. you can play UT without knowing about alternate fire; you can use basic regexps without knowing all the weird clever stuff; you can write C in C++. Languages especially have this quality: you can write procedural code in an OO language, and you can use one library without using all the libraries.

Git doesn't do this.

As an example, if you want to change the message of a commit, you need to create a new commit with the new message. This immutability is important (I infer) so that other instances of the repository can remain in sync (the new commit will have a different hash from the old one (with a very high probability), because the message is different). Thus, the apparently simple operation of "changing a message" is interconnected with the distributed nature of the tool.
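A rough sketch of why the hash moves, for anyone who wants to see it: git names a commit by the SHA-1 of a short header plus the commit's text, and the message is part of that text. The `object_id` and `commit_id` helpers, the author line, and the messages below are made up for illustration. Real commits can carry more fields (a distinct committer, an encoding, a signature), so treat this as an approximation rather than git's actual code.

```python
import hashlib

# Well-known id of the empty tree; used here so the example needs no real files.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def object_id(kind, body):
    """Hash an object the way git does: '<kind> <size>\\0' followed by the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_id(tree, parents, author, message):
    """Build a simplified commit body and return its id.

    Simplification: the committer is assumed to be the same as the author.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    return object_id("commit", body)

# Hypothetical identity and timestamp, for illustration only.
who = "A U Thor <author@example.com> 1265241600 +0000"

old_id = commit_id(EMPTY_TREE, [], who, "inital commit")
new_id = commit_id(EMPTY_TREE, [], who, "initial commit")

# Same tree, same author, same time: only the message differs,
# yet the two commits are different objects with different names.
print(old_id != new_id)
```

Nothing gets edited in place: "changing the message" just produces a second object alongside the first.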

If you want to change the message of the first commit (at the root of the DAG), it is more complex. First you 'change' the message of that first commit, via making a copy (as above) - but the rest of the DAG is still pointing to the old root, not your new one. Therefore, you rebase (disconnect and reattach a node) the child from the old root to the new root. Finally, rebase does not actually disconnect and reattach - it makes copies (immutability again). You haven't changed the old DAG, but created a new one.
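Here is a toy model of that copy-everything effect, as I understand it. It is not how git implements rebase: `make_commit`, `reword`, and the dict-based `store` are invented for this sketch, the history is assumed to be linear, and the commit being reworded is assumed to be an ancestor of the tip.

```python
import hashlib

def make_commit(store, parents, message):
    """Store an immutable commit and return its id (a hash of its content)."""
    payload = "\n".join(list(parents) + ["", message]).encode()
    cid = hashlib.sha1(payload).hexdigest()
    store[cid] = (tuple(parents), message)
    return cid

def reword(store, tip, target, new_message):
    """Return a new tip whose history has `target` reworded.

    Assumes a linear history in which `target` is an ancestor of `tip`.
    """
    # Walk back from the tip, remembering the messages of every descendant.
    descendants = []
    cur = tip
    while cur != target:
        parents, message = store[cur]
        descendants.append(message)
        cur = parents[0]
    # Copy the target with the new message...
    parents, _ = store[target]
    new = make_commit(store, parents, new_message)
    # ...then copy each descendant on top of its new parent, oldest first.
    for message in reversed(descendants):
        new = make_commit(store, [new], message)
    return new

store = {}
root = make_commit(store, [], "inital commit")
second = make_commit(store, [root], "add README")
third = make_commit(store, [second], "fix bug")

new_tip = reword(store, third, root, "initial commit")

print(len(store))        # three old commits plus three copies
print(third in store)    # the old chain is untouched
```

Because each id depends on the parent's id, fixing a typo at the root forces a new id for every commit after it.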

Changing a message is not as simple as one might expect.

An additional problem is that the documentation (man pages) doesn't always define terms, and not always clearly. But they are obvious to someone who already knows how it works (and doesn't need the man pages, except for reference). BTW: I found that some terms are defined at the end of the options section (the definitions aren't always complete or unambiguous, but they help); and it's helpful to think of the type of the arguments (though the man pages don't use that term)

Please don't take this as a disparagement of git - it is an expert tool, tackling and solving some extremely difficult problems that a master faces. I understand it took Linus years of coping with the problem, and of having the brilliant example of BitKeeper, before he could whip up git overnight. I've invested quite a bit of time to reach the (limited) understanding that I have so far, and each step teaches me more about the problems and solutions of serious distributed version control. It is complicated because the problem is complicated; and it remains complicated because a master already grasps the problem, and can cope with that complexity.

etherealG · on Feb 9, 2010

your example is a great point, and something that just isn't practically possible. Changing history once it's been pushed is impossible without causing chaos for anyone else who uses your code. And in git, commit messages are part of history, not just an annotation to it.

perhaps in other ways though, the closed subsets do exist. as long as what you want to do isn't "complex" in git. things like commits, pushes etc.

Most importantly, I would say, branching, merging, and sharing that branched history form a closed subset. If you only ever use the commit/push/fetch/merge commands (with pull being the same as fetch+merge), and perhaps local branching as well, then you have a relatively simple tool for sharing code and the history of that code. You don't need to understand the plumbing to work within that workflow, and I think I could explain it to someone in an hour or so of explanation and examples.
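To show how small that subset is, here is a toy model of it, treating pull as a fetch followed by a merge. Everything in it is a simplification I made up for illustration: the `Repo` class, the string commit ids, a single `master` branch, and a single remote tracked as `origin/master`. It ignores file contents and conflicts entirely.

```python
def ancestors(commits, cid):
    """All commits reachable from cid, including cid itself."""
    seen, stack = set(), [cid]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(commits[c])
    return seen

class Repo:
    def __init__(self):
        self.commits = {}   # commit id -> tuple of parent ids
        self.refs = {}      # ref name -> commit id

    def commit(self, cid, branch="master"):
        parent = self.refs.get(branch)
        self.commits[cid] = (parent,) if parent else ()
        self.refs[branch] = cid

    def fetch(self, remote):
        # Copy the remote's commits and record where its master points.
        self.commits.update(remote.commits)
        self.refs["origin/master"] = remote.refs["master"]

    def merge(self, other_ref, branch="master"):
        ours, theirs = self.refs.get(branch), self.refs[other_ref]
        if ours is None or ours in ancestors(self.commits, theirs):
            self.refs[branch] = theirs            # fast-forward
        elif theirs in ancestors(self.commits, ours):
            pass                                  # already up to date
        else:
            merge_id = f"merge({ours},{theirs})"  # commit with two parents
            self.commits[merge_id] = (ours, theirs)
            self.refs[branch] = merge_id

    def pull(self, remote):
        self.fetch(remote)
        self.merge("origin/master")

upstream = Repo()
upstream.commit("a")
upstream.commit("b")

mine = Repo()
mine.pull(upstream)      # nothing local yet, so master fast-forwards to "b"
after_first_pull = mine.refs["master"]

upstream.commit("c")     # upstream moves on
mine.commit("x")         # and so do I
mine.pull(upstream)      # histories diverged, so this creates a merge commit
```

The whole workflow is a handful of operations on one graph and a few named pointers into it, which is roughly what I'd cover in that hour.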

fdb · on Feb 4, 2010

It took a lot of time for me to really get Git, and reading this document I finally got it. Genuinely recommended.

I also like PeepCode's "Git Internals" http://peepcode.com/products/git-internals-pdf .

oozcitak · on Feb 4, 2010

Being new to git I found this document very helpful. git stash was new for me, and I am glad to have learned it.

tyrmored · on Feb 4, 2010

This looks very helpful. I've had some trouble migrating from a GUI Subversion client to command-line Git.

maurycy · on Feb 4, 2010

Everyone I know seems to have. :-)

Personally, I'm a bit disgusted with the git hype. I love the git idea but the interface is horrible.

pyre · on Feb 4, 2010

There are some nice-looking Git clients, but most of them seem to be for OS X. I haven't used them though as I don't have a Mac and I prefer the command-line client.

Part of the problem is that there is no 'git library', last I checked. Git started out as a loosely strung-together set of commands that vary from compiled C code to Perl scripts to plain shell scripts. Over time the bottlenecks have been ironed out and ported to C and such, but git is under heavy development even now. Just follow the git mailing list for a while to see.

Also do some searching for stuff about JGit which is a git implementation in Java. Last I heard they were having a lot of trouble getting the performance to match the actual git tools because most of the bottlenecks had been fixed in the 'official' tools using highly optimized C-code.

{edit} I guess I should add that the JGit stuff is information I read somewhere, but I don't recall if it was in a blog post/comment/mailing list email/etc.

maurycy · on Feb 4, 2010

I'm used to command-line interfaces. It is not that I need a GUI tool. I do not. It's just that, after a year, I'm still not 100% sure that I understand what every git command does. There's still something going on under the hood that I do not trust. I do not want to use a GUI interface, which I trust even less.

Of course, you can either ask specifically what causes me problems and answer that, or just reply that a powerful tool has a steep learning curve.

I reject this notion, though. Perforce, Darcs and Bazaar are similarly powerful tools, with their own trade-offs. None gave me as much of a headache initially, and I got reasonably comfortable with each of them relatively quickly.

As for the lack of a "git library", I think you nailed it, if it's still true. +1. Lack of proper encapsulation is one of the biggest issues in the world of Linux tools. Most of the tools are "as is", and the only way to interact with them is the command line.

I know that this is a part of the Unix philosophy. It frequently, though, generates similar problems. The idea is excellent, yet the interface is horrible. It shouldn't be the case. Why give up excellent, already written code and ideas because of something as trivial as the interface?

It's especially visible in the world of package management software. Nearly all come as a package: their own format, their own repos, their own scripts, and the command-line interface. I still don't understand why there is neither a single package format nor any reuse of scripts, and why everyone reinvents the wheel on a daily basis.

By the way, thanks for not downvoting me. I expected a different outcome.

pyre · on Feb 4, 2010

> It's especially visible in the world of package management software. Nearly all come as a package: their own format, their own repos, their own scripts, and the command-line interface. I still don't understand why there's neither single packages' format, nor scripts cannot be reused, and everyone reinvents the wheel on daily basis.

People aren't 'reinventing' the wheel constantly. There are only two major package management structures (deb/rpm) and there are even tools to translate between them, IIRC. Why 'their own repos?' Because if they make design decisions that are incompatible with the current repository, they have to host their own. It's not like Ubuntu and Debian are mutually exclusive as far as repositories go. Ubuntu pulls over a 'snapshot' of Debian's unstable repository, applying patches where they need to and/or see fit. There are probably plenty of packages in Ubuntu that are unchanged from what they were in the Debian Unstable repository.

I'm sure it's the same with Ubuntu-based distros like Mint. They probably pull over a snapshot of the Ubuntu repos, make their modifications and it becomes their next revision of Mint.

> Bazaar

Bzr has its own set of headaches. I haven't used it extensively, but IIRC they use revision numbers on commits that are only valid locally, which I think is a big no-no. Especially with people coming from other version control systems that use the revision number as the canonical pointer to a specific revision.

davepeck · on Feb 4, 2010

By interface, I assume you mean git's command-line interface? I agree; it's terrible.

My impression is that projects like grit are far enough along that it should be possible to build an entirely new git front-end/porcelain on a clean technology stack. Something with more sane exposure. I don't know if anybody's actually working on something like this right now?

mbrubeck · on Feb 4, 2010

David Roundy (author of Darcs, a DVCS that predates git and has one of the nicest command-line interfaces) is working on a git porcelain that has the same user interface and semantics as Darcs:

http://github.com/droundy/iolaus

davepeck · on Feb 4, 2010

Thanks for the pointer -- I played with Darcs back in the day. This looks interesting.

markkoberlein · on Feb 4, 2010

I actually like the command-line interface and prefer it over the GUI interfaces. However, I do prefer the command line for almost everything from managing databases to editing source files in VIM. Git's command-line is a natural fit for that kind of workflow.

dasil003 · on Feb 4, 2010

I prefer the command line as well for version control (regardless of flavor), but git's porcelain is pretty horrendous. I think the worst offender is git-checkout, which can switch branches, detach from a branch entirely, or just update the working copy of a file with a different revision.

However, despite the power and complexity of git's interface, I find I better understand and feel more comfortable with git after one year of daily use than I ever did with Subversion and CVS, even after a decade.

doki_pen · on Feb 4, 2010

It's not that bad. I never enjoyed using Tortoise and always stuck with CLI svn. git's tool set is far superior to svn's.

kajecounterhack · on Feb 4, 2010

what about gitk (*nix) / gitx (mac) ? granted it's not really a gui so much as a visualization...