
> At Sun we had a strict linear history mandate

Out of curiosity: Was this also with TeamWare/SCCS?

> If you care about history, what you really want is to see what's changed over time, and that is always linear. At time t_n you have some commit, and at t_n+1 you have some other commit, and so on.

That depends on what the history should represent, what you care about specifically, or what you optimize for.

What landed in the integration branch or was deployed to a single test environment is certainly linearly ordered on the time scale. What happened during parallel development is rather a DAG. (And even there you could have different working models: clean up (e.g. with git rebase), then integrate; or keep every commit ever made (even if hidden in presentation), fossil-style.)

These models all have pros and cons, and most of us have our preferences, or even reasons. Whichever you pick, it should be consistent per project and supported by the tooling.

> Besides, if you have N commits in a branch and you merge it into another, when you resolve conflicts you won't know which conflicts and resolutions go with which of those N commits, and you won't care, but if you want to understand your work -and also others to understand your work- it seems much better to rebase and resolve conflicts commit by commit.

However, with rebasing on top (or squash-merges) you lose the commits in their original context. You may have no merge-commits, but all your commits are merged commits. (They only have one parent in version history, but the file content is the result of a merge, be it automatic or manual.) This may be no big deal for things you can and do test for. But if a bug turns up later, it is often easier to understand if you can see, or test, whether it crept in only while integrating the changes. Then you at least still have the working version from the tip of the branch available for comparison.

One could also have a merge for each of the commits (in order), but that does not help with keeping the history lean.

While I do use git rebase to clean up, I merge branches afterwards and keep the commits. I usually do my rebases with --keep-base (unless the branch was started at a totally wrong branch point), so no integration work with other branches is done there. That happens in the merge, to be able to distinguish between what was developed and what was the result of the integration.
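
Roughly, that flow looks like this in git (branch names here are placeholders):

  # clean up the feature branch in place; --keep-base keeps the
  # original branch point instead of moving it to the tip of main
  git rebase --keep-base -i main feature

  # integration work then happens in the merge commit itself
  git checkout main
  git merge --no-ff feature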




Did you join HN just to comment on my comment? That's funny.


Yes. Usually, I come here only for the linked articles and read some discussions, but don't write. But this was too interesting to not comment, so I had to create an account.


I'll take that as a badge of honor.


> Out of curiosity: Was this also with TeamWare/SCCS?

Yes. It was quite primitive by comparison to git, but it worked.

> > If you care about history, what you really want is to see what's changed over time, and that is always linear. At time t_n you have some commit, and at t_n+1 you have some other commit, and so on.

> That depends on what the history should represent, what you care about specifically, or what you optimize for.

> What landed in the integration branch or was deployed to a single test environment is certainly linearly ordered on the time scale. What happened during parallel development is rather a DAG. (And even there you could have different working models: clean up (e.g. with git rebase), then integrate; or keep every commit ever made (even if hidden in presentation), fossil-style.)

At Sun if you had a bug fix you did the equivalent of clone, commit, <any number of fetch and rebase operations>, push.
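
In today's git terms that would be, more or less (remote and branch names assumed):

  git clone $UPSTREAM_URL bugfix && cd bugfix
  git commit -am "fix the bug"   # one or more commits
  git fetch origin               # repeat as the upstream moves on,
  git rebase origin/main         # keeping the fix on top each time
  git push origin main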

But for large projects we did something very different. We had a clone of the upstream (OS/Net) for the whole project, and developers would clone the project and work on that. Every few weeks (two or four, depending on the project) the project's gatekeeper would rebase the project clone onto the upstream, and then all the developers would rebase their clones onto the project clone. (Think `git rebase --onto`.)
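
A rough git rendering of that gate flow (all names here are hypothetical):

  # gatekeeper: every few weeks, rebase the project gate onto upstream
  git fetch upstream
  git rebase upstream/main project-main

  # each developer: replay their own work from the old gate tip
  # onto the freshly rebased gate
  git fetch gate
  git rebase --onto gate/project-main old-gate-tip my-feature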

When a project completed successfully it would rebase one more time onto upstream and push.

These clones functioned as branches. They were archived, but in the upstream there was no evidence left of the clones' existence -- certainly there were no merge commits, as those were verboten.

As you can see, there was no complex DAG in the upstream: it's all linear. The history of the clones was mostly irrelevant, though if you really cared (sometimes I did) you could go look at the archives.

> However, with rebasing on top (or squash-merges) you lose the commits in their original context.

Why should anyone care about that "original context"? The project/bugfix delivered. The difference between the upstream's HEAD just before that delivery and just after it is exactly what the push contributed, and the commit messages and deltas of the commits in that push are all the context you need.

> You may have no merge-commits, but all your commits are merged commits.

So what?

> (They only have one parent in version history, but the file content is the result of a merge, be it automatic or manual.)

This is like caring about seeing every typo or braino the developer made while working on that code. It's just not useful. Once the code is pushed upstream the conflict resolutions that the author had to do will never again matter to anyone else.

If you want to investigate what went wrong in a mishap (a mismerge), you can examine the downstream clone's history (including its reflog). If nothing went wrong, you're done. You can archive those downstream clones (or the relevant bits of their reflogs), which, again, we did at Sun precisely for this reason.
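
In git, such an investigation might start in the archived clone with something like (file name illustrative):

  # list the states the branch went through, including pre-merge tips
  git reflog show feature

  # compare the last pre-integration tip with what was delivered
  git diff feature@{1} origin/main -- src/suspect_file.c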

If you use a merge flow, those merge commits are forever, and they will forever pollute the history and complicate rendering of history, and they will remain long after anyone could possibly care about what might have gone wrong in a mismerge.

> This may be no big deal for things you can and do test for. But if a bug turns up later, it is often easier to understand if you can see, or test, whether it crept in only while integrating the changes. Then you at least still have the working version from the tip of the branch available for comparison.

Ok, so you're concerned about investigations. Again, they're rare, and there's a way to do them without polluting the upstream's history.

Perhaps there could be something like merge commits that aren't commits but annotations: not part of the Merkle hash tree and deletable, or in the Merkle hash tree but hidden, where if you want the details of what the developer did you still have to go dig in their clones. But IMO that's still mostly pointless.
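
For what it's worth, git's notes come close to that idea: they hang annotations off a commit without being part of the commit's own hash, and they can be edited or deleted later. A small sketch (commit id and message are illustrative):

  # record the gory merge details without touching the commit itself
  git notes add -m "resolved conflict in foo.c in favor of ours" abc1234

  # the note shows up in git log output, and can be dropped later
  git log -1 abc1234
  git notes remove abc1234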


Thanks a lot, that was quite interesting! Always interested in different ways of doing things and the reasons behind them.

Just to add to your last point regarding hiding history details: that seems to be Fossil's way of doing it[1][2], if needed/wanted. The history is still in the repo, but the presentation can be cleaned up. (I only know this part of Fossil from the documentation, though; I have not used it that much.)

1. https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki#...

2. https://fossil-scm.org/home/help?cmd=amend


Quoting from #1:

> Git puts a lot of emphasis on maintaining a "clean" check-in history. Extraneous and experimental branches by individual developers often never make it into the main repository. Branches may be rebased before being pushed to make it appear as if development had been linear, or "squashed" to make it appear that multiple commits were made as a single commit. There are other history rewriting mechanisms in Git as well. Git strives to record what the development of a project should have looked like had there been no mistakes.

Right, exactly, because the mistakes often aren't interesting. Where mistakes made during development were interesting, you should write up the interesting details in your code or commit commentary or other documentation.

> Fossil, in contrast, puts more emphasis on recording exactly what happened, including all of the messy errors, dead-ends, experimental branches, and so forth. One might argue that this makes the history of a Fossil project "messy," but another point of view is that this makes the history "accurate." In actual practice, the superior reporting tools available in Fossil mean that this incidental mess is not a factor.

But that's just so much noise! Some devs will make much more noise than others. Devs who commit often, for example, will make more noise. Noise noise noise. If I'm spelunking through the history I don't want noise -- I want signal. Even if I'm looking through history to find out who is a better developer it may not be helpful because some devs will commit more often.

I mean, why not just record your interactive sessions and then play them back when you want to look at the history? Where would the madness stop?

IMO the madness stops at the upstream, where only linear history should exist.

And again, if I want to look at a project's internal history I can look at their archived clone of the upstream. Even in project "gates" (as we called them at Sun) we kept linear history (some projects had very long histories, like ZFS and SMF for example). Very few people looked through archived gates -- I did, from time to time, especially the old SEAM (Kerberized NFS) gates from 1999-2000, but not only those. Some archives did get lost -- especially really old ones, the ones that matter more to historians than to current developers, so that sort of loss is a problem, but not one that threatens the ability to do work.



