This. So much this. I hate looking through history and seeing crap like "lol forgot semicolon". Rebasing when you're still in your feature branch before the code hitting master to make your commits succinct, readable and above all not contain known broken code is a must.
Why are you committing code you haven't even tried to build? In the scenario you're presenting, the problem is that somebody even needed a "lol forgot semicolon" commit in the first place. Stop doing that. We all make mistakes and this will come up sometimes, but if it's happening so often that you need to rewrite your VCS history to stop from annoying other people, something is wrong.
>Why are you committing code you haven't even tried to build?
Because a DVCS tool like Git makes commits much less costly than older tools such as CVS or SVN. The dynamics (both social & personal) for commits are different.
My guess is that you understand Git commands but you're using the SVN/CVS mental model of treating commits as "sacred" markers. If someone commits in those older centralized systems, they could potentially break the build and stop the team's productivity. This leads to strange social dynamics such as programmers "hoarding" their accumulation of code changes over days/weeks and then they later end up in "merge hell".
Because Git "commits" have a private-then-public phase, the programmer does not have to be burdened with affecting others' productivity with their (sometimes spurious) commits. They can have twitchy trigger fingers with repeated "git commit". The git commits can be treated as a personal redundant backup of Ctrl+S (or vi ":w"). (Or as others stated, the git commits and private history become an extension of their text editors.) They don't have to hoard their code changes. Because of the different dynamics, they don't necessarily have an automated Continuous-Integration complete rebuild of the entire project triggered with every commit. To outsiders however, many of these commits are just "noise" and don't rise to the same semantic importance that we associated with CVS/SVN type of commits.
In this sense, "rebase rewriting private history" does not mean faking accounting numbers like "Enron fraud" and throwing auditors off track, but instead, it's more like "hit backspace key or Ctrl+Z and type the intended characters."
In CVS/SVN, the "commits" are a Really Big Deal.
However, in Git, the "commits" are Not a big deal and closer in concept to a redundant "Ctrl+S". It shifts the Really Big Deal action to the act of "applying changes or merges" (e.g. "patches" is how Linus Torvald's often describes it.)
I wouldn't go so far as to say that they're sacred, but I do think you're right that a disagreement over their relative importance is probably at the core of this.
However, I think the stuff about breaking the build is way off. If one were really fearful of any commit breaking the build, wouldn't one embrace rewriting history? You'd try to avoid making a breaking commit in the first place, but if you're fearful of breaking builds, then once you did make such a mistake, the ability to go back and rewrite it would surely look pretty good.
One of the big advantages of git as I see it is that you don't have to be fearful about bad commits. You made a commit that broke the build? Well, try not to do that, but as long as you don't push it, it's not a big deal. Fix it (in a new commit!) and you'll push both of them together. History is preserved, nobody's build actually broke, everybody's happy.
>, but if you're fearful of breaking builds, then once you did make such a mistake, the ability to go back and rewrite it would surely look pretty good.
But I was trying to emphasize that Git's "mental model" eases the burden breaking the build. If everyone buys into the concept that "git commits" are just another lightweight form of "Ctrl+S", we would expect for programmers' private branches to sometimes have broken builds. That's the nature of real-world work such as refactoring or experimental changes. There's no social penalty or stigma for broken builds in private repos. Therefore, if a programmer rewrites history to hide broken builds, it's not because of ego or image-consciousness but because of consideration for others to read a comprehensible story of the changes.
You made a commit that broke the build? Well, try not to do that, but as long as you don't push it, it's not a big deal. Fix it (in a new commit!) and you'll push both of them together. History is preserved, nobody's build actually broke, everybody's happy.
Not everybody's happy. If we conceptually treat git commits as a 2nd form of "ctrl+s", we don't want to see both commits. Instead, clean up your private history, then craft/squash/edit your commits into a logical story, then make sure your public history has a clean build, and then apply those commits to the public branch. That's the way Linus Torvalds likes it for Linux patches and many agree with him. We do want some history to be preserved but not all of it.
When you say it's another form of ^S, how often are we talking here? I reflexively ^S every couple of words, are you literally talking about committing every couple of words? Every few lines? Less? What's the purpose committing more often than logical chunks of code which can be considered in some sense "done"?
This is somewhat different from the parent's view, but personally, I try to turn the list of commits in a given PR into a readable, reviewable "story" of the general steps that need to taken to implement a feature or fix a bug. (This starts when first writing it, because splitting up changes after the fact is a nightmare.) However, I do not want to limit myself to finishing and polishing one step before proceeding to the next. For one thing, my intuition might turn out to be wrong and the overall approach I'm aiming for might not be a good idea at all, something which I might only figure out when trying to implement the final feature/fix on top of the foundations. Or it might be a good idea overall, but I might end up realizing later that, say, out of the code I was fixing up in a previous commit, a lot of it is going to be removed by a later step in the refactoring anyway, so I should probably merge those steps or otherwise shuffle up the order. For another, I will probably just end up making mistakes, some of which I'll notice myself and some of which may be noticed in code review; while the "story" is primarily for code review, it is also useful for bisecting, so even changes found in review are good to integrate into the story.
As a result, when working on the project I'm thinking of, I use git rebase -i constantly, as if each commit were a separate file and I were switching between them in my editor. However, I don't actually like that old versions of my code are being thrown away (aside from reflog); I'd prefer if Git had two axes of history, one 'logical' and one 'real' (even if that gives people who already don't like branchy histories nightmares). I hear that Mercurial has something like this called "changeset evolution", but I haven't tried it; wish someone would make a Git port.
Why not just decrease the autosave interval in your editor :)
>What's the purpose committing more often than logical chunks of code which can be considered in some sense "done"?
There are different degrees of "doneness". For example, (1) code that isn't finished but you don't want to lose it if the power goes out, (2) code that you're not sure if you're going to keep, but you'd like to be able to refer back to it even if you later decide to change it, (3) code that completely accomplishes its purpose in a logical and coherent manner.
I use "Ctrl-S" for (1), "git commit" to a local branch for (2), and "git rebase/git push" for (3). Maybe I'm just a sloppy programmer, but my workflow often involves writing some code, making certain changes, then realizing that what I really need is the previous version but with different changes. So for me, frequent commits on a local branch have replaced frequent saves under different filenames (foo.c, foo_old.c, foo_tried_it_this_way.c)
My ^S reflex is almost 30 years old. It costs nothing, and occasionally saves me, so I have no reason to fight it. Autosave is great, but every so often you'll hit a situation where it turns out that it's not firing (misconfiguration or something) and then you're doomed. Belt and suspenders is best.
As for the rest, that's interesting stuff to ponder.
Can't speak for the GP, but I often commit my changes every 3-4 minutes with messages like "Tweaked the padding." Then when my work is in a reasonable state to be viewed by someone else, I'll turn those 5-6 local commits into one coherent "feature commit" like "Redesigned the page header according to new brand guidelines."
Because people make mistakes? Not all test suites are 100% perfect so bugs go missed? That's the point of a pull request and code review. I get that mistakes happen but we don't need a record of your mistake and subsequent fix in master (unless it's already in master. Then that history is sacred). Just squash the appropriate commits (--fixup is my favoritist thing ever) before merging to master and send the right code to public history the first time.
Can't reply to mikeash below, but I also have a comment. I've burnt myself a few times where I committed something and pushed to my remote repo, only to realise that I shouldn't have.
What I've taken from my errors is that I no longer push single commits until I'm at least done with what I'm doing (I use GitFlow btw).
It's easier for such things to happen in languages where you don't need to build your project (looking at JavaScript). Sometimes it's pushing a fix, only to realise that it introduces a regression somewhere. I know that testing takes care of most of this, but not everything can have tests written for. I'm a single developer on my 'hobby' start-up, working on over 4 separate components, I unfortunately can't write tests for all of them at this stage.
Even in a language like JavaScript, you're at least running your new code before you commit, surely.
As for a fix which introduces a regression somewhere else, that seems like exactly the sort of history you'd want to capture in source control. "Fixed X." "The fix for X broke Y, which is now fixed." This is much more informative than a single "Fixed X." which only includes the final state. The fact that a fix for X could break Y is valuable information!
Yes, I run it, but if out of the possible 5'000 combinations that I go through when searching (https://rwt.to, essentially an A-B public transit planner, https://movinggauteng.co.za - data behind the planner) one of them breaks, it becomes difficult to spot errors until a few days later at times.
I could write something that does searches for as many combinations as possible, but I'm at the point where the cost of getting an error is better than spending a day where I can't work on my code because tests are running (the data changes regularly). That day's often a weekend where I've got a small window of time to work on my hobby.
On your last point, I often end up being detailed on my commits where I can fiddle with the history before pushing to remote, so I still end up capturing what happened in SC.
I'd really love a suggestion on how I could get around this, it would help me improve (I'm an accountant by profession, but do some SAS, R, Python, JS etc. as part of my ever-changing job).
I don't see the problem with making a change, breaking something that's not practical to immediately test, committing that change, noticing the breakage a few days later, and committing a fix. No need to rewrite history, just have "fixed Y which broke when I did X" later on.
In JavaScript you have to contend with a dozen different run environments. Maybe you realized your fix actually broke the feature in IE8 because you left a comma at the end of an array. It's quite common to have your fix break something in a very specific environment.
That's fine, but then I don't understand what the problem is with having that IE8 fix which you didn't make until sometime later being a separate commit.