GitLab CEO here. I agree that GitFlow is needlessly complex and that there should be one main branch. The author advises merging in feature branches by rebasing them on master. I think it is harmful to rewrite history. You will lose cherry-picks, references in issues, and testing results (CI) of those commits if you give them a new identifier.
The power of git is the ability to work in parallel without getting in each other's way. No longer having a linear history is an acceptable consequence. In reality the history was never linear to begin with. It is better to have a messy but realistic history if you want to trace back what happened and who did and tested what at what time. I prefer my code to be clean and my history to be correct.
I prefer to start with GitHub flow (merging feature branches without rebasing) and have described how to deal with different environments and release branches in GitLab Flow: https://about.gitlab.com/2014/09/29/gitlab-flow/
The obsession of git users with rewriting history has always puzzled me. I like that the feature exists, because it is very occasionally useful, but it's one of those things you should almost never use.
The whole point of history is to have a record of what happened. If you're going around and changing it, then you no longer have a record of what happened, but a record of what you kind of wish had actually happened.
How are you going to find out when a bug was introduced, or see the context in which a particular bit of code was written, when you may have erased what actually happened and replaced it with a whitewashed version? What is the point of having commits in your repository which represent a state that the code was never actually in?
It always feels to me like people just being image-conscious. Some programmers really want to come across as careful, conscientious, thoughtful programmers, but can't actually accomplish it, so instead they do the usual mess, try to clean it up, then go back and make it look like the code was always clean. It doesn't actually help anything, it just makes them look better. The stuff about nonlinear history being harder to read is just rationalization.
The point of rebasing for clarity, IMHO, is to take what might be a large, unorganized commit or commits (i.e. the result of a few hours' good hacking) and turn it into a coherent story of how that feature is implemented. This means splitting it into commits (which each change one thing), giving them good commit messages (describing the one thing and its effects), and putting them in the right order.
Rather than hiding bugs, usually I wind up finding bugs when doing this because teasing apart the different concerns that were developed in parallel in the hacking session (while keeping your codebase compiling/tests running at every step) tends to expose codependence issues that you wouldn't find when everything's there at once.
It's basically a one-person code review. And when you're done you have a coherent story (in commits) which is perfectly suited for other people to review, rather than just a big diff (or smaller messy diffs).
It also lets me commit whenever I want to during development, even if the build is broken. This is useful for finding bugs during development, as you'll have more recorded states; e.g., you can find the last working state when you screw something up. And in-development commits can be more notes to myself about the current state of development rather than well-reasoned prose about the features contained.
I realize not everyone agrees with it, but I hope I've described some good reasons why I think modifying history (suitably constrained by the don't-do-it-once-you've-given-your-branch-to-the-public rule) is a good thing, not something to be shunned.
I like "Rewriting local history seems no different than rewriting code in your editor", that's a pretty good analogy I hadn't thought of.
There are a (very) few instances where you'd want to rewrite something pushed to a shared repo. One is if there's a shared understanding that that branch will be rewritten. Some examples would include git's own "pu" and "next" branches. "pu" is rebased every time it changes, and "next" is rebased after every release. Everyone knows this and knows not to base work off these branches. There's also the occasional "brown paper bag" cleanup, like some proprietary information getting into the repository by mistake so that all the contributors have to cooperate to get it removed. But all of these take out-of-band communication somehow.
We've been fine using rebase on already pushed branches. This comes from the understanding that a feature branch belongs to one developer, ever, and that no one else is supposed to work off of it (or at their own peril).
Everyone knows that it's "my branch" and that they're absolutely not supposed to use it for anything until it's merged back into master or whatever authoritative branch.
For me, it's because I hop between development machines, and pushing/pulling a branch is much easier than the alternative of synchronizing files manually among said machines.
Also, so that if something goes awry with my dev machine for whatever reason, at least my work is saved.
Also, to make it easier for a colleague to review my code before it gets merged into something.
Also, because it means I can use GitHub's PR system instead of doing it on my machine (thus providing some additional record that my code got merged in, and providing an avenue for the merge itself to be reviewed and commented on).
We have a rule that you never go home at night without pushing your work, even if it's garbage. Put it in a super-short-term feature branch if needed, and push that, but don't leave it imprisoned on your machine.
I kinda hope you are, because backup and source control really should be separate functions. Obviously your source control repository should be backed up, and pushing stuff into it acts to create a backup, but you really should have a separate backup system at work as well, to cover unpushed code as well as all the other useful info contained on your computer.
I use it the same way too. I do not really see why backup should be separate from source control as there is no valuable information on my (work) computer apart from the source code, and I never spend more than a few hours without pushing.
Does anyone advocate rewriting shared history? Oddly, I see this "exception" a lot in reply to this person, but I'm not sure I've ever read anyone saying rewriting shared history is a good idea.
I think it's less people saying you should rebase shared history, and more people saying you should rebase without realizing shared history matters. Then some poor confused soul starts always rebasing before pushing/merging, messes up their local history, and does not know how to fix it.
A lot of git is "magic" to many developers, and the way that rebase works is certainly one of the features poorly understood.
Only in extreme circumstances where something sensitive (such as credentials) or otherwise problematic (such as other people's copyrighted assets, or .svn directories in the case of some repos that were moved from SVN to git in a ham-fisted manner) was checked into the repository and needs to be removed. Those are the only reasons for rewriting shared history.
My rule of thumb is that rewriting shared history is always, always bad. There may be situations where the proper precautions can mitigate the risk, but I've never seen a good example where it's actually a completely good idea without downsides.
> Rewriting shared history is (almost) always bad.
Agreed. The one counterexample that I have is Github pull requests. Those are actually branches in your fork, and you do want to rewrite those when you get feedback on a pull request. That makes it easier for the owner of the repo to do the merge later.
I will get pull requests where later commits fix bugs introduced in former commits.
I generally ask people to rewrite such PRs, as I’m not going to pull known buggy commits into master, even if they are followed by fixes. That is just noise.
It might also be that some commits in the PR have changed tabs to spaces or vice versa.
I think the point was: if you have a PR with two commits, you can squash it to a single commit and force push. This will update the PR to just have the single commit. (Similarly with a rebase.)
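In git terms that's roughly the following sketch (the branch name my-feature is hypothetical, and --force-with-lease is the safer form of force pushing):
$ git rebase -i HEAD~2                            # mark the second commit as "squash" or "fixup"
$ git push --force-with-lease origin my-feature   # the PR now shows the single squashed commit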
I've come across reasons, but they've always been pretty marginal, such as somebody checking in sensitive credentials without realising what they were doing.
The thing is, the commit message is part of the commit, not something separate from it. Irritating as it might be, this is good for traceability.
What I do to avoid that is work on a separate branch, rebase against master, then review the commits on my branch after getting rid of any WIP commits and shuffling them around to make more sense. Finally, I make sure the commit messages are (a) accurate and (b) have no typos. Once I'm satisfied with that, I merge.
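For what it's worth, a rough sketch of that workflow (branch and remote names are examples, not prescriptions):
$ git checkout -b my-feature         # commit freely here, WIP commits and all
$ git fetch origin
$ git rebase origin/master           # bring the branch up to date with master
$ git rebase -i origin/master        # drop WIP commits, reorder, reword messages
$ git checkout master
$ git merge my-feature               # only now does anything become shared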
I treat merging as a big deal, but not committing.
Agreed, most people hear "rewrite history" and immediately assume "public history".
Rebase is a part of code review. If someone spots a typo and a "fix typo" commit follows it up, as happens in a good proportion of GitHub-model projects, I cringe. This information is utterly useless to the project's history and should be rebased in as a fixup. Only once code review is done should a commit be considered for merge. It's at this point that rewriting becomes a problem.
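Git has built-in support for that fixup flow; a minimal sketch, where <sha> stands for the commit that introduced the typo and <base> for the branch point:
$ git commit --fixup <sha>             # records a "fixup! ..." commit
$ git rebase -i --autosquash <base>    # git slots the fix under its target commit automatically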
I think most people have forgotten where Git came from; git is designed from the ground up for this! When someone emails a series of patches to the kernel mailing list for review, they iterate that series of commits over and over until it's ready. They don't keep adding new patches on top the way the pull request model of GitHub/GitLab etc. does.
In my GitHub experience, rebasing/tidying your commits is expected before a pull request is merged, just like your description of Linux development. E.g., the numpy/scipy/matplotlib projects.
Unfortunately, this is not true for many repositories. GitHub's interface (i.e., the "Merge" button) encourages users to merge from the web interface, where this tidying can't happen.
Then someone else rebases over that commit, there's a conflict and lo! the tests fail. Why? typo. It's fixed in the subsequent commit (which you can't see). Lovely.
There's something to be said for having every commit pass tests/work (or if it doesn't saying explicitly in the commit message), if anyone is ever going to step over this commit.
That's a hard one; trying to make a single commit in a pull request helps me but sometimes even then a pull request gets ignored and they want me to rebase it.
The problem is they ask /me/ to rebase it; I think they should take a little ownership in the potential rewriting of history.
Another nice side benefit is that you are able to use git bisect to find bugs more easily. If some of the commits fail the build then it becomes difficult to separate commits that actually introduce a bug from those that are just incomplete.
The team I work with has recently started making sure every commit passes the build, and it's had some fantastic results in our productivity. We know every individual commit passes on its own. If we cherry-pick something in, it's most likely going to pass; so if it fails, then usually the problem is in that specific commit, not one made days or weeks ago.
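This is also what makes git bisect pleasant to use; a sketch, assuming the suite runs via make test and v1.2 is a known-good tag:
$ git bisect start
$ git bisect bad HEAD
$ git bisect good v1.2
$ git bisect run make test    # git walks the history; a non-zero exit marks a commit bad
$ git bisect reset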
You don't have to rewrite history to do this. You just have to run your tests before committing. You know, like people used to in the old days.
Indeed, I think the widespread rewriting of history that goes on in the Git world makes it more likely that there will be failing commits, because every time you rewrite, you create a sheaf of commits which have never been tested.
Now, in your case, it sounds like you have set up processes to check these commits, and that's absolutely great. Everyone should do this! But why not combine this with a non-rewriting, test-before-commit process that produces fewer broken commits in the first place?
Running tests before committing locally adds a lot of friction. It often happens to me that I work on a feature in component A, and in doing so, realize that it would be great to have some additional feature in component B (or perhaps there's a bug that needs to be fixed).
As long as the components are logically separate, it's usually a good idea to make those changes in separate commits. While you can do that using selective git add, I personally often find it more convenient to just have a whole bunch of rather small "WIP" commits that you later group and squash together in a rebase.
Not least of the reason is that I like to make local commits often in general anyway, even when I know that the current state does not even compile. It's a form of backup. In that case, I really don't want to have to run tests before making commits.
And obviously, all of this only applies to my local work that will never be used by anybody else.
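For reference, the two variants mentioned above look roughly like this (the base origin/master is an assumption):
$ git add -p                                   # stage only the hunks belonging to one concern
$ git commit -m "component B: fix lookup"
versus the WIP-commit route:
$ git commit -am "WIP: does not compile yet"   # cheap local backup, never shared
$ git rebase -i origin/master                  # later: group and squash the WIPs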
When you come up with the idea for a feature in component B, or a bug to fix, rather than implementing it, make a note of it, and carry on with what you were doing. Once that's done and committed, you can go back to the other thing. That way, you end up with coherent separate commits, that you can test individually as you make them, without having to rewrite history. Not only that, but you can give each commit your full attention as you work on it, rather than spreading your attention over however many things.
Again, this is the traditional way of doing things (as an aside, in pair programming, one of the roles of the navigator is to maintain these notes of what to do next, so the pair can focus on one thing at a time). Seen from this perspective, history rewriting is again a way to cover up poor, undisciplined programming practice.
It's possible that we just have different styles of working.
Still, to clarify: not all, but some of the situations I have in mind are situations where the changes in component A cannot possibly work without the changes in component B.
So an alternative workflow could rather be: Stash all your changes made so far, then do the changes in component B, commit, and then reapply the stashed changes in component A. That's something I've tried in the past, and it can work. However, it has downsides as well. In particular, having the in-progress changes in component A around actually helps by providing the context to guide the changes in component B. So you avoid situations where, after you've continued working on component A, you realize that there's still something missing to component B after all (which may be something as silly as an incorrect const-qualifier).
It's also possible that our preferences depend on the kind of projects we're working on. What I've described is something that has turned out to work well for me on a large C++ code base, where being able to compile the work-in-progress state for both components simultaneously is very useful to catch the kind of problems like incorrect const-qualifiers I've mentioned before.
I could imagine that on a different type of project your way works just as well. For example, in a project where unit testing is applicable and is development policy, so that you'd write separate tests for your changes to component B anyway, being able to co-test the work-in-progress state across components is not as important because you're already testing via unit tests.
I agree that the situation where you need the changes in B to make the changes in A is both genuine and annoying!
I have often taken the stash A - change B - commit B - pop A - finish A route. If you know what changes to B you need, it's fine, but you're right, the changes to A can be useful context.
In that case, you can make the changes to B with the changes to A still around, then stash A, run the tests, commit, pop A, and continue. Then you can have the best of both worlds, and you still don't need to edit history.
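A modern git can even stash just component A's files by pathspec; a sketch with a hypothetical path:
$ git stash push -m "WIP component A" -- src/componentA/
$ make test                                        # test component B in isolation
$ git commit -am "component B: add helper A needs"
$ git stash pop                                    # carry on with component A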
If you just can't make the changes to B without the changes to A, then they probably belong in a single commit, and you've just identified a possible coupling that needs refactoring as a bonus.
Yeah, obviously we do that (well maybe not so obvious to some, but I never push unless the tests pass). We sometimes perform lots of other things like static analysis that get in the way of a rapid feedback loop. We also run mutation testing, which can sometimes take several hours for the whole codebase -- although we don't have this run on every commit, just ones that we merge into a specific branch.
The problem I have with non-linear commit history is that I find it impossible to keep all the paths straight in my head when I am trying to understand a series of changes. Maybe you can do that, and I think that's awesome, but I like to see a master branch and then smaller feature branches that break off and then combine back with master.
> The point of rebasing for clarity, IMHO, is to take what might be a large, unorganized commit or commits (i.e. the result of a few hours good hacking) and turning it into a coherent story of how that feature is implemented. This means splitting it into commits (which change one thing), giving them good commit messages (describing the one thing and its effects), and putting them in the right order.
To my understanding, Gerrit does grouped commits as part of the flow. Even better, it groups all review-triggered commits under the same master commit, with the nice, extensive description that one crafted for the change. It's regrettable that GitHub popularized the fork/pull-request model instead.
> The point of rebasing for clarity, IMHO, is to take what might be a large, unorganized commit or commits (i.e. the result of a few hours good hacking) and turning it into a coherent story of how that feature is implemented.
Isn't this the same rationalization that drives Git Flow's feature branches and merging via --no-ff? You can see the messy real work in the feature branch, but it gets merged to the main branch as one clean commit.
Once the merge commit occurs, the 'messy real work' is now part of the main branch's history just as much as the rest of the commits, as they are ancestors of that merge commit.
Same here. It is much clearer to me to reapply my commits, as long as I constrain myself to clear, coherent, and atomic commits.
Replaying changes is much more comfortable for me, especially when I still have them in short-term memory; it's surely easier than merging other people's stuff within your files.
My average feature is around 7-10 commits, all replayed onto the latest commit on the branch. It forces me to catch up with other people's work on shared areas and gives me quite a bit more confidence that the merge isn't messing up problematic files.
Disclosure up front: I don't really use git myself. I have tried it and found it to be too confusing. I liked svn and these days use hg. I also tend to work mostly on solo and small projects.
However, in my observation, the person ultimately responsible for the code spends far more time cleaning up history and recovering from developer mistakes on projects using git than on any other revision control system I can recall, and that goes back to CVS and Visual SourceSafe, and includes svn and hg.
I know a lot of people use git and love it so I'm prepared to accept that they're all smarter than I am. But IMHO, the version control system should be incidental to my work. It should not demand any significant fraction of my brainpower: that should be devoted to the code I'm working on. If I have to stop and THINK about the VCS every time I use it, or if it gives me some obscure "PC LOAD LETTER" type of response (which seems to happen to me when I use git) then it is a net negative. If I need to have a flowchart on my wall or keep some concept of a digraph in the front of my thinking or use a cheat sheet to work with the VCS, then it's just one more thing that gets in my way.
I think git probably has a place on very large codebases, with very distributed developers. For the typical case of a few developers who all work in the same office, I think in most cases it's overkill and people would be more productive using something simpler.
> If I have to stop and THINK about the VCS every time I use it, or if it gives me some obscure "PC LOAD LETTER" type of response
I'm sorry, there is no kind way to say this without spending more time than I have.
You're making the same kind of argument I am hearing from older people in my family about newer hardware (TVs, phones, etc.). You see an initial learning curve and falsely assume that this curve will never flatten out and give way to easy and intuitive access to power.
I've been using git for years and consider myself a fairly sophisticated git user, with a reasonably solid conceptual understanding of what goes on under the covers. I've even performed significant surgery on the jgit codebase (converting it to SHA256 for a security-related project - what a mess).
And yet I don't feel that the learning curve flattens out. I still end up getting wedged into states that send me scrambling for stackoverflow.
Git is incredibly powerful, which is why I use it. But the PC LOAD LETTER comment resonates strongly with me. We can embrace a tool while also acknowledging its faults.
I've spent more time learning git than I have spent learning all other VCS combined, of which there have been at least a few in my history. My mastery of git is significantly less than that of any other VCS I've used. Less powerful VCS are easier to use, and that can be a feature.
In other words: I was close to leaving my upvote and walking away, but decided to not leave you wondering, since you DID spend some effort and thought in your post.
> Well, I am older, actually. Maybe it's just part of what happens. I still miss my flip phone too. So much simpler....
It's actually what happens. I'm feeling the same way about various things as I'm getting older. I can't be arsed to figure out what Docker is, for example. Ain't nobody got time for that. OTOH, I do realize that's just me, and it's probably a great thing that I hope the sysadmins I work with will know to pick up and make use of its potential.
FWIW, I've spent far more time thinking about svn than I ever spent thinking about git.
Specifically, porting changes between multiple branches in svn was a nightmare. E.g. if you have three different branches (two releases and a develop), and you need to make the same bugfix on all of them - extremely unpleasant. I ended up writing my own diff/patch management system to keep track of bug fix patches, so that I could reapply them at will on other branches.
Git instantly made sense to me. It incorporated how I already thought about repositories and diffs. The DAG structure made sense. Merging made sense; rebasing made sense; everything made sense, almost instantly.
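For example, the three-branch bugfix scenario above is just a couple of cherry-picks in git (branch names are illustrative, and this assumes the fix is the tip commit of develop):
$ git checkout develop
$ git commit -am "Fix off-by-one in pager"           # the bugfix, made once
$ git checkout release-1 && git cherry-pick develop
$ git checkout release-2 && git cherry-pick develop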
My two cents is that I haven't found it difficult to merge between branches myself. I'll open up a diff view of the commit(s) I want to merge, and then merge them branch-to-branch and file-to-file using a two-way merge tool, using the diff as a guide.
There is a difference between rewriting "published" history and rewriting your local repo. I rely heavily on the ability to rewrite history before pushing. I hate seeing people push a series of commits (in a single push, I mean) where the first two introduce a big mess and the subsequent ones try to fix it up.
This. So much this. I hate looking through history and seeing crap like "lol forgot semicolon". Rebasing while you're still on your feature branch, before the code hits master, to make your commits succinct, readable, and above all free of known broken code, is a must.
Why are you committing code you haven't even tried to build? In the scenario you're presenting, the problem is that somebody even needed a "lol forgot semicolon" commit in the first place. Stop doing that. We all make mistakes and this will come up sometimes, but if it's happening so often that you need to rewrite your VCS history to stop from annoying other people, something is wrong.
>Why are you committing code you haven't even tried to build?
Because a DVCS tool like Git makes commits much less costly than older tools such as CVS or SVN. The dynamics (both social & personal) for commits are different.
My guess is that you understand Git commands but you're using the SVN/CVS mental model of treating commits as "sacred" markers. If someone commits in those older centralized systems, they could potentially break the build and stop the team's productivity. This leads to strange social dynamics such as programmers "hoarding" their accumulation of code changes over days/weeks and then they later end up in "merge hell".
Because Git "commits" have a private-then-public phase, the programmer does not have to be burdened with affecting others' productivity with their (sometimes spurious) commits. They can have twitchy trigger fingers with repeated "git commit". The git commits can be treated as a personal redundant backup of Ctrl+S (or vi ":w"). (Or as others stated, the git commits and private history become an extension of their text editors.) They don't have to hoard their code changes. Because of the different dynamics, they don't necessarily have an automated Continuous-Integration complete rebuild of the entire project triggered with every commit. To outsiders however, many of these commits are just "noise" and don't rise to the same semantic importance that we associated with CVS/SVN type of commits.
In this sense, "rebase rewriting private history" does not mean faking accounting numbers like "Enron fraud" and throwing auditors off track, but instead, it's more like "hit backspace key or Ctrl+Z and type the intended characters."
In CVS/SVN, the "commits" are a Really Big Deal.
However, in Git, the "commits" are Not a big deal and closer in concept to a redundant "Ctrl+S". It shifts the Really Big Deal action to the act of "applying changes or merges" (e.g. "patches" is how Linus Torvalds often describes it).
I wouldn't go so far as to say that they're sacred, but I do think you're right that a disagreement over their relative importance is probably at the core of this.
However, I think the stuff about breaking the build is way off. If one were really fearful of any commit breaking the build, wouldn't one embrace rewriting history? You'd try to avoid making a breaking commit in the first place, but if you're fearful of breaking builds, then once you did make such a mistake, the ability to go back and rewrite it would surely look pretty good.
One of the big advantages of git as I see it is that you don't have to be fearful about bad commits. You made a commit that broke the build? Well, try not to do that, but as long as you don't push it, it's not a big deal. Fix it (in a new commit!) and you'll push both of them together. History is preserved, nobody's build actually broke, everybody's happy.
> ...but if you're fearful of breaking builds, then once you did make such a mistake, the ability to go back and rewrite it would surely look pretty good.
But I was trying to emphasize that Git's "mental model" eases the burden of breaking the build. If everyone buys into the concept that "git commits" are just another lightweight form of "Ctrl+S", we would expect programmers' private branches to sometimes have broken builds. That's the nature of real-world work such as refactoring or experimental changes. There's no social penalty or stigma for broken builds in private repos. Therefore, if a programmer rewrites history to hide broken builds, it's not because of ego or image-consciousness but because of consideration for others to read a comprehensible story of the changes.
> You made a commit that broke the build? Well, try not to do that, but as long as you don't push it, it's not a big deal. Fix it (in a new commit!) and you'll push both of them together. History is preserved, nobody's build actually broke, everybody's happy.
Not everybody's happy. If we conceptually treat git commits as a 2nd form of "ctrl+s", we don't want to see both commits. Instead, clean up your private history, then craft/squash/edit your commits into a logical story, then make sure your public history has a clean build, and then apply those commits to the public branch. That's the way Linus Torvalds likes it for Linux patches and many agree with him. We do want some history to be preserved but not all of it.
When you say it's another form of ^S, how often are we talking here? I reflexively ^S every couple of words, are you literally talking about committing every couple of words? Every few lines? Less? What's the purpose committing more often than logical chunks of code which can be considered in some sense "done"?
This is somewhat different from the parent's view, but personally, I try to turn the list of commits in a given PR into a readable, reviewable "story" of the general steps that need to be taken to implement a feature or fix a bug. (This starts when first writing it, because splitting up changes after the fact is a nightmare.) However, I do not want to limit myself to finishing and polishing one step before proceeding to the next. For one thing, my intuition might turn out to be wrong and the overall approach I'm aiming for might not be a good idea at all, something which I might only figure out when trying to implement the final feature/fix on top of the foundations. Or it might be a good idea overall, but I might end up realizing later that, say, out of the code I was fixing up in a previous commit, a lot of it is going to be removed by a later step in the refactoring anyway, so I should probably merge those steps or otherwise shuffle up the order.

For another, I will probably just end up making mistakes, some of which I'll notice myself and some of which may be noticed in code review; while the "story" is primarily for code review, it is also useful for bisecting, so even changes found in review are good to integrate into the story.
As a result, when working on the project I'm thinking of, I use git rebase -i constantly, as if each commit were a separate file and I were switching between them in my editor. However, I don't actually like that old versions of my code are being thrown away (aside from reflog); I'd prefer if Git had two axes of history, one 'logical' and one 'real' (even if that gives people who already don't like branchy histories nightmares). I hear that Mercurial has something like this called "changeset evolution", but I haven't tried it; wish someone would make a Git port.
Why not just decrease the autosave interval in your editor :)
>What's the purpose committing more often than logical chunks of code which can be considered in some sense "done"?
There are different degrees of "doneness". For example, (1) code that isn't finished but you don't want to lose it if the power goes out, (2) code that you're not sure if you're going to keep, but you'd like to be able to refer back to it even if you later decide to change it, (3) code that completely accomplishes its purpose in a logical and coherent manner.
I use "Ctrl-S" for (1), "git commit" to a local branch for (2), and "git rebase/git push" for (3). Maybe I'm just a sloppy programmer, but my workflow often involves writing some code, making certain changes, then realizing that what I really need is the previous version but with different changes. So for me, frequent commits on a local branch have replaced frequent saves under different filenames (foo.c, foo_old.c, foo_tried_it_this_way.c)
My ^S reflex is almost 30 years old. It costs nothing, and occasionally saves me, so I have no reason to fight it. Autosave is great, but every so often you'll hit a situation where it turns out that it's not firing (misconfiguration or something) and then you're doomed. Belt and suspenders is best.
As for the rest, that's interesting stuff to ponder.
Can't speak for the GP, but I often commit my changes every 3-4 minutes with messages like "Tweaked the padding." Then when my work is in a reasonable state to be viewed by someone else, I'll turn those 5-6 local commits into one coherent "feature commit" like "Redesigned the page header according to new brand guidelines."
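One way to collapse them, assuming all six commits are local and unpushed (a soft reset keeps the working tree and leaves everything staged):
$ git reset --soft HEAD~6
$ git commit -m "Redesigned the page header according to new brand guidelines."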
Because people make mistakes? Not all test suites are 100% perfect, so bugs get missed? That's the point of a pull request and code review. I get that mistakes happen, but we don't need a record of your mistake and subsequent fix in master (unless it's already in master. Then that history is sacred). Just squash the appropriate commits (--fixup is my favoritist thing ever) before merging to master and send the right code to public history the first time.
Can't reply to mikeash below, but I also have a comment. I've burnt myself a few times where I committed something and pushed to my remote repo, only to realise that I shouldn't have.
What I've taken from my errors is that I no longer push single commits until I'm at least done with what I'm doing (I use GitFlow btw).
It's easier for such things to happen in languages where you don't need to build your project (looking at JavaScript). Sometimes it's pushing a fix, only to realise that it introduces a regression somewhere. I know that testing takes care of most of this, but not everything can have tests written for it. I'm a single developer on my 'hobby' start-up, working on over 4 separate components; I unfortunately can't write tests for all of them at this stage.
Even in a language like JavaScript, you're at least running your new code before you commit, surely.
As for a fix which introduces a regression somewhere else, that seems like exactly the sort of history you'd want to capture in source control. "Fixed X." "The fix for X broke Y, which is now fixed." This is much more informative than a single "Fixed X." which only includes the final state. The fact that a fix for X could break Y is valuable information!
Yes, I run it, but if out of the possible 5'000 combinations that I go through when searching (https://rwt.to, essentially an A-B public transit planner, https://movinggauteng.co.za - data behind the planner) one of them breaks, it becomes difficult to spot errors until a few days later at times.
I could write something that does searches for as many combinations as possible, but I'm at the point where the cost of an occasional error beats spending a day when I can't work on my code because tests are running (the data changes regularly). That day's often a weekend where I've got a small window of time to work on my hobby.
On your last point, I often end up making my commits detailed, since I can fiddle with the history before pushing to remote, so I still end up capturing what happened in SC.
I'd really love a suggestion on how I could get around this, it would help me improve (I'm an accountant by profession, but do some SAS, R, Python, JS etc. as part of my ever-changing job).
I don't see the problem with making a change, breaking something that's not practical to immediately test, committing that change, noticing the breakage a few days later, and committing a fix. No need to rewrite history, just have "fixed Y which broke when I did X" later on.
In JavaScript you have to contend with a dozen different run environments. Maybe you realized your fix actually broke the feature in IE8 because you left a comma at the end of an array. It's quite common to have your fix break something in a very specific environment.
That's fine, but then I don't understand what the problem is with having that IE8 fix which you didn't make until sometime later being a separate commit.
Say I create a feature branch, this is what a day's work might look like.
839a882 Fix bad code formatting [James Kyle]
6583660 Updated plugin paths for publish env [James Kyle]
847b8f3 First stab at a mobile friendly style. [James Kyle]
a70d3f7 Added new articles, updated a couple. [James Kyle]
b743ec3 format changes on article [James Kyle]
68231e7 Some udpates, added an article [James Kyle]
2a92c5e Added plugins to publish conf. [James Kyle]
6dec1e1 Added share_post plugin support. [James Kyle]
070bbd0 Added pep8, pylint, and nose w/ xunit article [James Kyle]
eb8dbcc Corrected spelling mistake [James Kyle]
0b89761 Minor article update [James Kyle]
677f635 Added TLS Docker Remote API article [James Kyle]
d8e94fd Fixed more bad code formatting in nose [James Kyle]
f06dc2d Syntax error for code in nose. [James Kyle]
606ac2b Removed stupid refactor for testing code. [James Kyle]
This might be a very short one. If the work goes on for a couple of days, could be dozens of commits like this.
In the end, it'd be a veritable puzzle what I was trying to send upstream. Also, the merger has to trawl through multiple commits and history. It's plain annoying.
So you rebase and send them something like this:
947d3e7 Implemented mobile friendly style. [James Kyle]
And if they want more, they can see the full log with a bullet list:
947d3e7 Implemented mobile friendly style.
- Added plugins x, y,
- Implemented nose tests to account for new feature
Rebasing is about taking a discombobulated, stream-of-thought workflow and condensing it into a single commit with an accurate, descriptive log entry.
Makes everyone's life easier.
edit
It's also very nice to take out frustration generated commits like "fuck fuck fuck fuck fuck!!!" before committing upstream to your company's public repository. ;)
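Concretely, that condensing step is a single interactive rebase, assuming the branch forked from master:
$ git rebase -i master    # keep the first commit as "pick", mark the rest "squash", write one clean message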
I agree with mikeash's comment. Isn't this also the whole point of the staging area? "Stage multiple fixes into a single coherent commit" is the underlying model. Instead, in the above example there are many granular commits, with rebasing used to then clump them into a logical grouping.
This would indicate to me that you are committing too often or not using the staging area properly.
Committing often allows you to remember the small changes you made throughout the feature. If you let a file sit in the staging area for hours, days, weeks, you will most likely have a hard time remembering why you made all the changes.
Is there a way to do this with the staging area? o.O
Merge commits, particularly those that merge master multiple times, effectively destroy history (by preserving it). For that matter, many projects maintain that all commits to master should work! Unless you advocate only committing entirely working states (unlikely for large features), you'd have to rebase.
Can you explain what you mean by effectively destroying history by preserving it? That doesn't make any sense to me. And I also don't understand the link between merge commits and a failure to ensure that all commits to master should work. If you make changes in a branch, get everything up and running there, then merge to master, does that not ensure that everything on master works?
I agree with getting rid of noise and adding signal (see my sibling post) but banging a whole feature into a single commit is going way too far IMHO. "Added share_post plugin support" sounds like something that should be in a permanent commit to me. "format changes on article" probably not so much (assuming you created that article in the same feature branch).
I share an svn codebase with a few dozen developers, and while we don't have rebase, the history remains readable. There's a few guidelines that enable this: (1) all work must happen in the context of a jira issue or story, (2) the commit message starts with the jira issue id and its description, and only after that any stream of consciousness remarks, and (3) syntax errors will cause the commit to bounce and failing tests from the CI build will get you frowned upon. The history will usually reveal a few commits for a feature, spread half a day or a day apart. We rely on the local history feature of phpstorm to be able to backtrack to an earlier version (that and good old copy-pasting of the working copy before you start an experiment)
Is there a handy macro/script type thing that simplifies squashing a release branch and using the commit messages as bullet points (with the ability to edit out crap)?
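For what it's worth, plain git merge --squash gets close: it stages the combined change and pre-fills the commit message with "Squashed commit of the following:" plus every squashed message, which you can then trim into bullet points:
$ git checkout master
$ git merge --squash release-branch
$ git commit                  # editor opens with all the squashed messages listed, ready to edit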
I tend to agree. One exception I think is rebase on a feature branch. If you rebase a feature branch onto master before merging it into master, I think you can get a cleaner history while achieving the linear history the OP wants -- and in this isolated case, I think you aren't losing any useful context by making it seem the feature commits were all done right before merge into master.
Maybe. I'm not actually sure, to be honest what's a good idea with git history, this included. Feedback welcome.
People who love rebasing and linear history tend to see feature branches, even if pushed to a public repository, as private to their creator and maintainer and fair game for any sort of rebase. In fact, we do consider rebasing of feature branches mandatory.
The only thing is, while it is easy from the downstream side, it's a little more tricky to prepare the new branch.
One thing you can do is actually do the regular rewriting rebase, install the result under the new name, and then throw the rewrite away.
Rebase our-topic.0 to its default upstream, thereby locally rewriting it:
$ git rebase
(Precondition: no local commits in our-topic.0: it is synchronized with origin/our-topic.0, so they point to the same commit.)
Now, assign the resulting commit to the newly defined variable our-topic.1:
$ git branch -t our-topic.1
Now, throw away the local modification to our-topic.0. We don't have to appeal to the reflog, because our-topic.0 is following a remote branch which we can reference:
$ git reset --hard origin/our-topic.0
(Remember the precondition before all this: origin/our-topic.0 and our-topic.0 pointed to the same commit. Now they do again!)
Seems like you could simplify this quite a bit by just creating our-topic.1 before rebasing, given our-topic.0 == origin/our-topic.0 and that we are currently at our-topic.0.
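I.e., something like this sketch (assuming origin/master is the default upstream being rebased onto):
$ git branch our-topic.1 our-topic.0    # the new name starts at the same commit
$ git checkout our-topic.1
$ git rebase origin/master              # rewrites only our-topic.1; our-topic.0 never moves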
The counter argument though is when your feature branch doesn't only have _one_ creator/maintainer. Mine often don't, especially on open source projects, two or three people can be working collaboratively, or others that aren't the lead on the feature can come in to make a helpful commit here or there.
And when one person rebases the feature branch it wreaks havoc for collaborators on the feature branch.
Which is why I limit my "rebasing is okay" on a feature branch to only _right before_ it's merged into master and then deleted. It still doesn't get rid of all the problems, but it gets rid of most of them.
If you have a handful of people, you simply communicate with them, check that there's a good reason to rebase and that you're not creating unnecessary burden and do it when everyone is happy.
When you have more than a handful of people, then your feature branch is not a feature branch, but a project, which should have feature branches of its own.
Scale, dynamic adaption to it and situational awareness are a requirement in team work. :)
Bingo! Recently I've been working on resolving a bug with a small group of coworkers. We created a repo in which we have been rewriting public branches all the time. You just send an e-mail. Everyone just has to know how to migrate their local changes.
$ git fetch # the rewritten world
$ git checkout foobar
Your branch and 'origin/foobar' have diverged,
and have 13 and 17 different commits each, respectively.
(use "git pull" to merge the remote branch into yours)
Now I happen to know that only 3 out of the 13 divergent commits on foobar are my local commits. I rebase my local foobar branch to the upstream one, migrating just those 3, and ditching the remaining ten:
$ git rebase --onto origin/foobar HEAD~3   # replay just my last 3 commits onto the rewritten upstream
Easy.
This is all just test code people are trying in investigating the bug. Any permanent fixes arising are properly cleaned up, commented, and submitted via Gerrit to a whole other repo where they are cleanly cherry picked to a linear trunk which is never rewritten.
> If you have a handful of people, you simply communicate with them, check that there's a good reason to rebase and that you're not creating unnecessary burden and do it when everyone is happy.
Why is that communications overhead worth it, vs simply not rebasing (or at least not until right before you merge into master and delete the feature branch)?
I think the communications overhead can be significant, especially if the handful (even just 2 or 3) collaborators are at different locations, organizations, timezones, etc.
I'd rather just not have to think about it, not have to deal with that communications overhead, and not rebase. What do you get from interim rebasing, anyway, especially if you are still willing to do a final rebase before merge?
Then hunting bugs becomes very difficult, because you can't simply see behavior change from one commit, it might change in any merge because the behavior of one commit in one branch disagrees with the behavior of another commit in a different branch. Even worse, the behavior might be created from a badly-done conflict resolution in the merge commit, which is REALLY hard to see.
I envy you if you have not yet experienced this pain, but I assure you it is a real problem, and that you're merely trading effort now for effort later.
If it is pushed other people can cherry-pick, CI generates results and other people might push commits on the same branch (when using the same repo). We think pushing indicates you want to work in public. We even made a Work In Progress (WIP) function for it in GitLab http://doc.gitlab.com/ce/workflow/wip_merge_requests.html
It's a social question. To be honest, for the vast majority of people I work with (and I've worked with a few: https://github.com/wchristian?tab=repositories ) it would be considered very strange to see feature branches as public and stone-written history. There's a common understanding that feature branches are feature branches because the creator wishes to avoid writing things in stone.
That said, if a team decides on that kind of convention, marking things as WIP is a neat feature. I've seen other people do that simply by creating feature branches as "<username>/feature_branch".
That seems overly broad. It seems to me that most people who use git agree that public history shouldn't be rewritten, especially on master.
> The whole point of history is to have a record of what happened.
On the other hand, a bunch of "Derp" or "Whoops" type commits aren't very useful. It's definitely beneficial to clean that sort of stuff up by rewriting local history before pushing.
I'm talking about both public and private history.
It's far more beneficial to just not make commits like "Derp" or "Whoops" in the first place. Think about your commits and your commit messages as you make them. No, you won't get it right all the time. And that's OK; nobody is perfect, and your history can reflect that you're not perfect. But if you're editing your commit history to fix idiotic commit messages, you're doing it all wrong.
One of the things I like about git is that I can make bad commits and fix them later. If I'm working on one feature and I'm interrupted by another task, I can commit "wip - blah", then check out a different branch and work on that. When I go back, I pick up exactly where I left off, and amend the half-finished commit into something that actually makes sense before pushing it out to the rest of the team.
In the past, I never made those sorts of commits, because I used VCSs in which you couldn't. Instead, I avoided committing by checking out a separate workspace for the new work. That's a lot slower, though, and it's easier to lose uncommitted changes. Committing incomplete, broken work allows you to leverage your VCS to manage even your unfinished code.
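A sketch of that interruption workflow (branch names hypothetical):
$ git commit -am "wip - blah"                  # park the half-finished state
$ git checkout urgent-fix                      # deal with the interruption
$ git checkout my-feature                      # later: pick up exactly where you left off
$ git commit -a --amend -m "Implement blah"    # fold the finished work into one sensible commit
$ git push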
The "Derp" commits are the ones made immediately after a mistake. No one is editing their commit history just to fix these idiotic messages, they're doing it to fix the idiotic mistake that precedes the idiotic message. Yes, no one is perfect, but leaving the previous (wrong) commit alone has real consequences: it prevents people from easily cherry-picking the commit, using bisect to debug later, doing an accurate code review, etc.
edit: I realize that there are probably a few exceptions to my "no one" IRL. But I don't think anyone here would defend that practice.
Well then, try to structure things so you're catching those idiotic mistakes before you commit them.
The consequences you describe seem extremely mild. Cherry-picking now requires two cherry-picks instead of one, big deal. Git bisect has a "skip" command that solves this nicely. And I don't see how code review is at all impacted by this, unless you're using terrible tools that won't show a full summary of changes all at once.
Or... rewrite history. :) All of these problems are trivial when the commits are right next to each other; they're less trivial when they're separated by other unrelated commits.
"Don't make idiotic mistakes" isn't really advice that anyone can follow.
I think it depends on what kind of idiotic mistakes we're talking about. Stuff like forgetting a semicolon is completely avoidable with a disciplined approach of, "Always build before you commit." Other kinds aren't so nicely avoidable, but then I think the record should show them anyway.
Fair enough. I agree that rewriting history shouldn't replace other good practices. But I don't really see the benefit of having an exact historical record of all the mistakes made when coding. What does it get you?
It's not so much what the true historical record gets you, but what you potentially lose with the fake one. Do the commits in your edited history actually work? If you go back to see why a given change was made, will you get an accurate picture of the state of the rest of the code at that time?
But one of the points of editing the history is to make sure that both of those are more likely to be yes -- it's to make the history easier to review historically than it would be unedited. (IMO, obviously).
And this is probably partly why there's what you originally called an "obsession" with rewriting history: retroactively rewriting something 5 months old is probably going to be a disaster. But rewriting 1 day's worth of commits to better express why a given change was made, to give a more accurate picture of the state of the rest of the code, and to make things in general easier for people to read in the future is pretty trivial. So why not do it?
Are you building and testing your edited commits as you make them? If so, that seems fair, but a lot of work. If not, I don't see how it increases the chances of good results.
All of the extra building and testing can be automated, so the extra work just becomes a matter of reorganizing the work to make logically cohesive commits and it's more work in the same sense that writing good comments is more work. Whether building and testing each commit is done as often as it should be....
I would bet that many people eyeball it, build and test the end result, and claim that that's good enough. Since many of these people probably edited out a commit with a message like "lol typo broke the build" that might be an overly optimistic attitude ;)
In any case, I don't see how it decreases the chances of good results. You already dismissed my suggestion that it's nice to have each commit build and pass tests, so it's a bit strange to start worrying about it now.
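Note that git has this automation built in: rebase can run a command after replaying each commit and halt at the first failure. A sketch, assuming make test runs the suite:
$ git rebase -x "make test" origin/master    # stops on the first commit whose tests fail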
to be fair, if it's truly a "derp" commit you're better off using "git commit --amend --no-edit", which is technically rewriting history but in one stroke :)
I usually use rewriting history to package the code I've worked on into logical commits that should be stable at each point when applied in order. That way, it's very easy to reverse a commit or cherry pick specific functionality into the production branch early (ie the main branch isn't the stable one).
Would I like to get away from that and do it from the get-go? Oh yes, it'd be great. But I'm not there yet and so re-writing history is nice. And doing so forces me to think about the code I've written and where the boundaries of the changes I've made are. Granted, I haven't done it on very long lived feature branches (or big ones) - that may be where most of the penalties are manifest.
> It always feels to me like people just being image-conscious.
Every author is "image-conscious" because they want to present their thoughts clearly to the world. That's where your rather substantial misconceptions about the application and utility of rebasing come from. This isn't about rewriting published history, which is rightly and nearly universally considered A Bad Idea(tm) in the git world. The recommendations around rebasing are essentially identical to authors editing their text before publication. Note "before". Before {an article, some code} is published, edit, rewrite, cleanup all you want. After it's published, an explicit annotation is the best practice. For an author, perhaps an "Updated" note in an article or a printing number in a book. For a developer, add a new commit recording the change.
For my part, I use rebasing extensively and lightly before I publish code. By "extensively" I mean, I just don't hesitate to edit for clarity. This is the same as I'd do in authoring a post or email. By "lightly", I mean that I don't waste time doing radical history surgery but I regularly do things like squash a commit into an earlier logical parent commit. E.g. I started a refactor, then a little while later found some more instances of the same change. Often, this is just amending the HEAD commit, but occasionally I need to go back a short ways on my working branch.
This also fluidly extends to use of git's index and the stash for separating out logical commits from what's in the working copy. A typical example:
1. git add <files for a logical change>
2. git stash -k # put everything not added into the stash
3. # run tests
4. git commit
5. git stash pop
Once you're used to the above workflow, an understanding of git's commit amending and rebasing tools extends this authoring capability into recent history. This is wonderful because it takes pressure off of committing, meaning that git history becomes a powerful first-class, editable history/undo stack.
Remember, Git was born in Linux. And in Linux, a commit is a political statement. Your need to be succinct (your commit must stand alongside 2000 commits per day) and to emphasize the "obvious" brilliance of what you're doing to overcome noise overrides the need for recording all the thought processes along the way.
In most organizations, we don't have anywhere near that number of participants and we don't want charismatic developers, we want something that works right now and we're confident that changing it is not merely a possible outcome but very very likely.
I totally agree with you, I don't get it either. My only explanation is that as a programmer you are trained to write clean and understandable code. I also try to apply this to my commit messages (with varying results). But rewriting history to make everything look clean and simple is the wrong thing to do. The messier your history is, the more likely you'll need to retrace your steps (and CI results) at some point. It is mostly people coming from SVN and only running CI on the trunk branch that favor the rewriting approach. It might be hard to let go.
"But rewriting history to make everything look clean and simple is the wrong this to do."
Absolute statements are always wrong.
There are plenty of great reasons to rewrite your local history, many of which have been explored by other comments in this thread. Moreover, Linus disagrees with you -- the rebase flow is the one used by the author of git.
My guideline is that commits that don't have semantic meaning for your team should be avoided. It's therefore perfectly okay (desirable, even) to drop a "wip" commit in your local repository, but that commit is semantically meaningless, and shouldn't make it into shared history. Rebase!
Merge commits, likewise, should be avoided unless they carry semantic meaning. It's semantically meaningless to note that you've merged master back into your working branch N times since the initial branch point -- that's a development detail that is irrelevant to your peers. It's semantically meaningful to note that you've merged a feature branch into master. Your peers care about that.
You couldn't re-write history at all with SVN, so it's kind of goofy to suggest that this is legacy behavior. If anything, SVN had a problem that every pointless brain-fart commit by anyone, ever had to be preserved in the history. This made the history useless.
> It is mostly people coming from SVN and only running CI on the trunk branch that favor the rewriting approach. It might be hard to let go.
You're being ridiculously prejudiced and jumping to conclusions AND throwing out judgements on things that by your own admission ("I don't get it either.") you do not understand.
Please realize that the correct response in such a case is not to double down, but to engage in a dialogue so you may come to understand which factors you or they are unaware of that create the difference in stance. (And no, you can't expect the other side to initiate the dialogue. The way you are talking, you present an image of someone singularly uninterested in dialogue, even if that may be unintentional.)
Thanks for the comment. I agree I sound prejudiced and I'm sorry for that. I'm open to dialogue about this but I agree that my tone isn't helping. Anyway, I love thinking about this topic and I'm open to new insights.
> The obsession of git users with rewriting history has always puzzled me.
Editing draft commits is fine. Editing public commits is less fine. The problem is that git has no way to distinguish draft and public commits except by social convention.
Mercurial Evolve actually enforces the separation between draft and public commits, and can also allow setting up servers where people can collaboratively edit draft commits.
Everything about git is about managing a useful history. Otherwise it would be a history of every keystroke or at least every file write. Instead you write some code until you feel you have enough to make a useful commit (you will have to come up with your own idea of what represents a useful commit), commit all those changes together as a single commit (thereby losing history), and come up with a useful description of all those changes. Managing an already created commit is just a further extension of this idea. You can use what you learned from your experience of coding, testing, and committing to change your commit history to be even more useful. Of course things can go wrong if you are changing the history of a branch that others have cloned or branched off of.
I can make 20 or 30 commits during some code changes in a morning's worth of coding. This allows me to easily trace back to any point, or cross-reference changes across many local branches, etc.
At the end, it might all be squashed down into a single bug-fix commit for the devel branch.
The commit granularity that's desirable and effective for an individual is very different to the history you want in the main feature branches.
> The obsession of git users with rewriting history has always puzzled me. I like that the feature exists, because it is very occasionally useful, but it's one of those things you should almost never use.
I disagree, and it's actually impossible not to use it. Rebase rewrites history. If you have a long-running feature branch you need to merge back into master, you have to rebase it against the current master. There's really no other choice.
> The whole point of history is to have a record of what happened.
Define "what happened" in this context...are we talking about what the feature's changes end up looking like, or the entire linear history of the work on this feature starting from the point at which the programmer experimented with a bunch of dead-ends before finding the right path?
Personally, I feel like an extremely detailed history of my personal problem-solving adventure on every complex ticket is irrelevant. At the end of the day, the code reviewer just wants to know what changed. When I review code, I prefer to look at a massive diff of everything that's been done, not read commit-by-commit. I'd rather see exactly what I'm going to pull in when I merge it into master.
I would also disagree with you here that the whole point of source control is to maintain a history of what happened, and argue that the point of source control is communicating changes between developers on a team. The fact that it backs up your code and keeps a history of what changed are merely secondary features to the central value of providing a way of communicating changes to a codebase between developers. I think Git is the best version control system for doing this, because it allows you to rewrite history. That said, rewriting history is very dangerous, and if you use it incorrectly (the cardinal rule being: never, ever rewrite history on a branch other people have to pull from), you're going to cause real pain for everyone pulling from it.
> If you're going around and changing it, then you no longer have a record of what happened, but a record of what you kind of wish had actually happened.
If you're using Git, this is a complete falsehood if you are the person who made the commits. The reflog provides a reference to every single change made to your repository, so you can just reset back to the point before you rebased and voila, like magic everything is back to the way it was. This isn't a "hack", that's what reflog is for. It's a giant undo list for your local clone of the repo.
So in essence, history is never destroyed. It's just hidden from view. You can always go back in Git unless you actually `rm -rf .git/`.
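To make that concrete, a minimal recovery sketch (the reflog entry shown is hypothetical; pick the one from just before the rebase):
git reflog                  # e.g. "HEAD@{5}: rebase (start): checkout master"
git reset --hard HEAD@{5}   # the branch is now exactly as it was before the rebase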
> Some programmers really want to come across as careful, conscientious, thoughtful programmers, but can't actually accomplish it, so instead they do the usual mess, try to clean it up, then go back and make it look like the code was always clean.
You might be correct in some cases, but I think for the majority of the time you are confusing explicitness with vanity. Programmers want other people on their team to know what they did, or at least the intention of their code, and having commit messages that "tell a story" and make sense is vital for doing that.
If you have a long-running feature branch you need to merge back into master, you have to rebase it against the current master. There's really no other choice.
Yes, there is another choice: merge master into your branch. Rebasing long-running branches is a nightmare, because every diff you replay will probably result in a conflict, and if you have hundreds of commits, you could be there for several days rebasing. Merges, even massive merges, generally don't take more than a few hours. All project size dependent of course, but the ratio of work is about right: 5-10x more work for a rebase over a merge.
Whilst this is occasionally useful, it's best avoided in my opinion as it's incredibly difficult to review a merge commit (especially a large merge commit).
(Most of the time I'd advocate, if you have to do larger project branches, either merging work in piecemeal asap e.g. hidden behind a flag or whatever, or keeping work in new files so that merge conflicts are kept to a minimum.)
I don't understand this bit about having no other choice but to rebase a branch against master. When I have a long-running branch that no longer cleanly merges into master, I merge master into the branch first. I've never seen a case where rebase is required.
I find your conclusion to be most confusing. You say that programmers want other people on their team to know what they did, and then you say that they accomplish this by constructing a fake story about stuff that never happened. Sounds like you mean that programmers want other people on their team to know what they wish they had done. Which is understandable, but not at all the same.
I'm starting to think that part of the problem we are facing at the moment is that feature branching itself can be exceptionally harmful.
I see statements like "The power of git is the ability to work in parallel without getting in each-others way" and get really worried about what people are trying to achieve. I want my team's code to be continuously integrated so that problems are identified early, not at some arbitrary point down the line when two features are finished but conflict with each other when both are merged. We seem to be reversing all the good work the continuous integration movement gave us; constant integration makes integration issues smaller and easier to fix.
I personally prefer to use toggling and techniques like branch by abstraction to enable a single mainline approach. Martin Fowler has a very good article on it here http://martinfowler.com/bliki/FeatureBranch.html
Agree, avoid large merges at all costs. The longer a feature branch exists, the more needless cost it incurs when it comes time to reintegrate it with a merge.
Absolutely. Though you can mitigate that by merging develop into the feature branch often. It makes the merge back pretty painless.
Of course, if you've got two long-living, contradictory feature branches, merging develop is not going to be enough. I guess you still need some communication in the team. But it's also important to keep your branches as short-lived as you can. If you've got a really big feature, try splitting it up.
If you're using CI testing results and tying them to particular commits, you end up with the same problem whether you merge or rebase.
If you test a commit and it passes, and then merge that commit into master, the merge may have changed the code that the commit modified, or something that the commit's code depended on. The green flag you had on the commit is no longer valid because the commit is in a new context now and may not pass.
If you rebase the commit onto master, you're explicitly changing the context of the commit. Yes, you get a different SHA and you're not linked to the original CI result anymore, but that CI result wasn't valid anymore anyway. This is exactly the same situation that the merge put you into, but without the false assurance of the green flag on the original test result.
As many others have noted, rebasing is only recommended on private branches to prepare them for sharing on a public branch. If you're running CI it's probably only on the public branches, so rebasing wouldn't affect that. But if you're running CI on your private branch too, then you're going to want to run it after rebasing onto the public branch and before merging into the public branch. That gives you assurance that your code works on the public branch before you share it. Again, if you're using a merge-based workflow you'd have to do the same testing regardless of your earlier test results.
My intuition of how CI should work is that it tests what the new master would look like were that commit [rebased|merged] into the current master, for that exact reason. Is that not how CI systems tend to work? What you've described above does seem awfully silly.
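As a rough sketch, a CI job can approximate that by testing a throwaway merge instead of the branch tip (branch names and test command are hypothetical):
git fetch origin
git checkout -B ci-merge-test origin/master     # disposable integration branch
git merge --no-ff --no-edit origin/feature/foo  # fails early here on conflicts
make test                                       # tests the would-be state of master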
This is exactly how I set up our team's workflow. Private commits are pushed into Gerrit and run against the CI suite.
1. If reviews + CI tests go well we fast forward merge onto master.
2. If the commit's parent isn't the latest commit on master, it is automatically rebased and the CI suite is kicked off again.
3. Upon successful fast forward merge into master, all in-flight reviews are automatically rebased on master's new head and CI's kicked off again.
4. Any open commit can become the top of master without worry it will break the build.
For our team of ~10 this works exceptionally well, with master not having been broken by our code in the last ~6 months.
There is nothing wrong with rebasing a feature branch imho. Feature branches should be considered ephemeral. But it probably depends on your team and project size.
My personal opinion is that it breaks history and CI tests for all the feature branches. But at GitLab we encountered customers that insisted on having a linear history after migrating from SVN. Therefore there is a function in the UI of GitLab EE to rebase a merge request when accepting the merge request. See http://doc.gitlab.com/ee/workflow/rebase_before_merge.html
The results from previous CI runs are no longer connected to the rebased commits, and most people don't have CI set up to retest each commit individually after a rebase.
I've never found this to be an actual problem in practice.
Indeed you need to run them again. But you also have to rerun them after a merge anyway. The problem is that you can no longer see which of the commits you merged were green before the merge. That is very useful, for example, if the merge itself breaks the tests (uncommon, but it can happen).
Instead of taking the branch <feature> and rebasing it, make a branch called <feature>.1 on top of <feature>, then rebase that, leaving <feature> in place.
That way all your references remain intact and you can compare CI results.
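In commands, the suggestion amounts to something like this (branch names hypothetical):
git branch feature.1 feature   # a second pointer to the same tip
git rebase master feature.1    # rebase only the copy; 'feature' stays put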
It's all trade-offs. The trade-off being that you maintain history by having copies of a branch around, while the dude trying to fix a bug doesn't have to break out the vodka bottle upon arriving back home. :)
It is better to have a messy but realistic history if you want to trace back what happened and who did and tested what at what time. I prefer my code to be clean and my history to be correct.
^^^ Couldn't agree more.
However, I don't know why people want to avoid rebasing feature branches. Rebasing feature branches means that you only have to resolve the conflict once and have a clean history for your release branch. Granted, it works well in my team where only a single developer owns a given feature branch.
If you have a feature branch with a number of changes in the same place, rebasing on to a branch that also changes in the same spot means you need to fix all of the related commits during the rebase. If it's a big feature that could end up being a huge task.
True. You do have to fix all the true conflicts. But you have to do it only once. If you merge master in, you have to resolve conflicts every time you merge from master. git rerere could potentially be useful here.
Even more importantly, rebase allows you to resolve conflicts at the individual commit level. During a conflict, you can see the exact commit that is causing it, making it much easier to resolve.
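For reference, rerere is off by default; a one-time config makes git record each conflict resolution and replay it the next time the same conflict appears:
git config rerere.enabled true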
Sytse, do you mind if I ask a slightly OT question?
How does GitLab store the code-review data? Is it stored in the (or a) git repo? Is the feature compatible with rebasing feature branches before merge?
Also, pricing: I only just noticed that your pricing was per YEAR, not per MONTH. Most bootstrap-pricing-page software is priced monthly and the user/year text is lowlighted. This has to be costing you sales.
When you accept a merge request the title, description and a link to the merge request are stored as the commit message. For example see https://gitlab.com/gitlab-org/gitlab-ce/commit/6c0db42951d65... This allows you to see any other things that were discussed. Hopefully any line comments were resolved with a commit (thus documenting them) or were based on a misunderstanding.
Looking at the recent history I can see how you'd come to like it. You seem to mostly be doing merges or documentation changes, which probably means you don't have to do a lot of history spelunking to fix bugs caused months ago.
Are you sure your developers feel the same as you do? Are you sure they're willing to be open enough to you about their misgivings?
I think the majority of people on our team don't like rebasing because it makes spelunking harder in some cases. But there are certainly people that preferred having some commits rebased so it was easier to revert them (reverting a merge is possible but harder). Although I'm not an active developer myself I think my dislike of rebasing everything is shared.
Another social issue. In the projects I work with, it is commonly agreed upon and known that of course after a rebase every commit needs to be retested and reverified with CI. It is a bit of extra work, but as I mentioned in another comment: we do that extra bit of work now to avoid having to analyze the cross-talk of a bunch of branch merges at a later point, mainly because that bit of later work often turns out to be considerably bigger. (Technical debt is a talking point here.)
Of course, this might be unviable when under unusual time pressures.
At our offices we use Pull Requests and Rebase as a combined workflow to get mostly linear history. Before issuing a PR we get the latest master and rebase our branch onto it, so that all commits in our branch end up with approximately the same timestamp, and then issue the PR. This creates a nicely linear history for the most part.
The only evil is a willingness to force push the updated history over our branches before the PR goes up. But no one shares branches usually. Or the collaborators on a branch are few and they agree when to rewrite history.
> In reality the history was never linear to begin with. It is better to have a messy but realistic history if you want to trace back what happened and who did and tested what at what time. I prefer my code to be clean and my history to be correct.
Basically my beef is the idea that never rebasing is "true" history, and rebasing gives you "corrupt" or "whitewashed" history. In fact, the only thing you have weeks, months, years after pushing code is a logical history. It's not as if git automatically records every keystroke or every thought in someone's head—that would be an overwhelming amount of information and difficult to utilize anyway—instead it's all based on human decisions of what to commit and when. Rebasing doesn't "destroy" history, it's just a decision of where a commit goes that is distinct from the time it was first written, but in fact you lose almost no information—the original date is still there, and from that you can infer more or less where it originally branched from.
"But," you say, "surely complete retention of history is preferable to almost-complete retention?". Well, sure, all else being equal I would agree with that. But here's the crux of the issue: merging also loses information. What happens when you merge two conflicting commits is that bugs are swallowed up in merge commits rather than being pinned down to one changeset. This is true whether it is a literal merge conflict that git detects, or a silent logic error that no one discovers until the program blows up. With two branches merging that are logically incompatible, whose responsibility is it to fix their branch? Well, whoever merges last of course, and where does that fix happen under a never-rebase policy? In that single monstrous merge commit that can not be reasoned about or bisected.
But if you always rebase before merging to master, then the second integrator has to go back and correct each commit in the context of the new branch. In essence, they have to fix their work for the current state of the world, as if they had written it today. In this way each tweak is made where it is visible and bisectable instead of squashed into an intractable merge commit.
I get that there is some inconvenience around rebasing coordination and tooling integration (although GitHub PRs handle it pretty well), but the idea that the unadulterated original history has significant value is a straw man. If the branch as written was incompatible at the point it got merged, there is no value in retaining the history of that branch in an incompatible state because you won't be able to use it anyway. In extreme cases you might decide the entire branch is useless and just pitch it away entirely, and certainly no one is arguing to save history that doesn't make it onto master right?
> ...the history of a project managed using GitFlow for some time invariably starts to resemble a giant ball of spaghetti. Try to find out how the project progressed from something like this...
It's simple. Read backwards down the `develop` branch and read off the `feature/whatever` branches. Just because the graph isn't "pretty" doesn't mean it's useless.
In general, I'm starting to dislike "XXX considered harmful" articles. It seems to me like you can spout any opinion under a title of that format and generate lingering doubt, even if the article itself doesn't hold water. Not to generalize, of course--not all "XXX considered harmful" articles are harmful. They generally make at least some good points. I just think the title format feels kind of clickbaity at this point.
That said, kudos to the author for suggesting an alternative rather than just enumerating the shortcomings of GitFlow.
Articles like this are harmful. There are many of us successfully using gitflow without _any_ of the issues raised in the article.
And yet people who want to argue against the use of it simply because they don't want to learn something new now have a useful link to throw around as "proof" that a very successful strategy is "harmful". I guarantee in the next year I will have to go point by point and refute this damn article to some stubborn team lead or another senior dev.
Nobody should ever write an article outright bashing a strategy that they either don't fully understand or personally have not managed to integrate successfully in their own day-to-day. Bare minimum, if you're going to publish an article critical of a tool, don't name it so aggressively as to sound like it's fact rather than a single personal point of view.
Also this is a problem with Git not GitFlow. In Mercurial, for example, every commit is attached to a branch so it's very easy to follow where the changes happened.
HgFlow already exists and is actively developed: https://bitbucket.org/yujiewu/hgflow/wiki/Home. It even goes above and beyond GitFlow and adds generalized streams and sub-streams. I've never had to use those extra features, but the core model works well.
Mercurial branches are not meant to be used for short-lived feature branches, though. As "hg branch" puts it: "branches are permanent and global, did you want a bookmark?".
I'm not familiar with the GUI git tools, preferring instead to work on the command line, but does whatever tool the author used to generate the graphs on the page support command-line flags?
Specifically, git log --first-parent (--oneline --decorate) would look much better with the documented strategy. Instead of seeing all the commits in the branch, all that's shown is the merge commit. If you used the article's branch names, all you'd see is:
* Merge branch 'feature/SPA-138' into develop
* Merge branch 'feature/SPA-156' into develop
* Merge branch 'feature/SPA-136' into develop
If you actually used descriptive branch names, that would seem to be quite useful - you immediately see the features being added without seeing all the gritty details!
The approach discussed in the article seems to take into account only one possibility: you deploy master in prod, and it's always considered correct.
That works for small projects, but in my experience, when you have a bunch of people (let's say 20) pushing code to a repo, you need several levels of "correctness":
- branches: Work in progress.
- develop: Code ready to share with others. It can break the build (merge conflicts, etc) and it won't be the end of the world.
- master: This shouldn't be broken. It needs to point to a commit that has already been proven not to break the build and to pass all the required tests.
As always, you need to find a balance with these things and adapt to the peculiarities of your code base and team. I really see them as suggestions...
Yeah, we also use something like this for building a website/webapp (for a client) with 5-10 people.
- Feature branch: do whatever you want
- Develop: should be good enough for the client (product owner) to look at
- Release branch: should be good enough to be tested by the test/QA team
- Master: should be good enough for website visitors
Branches are meant to be short-lived and merged (and code reviewed) into develop as soon as possible. We use feature toggles to turn off functionality that ends up in develop but cannot go to production.
The problem with having too many eternal branches is that they quickly become unmergeable. The nice thing about feature branches is that it's the author's responsibility to make them mergeable. But if you have a bunch of eternal branches, none of which is "owned" by one person, then when it comes time to merge them and there are dozens of merge conflicts, there's no one person who can sit down and know the correct fix for all of them.
I think the idea is that the branches cascade. You would never create new commits directly into release or master, the flow would only ever be develop > release > master, thus making merge commits and conflicts impossible.
That's exactly what hotfixes are for. A completed hotfix will merge directly into develop and master simultaneously (practically speaking). This allows you to keep your unrelated develop commits out of the master until you're ready to merge it all.
I'm a bit confused, why are you merging things into develop if they're not ready?
If Feature B isn't ready, it should stay in its own branch until it is.
Develop is for code that the developers say is ready. You might have bugs, poor merge resolution, etc, but any fixes made should be quick and should pave the path toward code that can be merged into master. If the problems are major, revert develop.
Master, on the other hand, should always be stable and rock solid. You can then have production servers that always pull master automatically, and staging servers that always pull develop automatically.
I've worked on multiple teams this way, and it works out quite well.
In the above example, say Feature A gets the go-ahead from the business user/product owner/whatever, and Feature B doesn't. From the dev standpoint both are complete but there may be some business reason to hold up Feature B. The owner(s) of Feature A are likely not going to understand why anything to do with Feature B has to hold up their release.
In your scenario, Feature B should not have been promoted into the development branch until everyone agrees it's going to be shipped. If it got into development without everyone's approval, then that is a business process problem, not a CVS tool problem.
At my last job, we had a 'gorilla' that had to approve every commit, and it was his job to coordinate between the project managers and the developers as to what was allowed to be promoted, to prevent exactly the scenario you described. It also had the benefit of making developers describe each commit clearly so that the gorilla could understand the change, which made looking at the commit history some time later (months/years) quite a bit easier too.
In that scenario is everything in develop simply waiting to be pushed to master, with no additional testing/approval? At that point why not push directly from feature branches into master?
Because features are not always isolated, sometimes they change behaviour based on changes in other branches. Testing features in isolation is different from testing an entire release.
We follow a similar process where I work and it works really well.
Something being merged into development means that it is 100% complete and ready to be deployed to production at any time. This means all QA and acceptance testing had to happen before merging.
What you're describing doesn't make very much sense. There are only two cases here:
1) A feature that's in develop may sometimes need to wait for something (QA, business validation, a go-ahead, whatever) to go to master, or
2) A feature that's in develop can ALWAYS go to master at any time.
In the first case, you can have feature X blocking all other features because it's in the same branch as them, which is the problem the GP is describing. In the second case, the two branches are the same, so why not just merge into master directly?
It's always #2, but just because a feature has been approved for master, doesn't mean all the features approved for master are stable together.
A feature might be stable as a standalone, but in virtually every software suite, features interact with each other. That is what develop is for. It's where all the feature branches meet and where any instability created by everything coming together is resolved.
Master, on the other hand, is the product of that successful resolution after all the bug fixes.
I've found that that generally leads to problems, so I prefer to have the feature that is going to be merged last be responsible for fitting in with the rest. That way, as described earlier, I don't have to wait for features that can't be merged now to merge the ones that can.
How do you know which feature will be merged last? In the teams I've worked in previously, everything happened in parallel. You had no way of knowing what feature was going to be merged when. So, essentially, we completed a feature-branch, tested it, and merged it into develop. Then, at set intervals, we did some heavy testing on develop and fixed any outstanding bugs, before creating a release by merging to master.
Does having a lower level of quality on develop hinder your ability to actually release code?
We have one particular repo at work that is just a pain in the ass to work with (Typescript analytics code that has to be backwards compatible to about forever), and we've pretty much abandoned the develop branch since releases got held up due to bugs in less important features that had been merged in without comprehensive testing. Pretty much everything now gets tested extensively on a feature branch and then gets merged directly into a release branch.
We might have swung a little too far in the other direction, I'm thinking we want to at least merge well tested bits back onto develop, but at least we can release the features that are actually done and cut those that are still having issues without having to do any major git surgery.
This is how my team works. I always thought GitFlow was way too complex for our use case. The way you describe seems to work for us, mostly. There are still the occasional mistakes and commits to the wrong branch, often accidentally bypassing develop, which I guess supports the article's premise.
I don't agree with master being the production code. It's the default branch when you clone. I like to keep the production branch on a more explicit branch so my team knows they're dealing with release code.
I do something very similar; use master for dev builds that can be shared (since it's the default branch) and something like 'deploy' or 'deployprod' (which on second thought is a bit redundant) for release. Then, using a CI tool, you can have master go to a staging environment and deployprod go to production.
We use a master / development branch strategy. For us, the main problem with having one branch and tagging releases is that it's relatively common for work to continue on the development branch while the release branch is still being tested. For us, there's a 2 day period where we're testing out the upcoming release. Developers will be fixing bugs on this release branch, and also committing new code for the subsequent release. If we had one branch with tags, developers would need to keep all their new code on feature branches until a release is completed, and the likelihood of accidentally releasing an untested feature gets a lot higher. (One advantage of having two branches is that you can treat your production branch with a lot more care.)
I have never understood why people hate merge commits so much. Their advantages are not insignificant: you know when a feature got merged in master, its much easier to revert a feature if you have a merge commit for it, much easier to generate a change log with merge commits, and you have none of the problems that pushing "cleaned up" histories will have: https://www.mail-archive.com/dri-devel@lists.sourceforge.net...
The main disadvantage, as the article rightly points out, is that it makes it much harder to read the history. But that's easily solved with a simple switch: --no-merges. It works with git-log, gitk, tig, and probably others too. Use --no-merges, and get a nice looking linear history without those pesky merge commits.
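For example, something like:
git log --oneline --no-merges   # linear-looking history, merge commits hidden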
The problem, at least for me, is not the merge commit. They are indeed easily ignored. The problem is, that I don't want to see a dozen commits fixing typos or trivial bugs.
One trick that can really help with this is `git commit --amend`, which allows you to amend the last commit. If you encounter a bug or a typo in your last commit, add your fix to the index and then run `git commit --amend`. This will replace your last commit with a new one that contains your latest fix. Of course, this should only be done if you have not pushed your last commit to the remote.
For fixes to earlier commits, I don't bother much and just live with the trivial commit. Though if I end up making several trivial commits in one sitting, I do a cleanup and merge those fixes into one commit before pushing.
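For the record, the amend step spelled out (the file path is hypothetical):
git add src/typo.c               # stage the fix
git commit --amend --no-edit     # fold it into the previous commit, keeping its message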
slightly OT, but I wonder if anyone has an answer for this.
We use feature branches and rebase before merging to master (mostly for the reason stated above - keep things clean and treating the branch as a single logical unit, not caring about higher-resolution history within the branch).
However, some times, especially since we typically merge on Github from a PR, it's easy to forget to rebase, or notice that there are more commits. So our history is mostly clean, but occasionally contain some "messy" commits.
I know we can reset master, but it feels a bit too risky compared to living with a bit of noise when mistakes happen.
Anyone knows of some way to prevent / alert / notify or otherwise help us avoid this rather common mistake before it happens?
I recommend regularly updating your local master branch via fast-forward only. That way, if you are working with a slightly different git history, git will complain loudly (refuse to update).
This also has the benefit of complaining if someone else has force pushed (changed history / removed commits) to upstream/master.
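Concretely, that might look like this (git will refuse the pull unless it is a fast-forward):
git checkout master
git pull --ff-only origin master   # errors out if the histories have diverged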
I guess it's unclear what you meant by "mess". Some companies think training their devs is worthwhile, especially after losing days/work/$ through git messes. :)
The commits aren't bad as such, just noisy, typically. Github gives you a nice diff of the whole branch against master, so we normally don't pay attention to individual commits. We look at the change as a whole before merging it.
There is an indicator for the number of commits and you can view each one individually, but somehow it's so easy to forget.
Yes, that's one possible solution, but feels like we're punishing ourselves for every merge for the sake of avoiding a reasonably-rare issue. We're reviewing PRs on github, so it's much more convenient to merge them on the spot.
The biggest disadvantage I see isn't in the branching model per se - it's that git itself does not record branch history. By "does not record branch history" I mean that branches are really just pointers to a specific commit in the commit history. However, git doesn't record where that branch pointed to IN THE PAST. So, when looking back in time at a merge commit (say between a feature branch and development), you can't immediately tell which of the two paths BEFORE that merge commit was originally the feature branch, and which was the branch it was merged into. A great example of this is the network view in github, which can turn into a confusing mess because how github decides to color the branch paths is NOT necessarily how those branch paths existed in the past.
If anyone has a solution to this problem please share!
The only solution I can think of is to write a prepare-commit-msg hook that adds a line like "On branch: <branch-name>" to the commit message. So when reading your commit history, every commit that was made on this branch would contain this message. You can also look up just commits made on a particular branch by doing `git log --grep="On branch: <branch-name>"` this way.
At the very least, it may be useful as a starting point.
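A minimal sketch of such a hook, assuming a POSIX shell (git passes the path to the commit message file as the first argument):
#!/bin/sh
# .git/hooks/prepare-commit-msg -- append the current branch to the message
branch=$(git rev-parse --abbrev-ref HEAD)
printf '\nOn branch: %s\n' "$branch" >> "$1"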
While I use it extensively on large projects, I find that the merge commit can do just as well. Of course, that doesn't help you outside context of the merge commit---bisecting, for example---unless you are okay with discovering the merge commit that introduced it into the mainline. That can still be scripted.
The convention is to make sure that the stable branch is the left parent and the feature branch is the right, and that the merge commit log messages are decent (Git web UIs do this fairly decently by default). Now you can get a log of left-parent merge commits to see the log of features that shipped (or releases that shipped, if you have multiple tiers of stability).
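Under that convention, getting the log of shipped features is roughly:
git log --first-parent --merges --oneline master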
This is exactly what I like about git and don't like about Mercurial: keeping book about which was what does not matter IMO, and complicates (automatic) reasoning (e.g. bisect) about the development process. What does matter is that concurrent development took place, and that's exactly what git's commit DAG represents.
> its much easier to revert a feature if you have a merge commit for it
I like this theory, and generally like merge commits because of it -- but in practice, I've found it still _really really hard_ to revert a feature even if I have a merge commit. Simply reverting the SHA of the merge commit does not, I think, do it; git complains/warns about that. I have to admit I still haven't figured out how to do it reliably even with a merge commit!
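For what it's worth, the missing piece is usually the mainline-parent flag; git needs to be told which parent to keep (the sha below is a placeholder):
git revert -m 1 <merge-commit-sha>   # -m 1 keeps the first (mainline) parent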
You don't need to implement ALL of gitflow - I see it as scalable.
Master should always be latest production code, development branch contains all code pre-release. That's the core.
The other branches let you scale gitflow - if you need to track upcoming release bugfixes etc, you can use a release branch. A team of maybe 6 or 7 would likely start to need a release branch. Feature branches at this point are best left local on the developers repository. They rebase to fixup commits, and then merge those into develop when they're ready.
If you get into bigger teams - like maybe 6 agile teams working on different larger features, then you can introduce feature branches for the teams to use on sprints to keep the work separate.
The issue with gitflow is the lack of continuous integration, so I personally like to get teams to work only on a develop branch during sprints and use feature toggles to commit work to the develop branch without breaking anything.
As I see it, gitflow and CI are at odds and that's my biggest gripe with integrating lots of feature branching for teams - everyone has to integrate at the end of the day.
So I believe the model can and should be scaled back as far as possible, using only develop and master and release as primary workflow branches, introducing the others when the need arises - doing it just because it says so in the doc isn't the right approach.
I am. I resent such articles unless they come from a very clear eminence who has public and verifiable evidence to support his case.
This seems more of a: "This tool is popular but it doesn't work for me so it's bad".
In fact, as you say, he dislikes the tool (from the get go):
> I remember reading the original GitFlow article back when it first came out. I was deeply unimpressed - I thought it was a weird, over-engineered solution to a non-existent problem. I couldn't see a single benefit of using such a heavy approach. I quickly dismissed the article and continued to use Git the way I always did (I'll describe that way later in the article). Now, after having some hands-on experience with GitFlow, and based on my observations of others using (or, should I say more precisely, trying to use) it, that initial, intuitive dislike has grown into a well-founded, experienced distaste.
Throwing in my two cents: there's no perfect methodology, and teams that communicate and adhere to a set of standards will probably find a good way to work productively with git. They can always be helped by scripts like the gitflow plugin or some other helper if they think the possibility of human error is big.
I also have anecdotal experience of working both with and without it, and I'm fine with either, although I do appreciate git flow in any project that starts getting releases, supports bug fixes and hotfixes, and has been living for a while, so it incorporates orthogonal features at the same time.
At this point, I just assume that "X considered harmful" contains an element of satire. That's not always the case, but I have no problem saying "Satirical 'Considered Harmful' Articles Considered Not Harmful".
He is suggesting to use 90% of what GitFlow suggests (feature/hotfix/release branches) but doesn't like the suggestion of non-fast-forward merge and master/Dev and that makes GitFlow harmful? I don't think I agree.
I think having the Dev branch is useful. Consider this actual scenario at my current workplace.
1. We have 4 developers. Nature of the project is such that we can all work independently on different features/changes.
2. We have Production/QA/Dev environment.
3. When we are working on our individual features, we do the work in individual branches and merge into the Dev branch (which is continuously deployed). This lets us know of potential code conflicts between developers in advance.
4. When a particular feature is 'developer tested', he/she merges it into a rolling release branch (Release-1.1, Release-1.2 etc) and this is continuously deployed to QA environment. Business user does their testing in QA environment and provides sign off.
5. We deploy the artifacts of the signed-off release branch to Production and then merge it into master and tag it.
Without the development branch, the only place to find out code conflicts will be in the release branch. I and others on my team personally prefer the early feedback we can get thanks to the development branch.
Advantages of an explicit merge commit:
1. Creating the merge commit makes it trivial to revert your merge. [Yes, I know it is possible to revert without a merge commit, but it's not exactly a one-step process.]
2. Being able to visually see that set of commits belongs to a feature branch. This is more important to me (and my team) than a 'linear history' that the author loves.
We have diverted from GitFlow in only one way, we create all feature/release/bugfix branches from 'master' and not 'develop'.
Now, don't get me wrong, GitFlow is not simple but it's not as complicated as author seems to suggest. I think the author was better served with article title like 'What I don't like in GitFlow'.
A reason that we are switching from full git flow to a reduced model (basically one master branch + feature branches, occasional hotfix branches) is that git flow isn't compatible with continuous integration and continuous delivery.
The idea of CI is that you integrate all commits, so you must integrate the develop branch - build the software, run the tests, deploy it to a production-like environment, test it there too.
So naturally, most testing happens in that environment; and then you make a production release starting from the master branch, and then install that -- and it's not the same binary that you tested before.
Sure, you could have two test/staging environments, but I don't think I could get anybody to test their features in both environments. That's just not practical.
Is this more a limitation of your CI/deployment tools? What we do (not saying it's the true way) is have TeamCity automatically build both master (production code + hotfixes) and development branches. Our deployment tool (Octopus) can just grab a particular release (e.g. 1.1.500=master 1.1.501=development) and send it to the server for testing. Hotfixes would be committed to master and tested with a build from there.
I guess this does open up the possibility that merging master (with hotfixes) back into development could cause a regression, but we certainly try to keep hotfixes minimal and simple.
Now database changes...that's the real pain point. Both master and development need their own DB to apply changes scripts to. Otherwise, deltas from development make testing master an issue.
At my last company, that was 60 live feature/bug branches, each having to be built by CI before being mergeable. The feedback loop was huge, and at the end of the day, the quality of the product did not improve over SVN. develop was a shit show and twice a month, master would be as well.
Ultimately they decided to move to team branches, where each team branch was free to operate how it wanted, so long as the team branch itself built successfully before merging into master. I think most teams adopted the more natural-feeling GitHub Flow.
Personally, for me it's not even the god-awful history that makes me despise gitflow, but its reliance on additional tools to effectively manage the process. This should be a huge red flag to anyone seeking to change a process, and it's complained about a lot. Coworkers not knowing what git-flow is doing under the covers is dangerous. I consider myself pretty versatile with git at this point, but I have no idea what the tool does under the covers. I'm sure I could find out; however, when you're handed a piece of software, generally you learn the contract/API it provides, but most of us aren't going to delve into the implementation details.
Sure, our CI tool can run the unit tests from all the branches, but we don't have an environment for every branch where we can roll out the software, and do manual tests and automated acceptance/integration/smoke tests.
So if you have just one environment for testing, you can decide to deploy the develop branch to it, in which case you deploy untested builds (from the master branch) to production.
Or you can decide to always deploy the master branch to the testing environment, in which case you have to do a release each time you want to show somebody your progress (and you can't easily show it in dev); that's just annoying extra work, and goes against the idea of continuous integration.
Open source projects should stick with merging over cherry-picking and rebasing, especially if you want others to contribute. Unless you feel fine doing all of the rebasing and cherry-picking for them. Otherwise, good luck gathering a large enough pool of people to contribute. Simplicity always wins here.
2. GitFlow vs X
Once again do what is good for your company and the people around you. If you have a lot of developers having multiple branches is actually /beneficial/ as Master is considered ALWAYS working. Develop branch contains things that are ready to ship, and only things that are READY TO SHIP. So if your feature isn't ready yet, it can't go to develop, and it won't hit master. Your features are done in other branches.
3. Rewriting history
Never do this. Seriously, it will come to bite you in the ass.
> I thought it was a weird, over-engineered solution to a non-existent problem.
To be fair, it's a cookie-cutter approach that resonates with people unfamiliar with git but not ready/willing to invest the time to understand it deeply. That is understandable; a lot of people come from other systems and just need to get going right away, and git's poor reputation for command-line consistency etc. is well-earned.
His point is that this cookie-cutter approach is more complex than the development model he presents (which is in fact pretty common among open source software). You don't need to understand Git deeply to realize that master is stable and merges in finished and cleaned up features.
I should have been more clear - I agree that it's more complicated. My point was that regardless, it seems to resonate. I believe it's because the complications look useful at a glance, and layering a rigid model on top of git frees you from having to consider its full scope of possible operation.
(I believe that git flow is definitely better than "everyone does things their way", and that's one competing "rule-book" for a team new to git.)
I'm pretty confident that understanding the tool better will help you to judge how to use it more effectively. The best way to understand git is to understand its data-model.
The only thing GitFlow had going for it is that it has a clearly written article about it with pictures that explain how it works. That's it; the freedom of git and being able to define what works for you is too much for people and they think they need to turn development back into Subversion-style or desktop-release style.
I agree with you, with one addition in GitFlow's favor--it standardizes how your team works. When you have multiple team members collaborating on a project, a poor standard is better than none at all.
GitFlow is also, in my opinion, a bad flow, as it does end up with merge commit spaghetti over time.
Merge commits are great. They are there to group a list of commits into a logical set. This logical set could represent one "feature", but not necessarily. It is up to you to decide whether commits A B C D should or shouldn't be grouped by a merge commit. Merge commits also make regression searches (i.e. git bisect) a lot faster. And to top it off, they will make your history extremely readable, but that is provided you merge correctly... and that is where git rebase and git merge --no-ff come into play.
At my company, every developer must rebase their topical branch on top of the master branch before merging. Once the topical branch is rebased, the merge is done with a --no-ff. With this extremely easy flow, you end up with a linear history, made of a master branch going straight up and only everyonce in a while a merge commit.
Following the simple rule "commit, commit, commit..., rebase, merge --no-ff" has avoided the merge spaghetti a lot of people complain about. Although, I have to admit our repository is small (6583 commits to date).
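Spelled out in commands, the flow is roughly this (the branch name is hypothetical):
git checkout topic
git rebase master          # replay the topic branch on top of the current master
git checkout master
git merge --no-ff topic    # the forced merge commit groups the rebased set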
This works even when multiple devs work on the same branch: they must get in touch on a regular basis, rebase the branch they are working on, and force push it. Rewriting the history of topical branches is only bad if it is not agreed on. As long as it is done in a controlled manner, nothing's wrong with it.
Another rule we follow is to always "git pull --rebase" (or set git config branch.autosetuprebase always).
Our approach might not, however, scale for larger teams or open source projects.
I have had this ideological debate about "fast forwarding" more times than I can count. I agree with the author: "no-ff" is silly. I've been working on professional software teams for over a decade. When I encountered fast forwarding/rebasing it was absolutely a breath of fresh air. I've been using git now for 5 years and I have not encountered a single instance where using either of these tools has presented any sort of problem. I can't remember a week where having a concise, readable history hasn't proven its value. I also can't remember a single time I've said "Man, I'm glad I had all these merge commits around, they really saved my proverbial bacon."
From what I can tell, no-ff exists to satisfy the aesthetic preference of your local team pedant. It gives them something to do between harping on whether your behavior is in the correct "domain", deciding if list comprehensions are truly "pythonic", and spending that extra month perfecting the event sourcing engine to revolutionize the "contact us" section of your site.
Well, can you provide the killer use case that none of us can live without?
I mean, it's certainly possible that in some tiny fraction of cases I might say "man, I could fix this a lot easier if I had the merge commit"; it's just that in the 10,000s of examples that form my experience, I haven't stepped on that particular landmine yet.
Even with that said, my development philosophy compels me to choose "Simplicity over Completeness" and is utilitarian to the core. I will choose whatever is most effective in the vast majority of cases.
Some folks look at "Source Control History" as some pristine, historical record of how things went down. Since I am not an accountant or auditor this has little value to me. It encumbers the day to day to optimize for a case that is almost certain to never happen. A first-order approximation of the history that optimizes for the day-to-day needs of an organization is far more suitable in almost every case.
I use the term "local team pedant"; it's not a bad thing. Some folks just have a need for things to be "complete" and feel compelled to pursue that for irrational (usually expensive) reasons. In my own experience, the person who is the "no rebase / never fast forward" cheerleader can never give a solid objective answer as to what the benefit is. It's usually something like what this no-ff-er suggests (http://walkingthestack.blogspot.com/2012/05/why-you-should-u...). Things like "I can see what's on a branch, etc." That in itself is not a justification. It's just words. If you could somehow demonstrate how this reduces development costs or offers a better way to organize work, and is simultaneously better than the more idiomatic alternatives, then I'm all for it.
The killer use case for me is getting to figure out how and when something happened.
Also merging two branches that tend to touch upon same modules but are not kept in sync all the time (due to whatever reasons) is a lot simpler when you use --no-ff.
You say that these properties are bullshit, but I found them invaluable when fixing bugs and architectural defects, where it was important to find out when and how a bug or a behaviour occurred. And funny that you mention accountants and auditors, because being able to do forensics more easily on the codebases that I worked on has saved me and my clients many hours and gray hairs. And I have found myself in situations where…
I can live without --no-ff, I can live without git even. There is plenty of people out there not using any kind of source control and they are living just fine.
Your last paragraph is a nice example of psychological projection. If you weren't so narrow minded you could have used your energy to learn something.
Myself, I use --no-ff because it fits the kind of work I do very well. I haven't lost a minute of sleep or time over its deficiencies. And I am pretty certain that I spend a whole lot less time fiddling with git than people who advocate "aggressive rebasing".
I do admit you have some merit in pointing out how bad a git history looks when littered with merge commits for single commit branches. But then, this is pretty easy to fix. Just use a fucking vanilla merge, or indeed a rebase when it fits the problem. Another way to work around the bad aspects of --no-ff approach is by using a better git history explorer.
I never said it was bullshit. I'm just looking for your "falsifiability" criteria. My point is that "no-ff" is cluttery, and no one can tell me in objective terms (e.g. newtons, projects/year), backed by clear evidence, that there is any benefit. All I see is downside. Your points are pretty shaky, so let's break them down, shall we?
> Your last paragraph is a nice example of psychological projection. If you weren't so narrow minded you could have used your energy to learn something.
Well, that's clearly ad hominem nonsense. So nothing to see there.
>Myself, I use --no-ff because it fits the kind of work I do very well. I haven't lost a minute of sleep or time over its deficiencies. And I am pretty certain that I spend a whole lot less time fiddling with git than people who advocate "aggressive rebasing".
No point to any of that either.
>Also merging two branches that tend to touch upon same modules but are not kept in sync all the time (due to whatever reasons) is a lot simpler when you use --no-ff.
This might have some merit. But I don't really understand what you mean.
>You say that these properties are bullshit, but I found them invaluable when fixing bugs and architectural defects, where it was important to find out when and how a bug or a behaviour occurred. And funny that you mention accountants and auditors, because being able to do forensics more easily on the codebases that I worked on has saved me and my clients many hours and gray hairs. And I have found myself in situations where…
Here you actually make some points, but they are all anecdotal. How many hours has it saved you? How could you even know this unless you split time into two parallel experiments and measured them independently? It's not that there is anything wrong with relaying an anecdote. But when you introduce this handwaving nonsense as justification for something, as though it has the same strength as an objective measurement, that I call bullshit.
>I do admit you have some merit in pointing out how bad a git history looks when littered with merge commits for single commit branches. But then, this is pretty easy to fix. Just use a fucking vanilla merge, or indeed a rebase when it fits the problem. Another way to work around the bad aspects of --no-ff approach is by using a better git history explorer.
After all those words, here you start to make some sense. Of course I would use `no-ff` if I saw some value in a "merge commit", which is the mirror of your point. But my point is just that: I've never encountered that case or had it concisely explained to me.
no-ff is useful when you need to revert the whole merge - the merge commit delimits the boundaries of what used to be a separate branch. If you fast-forward, then you have to use your human brain to figure out how many commits to revert and when you're reacting to "augh I just broke master by adding these 12 commits" you might make a mistake.
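For instance (commit ids are placeholders): with a merge commit you can back out the whole branch in one command, while after a fast-forward you have to work out the range yourself.

    # merge commit present: revert the entire branch in one go
    git revert -m 1 <merge-commit>

    # fast-forwarded: you must identify the first and last bad commits by hand
    git revert <first-bad>^..<last-bad>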
Yeah, that's why you squash your 12 commits into a single commit that represents the deliverable.
This is my point: I find the 12 commits to be unnecessary. I've never been burned by squashing. The only arguments I've heard against it are ideological (you're destroying history, etc.).
Oh! Well, I like reading many little diffs more than I like reading a few big diffs, but the distinction is purely aesthetic. A team should try to agree on an aesthetic, but otherwise, ::shrug::.
It's not exactly aesthetic. Little diffs and big diffs are not isomorphic; they could conflict with each other. The only true diff is either the diff of a squashed commit or a diff over the whole range. And if you are diffing over a range all the time, you might as well be squashing and rebasing so you get the benefits of linear history.
This is my primary complaint regarding rebasing and no-ff: it leaves all these useless and confusing commits around. A lot of the time the content of a commit was undone by a later commit on the same branch. It's usually not useful to anyone and can only be reasoned about by the original author. When you merge, all that crap lands in `master`.
Interestingly, I think I am actually learning something in this thread. The hardcore `no-ff` crowd actually misunderstands rebase: they find the billion crap commits disruptive as well, and want the merge commit so they can get rid of that mess. But I just say the mess is unnecessary. Just squash and you get the best of all worlds. You'll never miss those 12 commits.
The time it takes to carefully rebase a branch onto another, and to compress commits for a feature into one, is still much longer than the time it takes for my eyes to pass over so-called "empty" merge commits.
If I want to look at when a feature entered a branch, I can look at its merge commit. And the feature branches are there to show how a feature was built; bugs could be the result of a design decision that happened in one of the midway commits.
I looked at OP's example pic in the blog, and I read all of his words, but I wasn't sold. His picture looks like a normal git history to me. It requires almost no effort to find what I'm looking for.
And that's not even touching his rage against the idea of a canonical release branch (master). But that's for another day.
It's not confusing; maybe it's just not suited to your particular case, like any other tool. There's no magic tool/process/etc. that does it all for everybody.
GitFlow has been working great for us: a team of 15 developers, working with feature branches, with CircleCI configured to automatically deploy the "develop" branch to our QA environment and the "master" branch to production.
The "hotfix" and "release" are proven to be useful to us too; we just need to have effective communication with our team, so everybody rebase their feature branch after a merge in our main branches.
I have come to actually like the two permanent branches approach. I know that for any repository following this model:
* "master" is the current stable release
* "develop" is the current "mostly stable" development version
The first time you clone a repository this is an extremely helpful convention to quickly get your head around the state of things.
If you're doing it right (and don't use --no-ff, which I agree is unreasonable), I can't think of a scenario where this causes extra merge commits. Merges to master should always be fast-forward merges.
We follow this model. I like thinking of "develop" as an integration branch, and "master" as an always-deployable gold master. And yep, develop -> master merges are always fast-forward merges.
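In practice (assuming the usual develop/master names) that's just:

    git checkout master
    git merge --ff-only develop   # refuses to create a merge commit; fails if master has diverged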
Advising rebasing over explicit merges is dangerous and foolish. Rebasing does have its place, but you really need to know what you're doing.
Also, I don't see his point about that messy history. I can see exactly what happened in that history (though the branch names could have been more informative). With multiple people working on the same project, feature branches will save your sanity when you need to do a release, and one feature turns out to not be ready.
I like subsetting gitworkflows(7) because you can incrementally add process when the tangible benefits (like increased reliability and experimental access for eager users) outweigh their process cost (which depends on team experience). I wrote about these issues here:
To me (mostly a Mercurial user), never having read the original GitFlow, this is kind of a "no duh" article.
I think that is because I am used to using hg's branches, bookmarks, and tags for different use cases.
If I want to mark a revision as a particular release number (which is something we don't really do here, but I can see the value), then I would use hg tag. Tags are permanent.
If I want to mark a revision as "production" and then have some automated process take over based on the the updated info, I would use hg bookmark. Bookmarks are the closest equivalent to git's branches. Bookmarks can be updated to a new revision or removed.
If I wanted to work on a parallel line of development for an experimental feature, or if I am attempting to upgrade some dependencies, I would use hg branch. This creates a named branch in the code base which is permanent; it can eventually be either closed or merged back into the main branch.
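A quick sketch of those three tools (names and revision numbers made up):

    hg tag v1.2.0                      # permanent marker on the current revision
    hg bookmark production             # movable pointer, the closest thing to a git branch
    hg bookmark -f -r 1234 production  # move it to another revision later
    hg branch upgrade-deps             # permanent named branch, recorded in each commit on it
    hg commit -m "Start dependency upgrade work"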
It seems like one of the ironies of git-flow is that it would actually work better if you used it with Mercurial rather than Git, because Mercurial stores what branch you were on when you made a commit. This means that a tool could automatically look at the Mercurial commit tree and figure out which swimlane each commit belongs in, and use this information to draw a commit history tree that wasn't such a mess.
I apologize for the self-promotion, but this answer on Stack Overflow (and the question) talks about this difference between Git and Mercurial, and includes links to articles that explain it better than I could:
One main branch is great. When working with a large number of contributors I also really like a clean history; it makes things much easier to review.
It's kind of a shame something got branded with a slick name like "GitFlow", when "doing it the way you ought to be doing it" doesn't have a slick name :)
A single eternal master works for a Continuously Deployed app/site.
Not for any other project where maintenance releases are the norm. This includes projects with strict API compatibility, semantically versioned frameworks/plugins/libraries, many forms of desktop/offline apps, some Android apps, most enterprise apps, etc. - more or less anywhere developers don't have the liberty to thrust the latest master on their users.
I'm not against CD, and not a big fan of Git Flow either. But different things have their own uses. I'm really liking GitHub Flow and GitLab Flow though!
Right, when you need to maintain (and patch) old versions of a piece of software, having eternal release branches is necessary. The fixes on those old versions often don't ever want to be merged back to master because the code is very different in more recent versions.
> All other branches (feature, release, hotfix, and whatever else you need) are temporary and only used as a convenience to share code with other developers and as a backup measure. They are always removed once the changes present on them land on master.
From an open source developer's perspective I need more "eternal" branches because I need to plan future releases. Putting everything into master makes the decision for me (if I have a breaking change I have to bump a major version even if maybe I want to delay doing that).
I wish GitFlow had not called that branch "master", and had called it "released" or "production" instead. It's really useful to have a branch which you know always exactly represents the code running in production. You can keep an IDE pointed at it and update when you need to, without worrying about tags or whatever. This is the one part of it I've tried to sell to colleagues, which would have been easier if it had a better name.
I think he makes a valid point about how it's not necessary to have both develop and master if you use tags. On the other hand, I think the `--no-ff` merges are what git-flow got right. The separation of features into their own branches is useful; it's basically about grouping related commits together. You can always render the history in a way that looks prettier, and even if you can't, the history doesn't need to look pretty; the final product does.
I don't see why there has to be "this is harmful" and "this is a better way".
I've used all kinds of branching models... I've used just a master branch and you commit directly to master. I've used full git-flow.
I think the branching model you use is dependent on the people and the project. But really no matter which model I've used it seemed to me to be fine... And if it wasn't fine, we extended it to meet our requirements.
Most of the merges in his first pictures aren't even fast-forwardable, so his complaint about no-ff seems... weird?
You should still rebase your feature branch on top of whatever you're merging into whenever you can, even if you're using git-flow. That's just common sense. When you do, your history looks almost the same as in his 'pretty graph'; there's just one more 'link' back to the previous feature merge.
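Concretely, with hypothetical branch names, that looks something like:

    git checkout feature/SPA-123
    git rebase develop                  # replay the feature on top of the target's tip
    git checkout develop
    git merge --no-ff feature/SPA-123   # keep a merge commit as the feature's envelope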
The advantages of this additional context are important. Firstly, you can get a compressed view of only the features that were merged (without detailed commits) with something like `git log --first-parent`. I guess the only way to do that in the OP's approach is `git log | grep 'SPA-'`? Rather... unreliable.
Using no-ff also means you don't have to do the silly thing of putting your issue name / branch name in every commit title. Titles are pretty short already; having to allocate ~10% of each one to tracking the name of the branch is just wasteful. With no-ff it's obvious which feature a commit is for (the branch name is in the merge). If your tool fails to present that in a reasonable fashion, that's disappointing, but the data includes this context, and that's the most important thing.
As to the master/develop split, yeah I could be convinced it's unnecessary. Still, I think it's convenient to have a clear separation of 'this code is in production', 'this code is in development'. If you just make a release branch then merge it into develop, you have to know the exact tag before being able to find the latest release. 'master' being the alias for 'latest release' is fine.
It is not quite as easy or as reliable as it is with merges, but generally "branches" come in as recognizable chunks of commits, and you can either revert them with a range or interactively rebase them out of existence, depending on your goals. It's generally not very difficult, but in some cases it may be.
I'd also suggest that you want to make sure you consider the full totality of costs, because it's very humanly easy to see this one feature that you recall using a lot, when in fact you can easily recall it precisely because it is a rare event (and thus worthy of memory), whereas the costs of a complicated branching structure are continuous and ongoing.
I'm not saying that linear is therefore guaranteed to win for you, just pointing out the cognitive danger of seeing the big, rare expensive costs and missing the continual drip of small ones.
That said, I'm not necessarily 100% linear myself, but I do sometimes feel like git made branches easy and some people overreacted. If you've got a branch that lived for at least, say, a week, and had significant independent work within it, then by all means merge it and keep a merge commit. But this workflow creates branches upon branches upon branches, and then keeps them around forever in the history. I'm not convinced that last bit is necessarily a good thing... I create a ton of branches, sure, but I only keep big ones that actually mean something, not every little bug branch with one commit of one line. There is a happy medium available here, too.
- Check out master.
- Start an interactive rebase of master onto the last commit before the series of commits you wish to remove.
- Mark all the commits you don't care about as "drop" (or simply delete their lines from the todo list).
- Let the rebase run and resolve conflicts on the way, the same as you'd do with your current workflow.
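In command form, that's roughly (the placeholder rev is hypothetical):

    git checkout master
    git rebase -i <commit-before-bad-series>   # opens the todo list in your editor
    # change "pick" to "drop" on the unwanted commits, then save and quit
    git rebase --continue                      # after resolving each conflict, as usual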
This rewrites history, right? What I meant was: a feature branch that got merged into master turned out to introduce unwanted behavior, so while a fix is rolled out on the feature branch I'd like to remove that code from master. What I currently do is revert (git revert, which generates a new commit) the merge commit(s) that brought that feature branch into master; then, when the fix on the feature branch is complete, I revert the revert I just made and merge the feature branch again.
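A sketch of that dance (names and ids are placeholders):

    git revert -m 1 <merge-commit>   # back the feature out of master as a new commit
    # ...the fix is developed on the feature branch...
    git revert <revert-commit>       # revert the revert, so the original changes can return
    git merge feature/foo            # merge the fixed branch again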
Surely it isn't a great idea once "the" repo has a copy of the dodgy commit, though? Because once everybody else has a copy of it, the history hashes will include it, and removing it with a rebase is going to become more bothersome.
(N.B., I am English - by "bothersome" what I suppose I actually mean is "a massive pain in the arse". Because everybody is going to have to take all their commits since the dodgy one and rebase them on top of the new history. Maybe I should rephrase that to "a massive pain in the fucking arse". But it's always possible there's some git magic I'm not aware of.)
Generally it is not a good idea. But sometimes one has to admit having fucked up, put a marker on an old branch to let people know it's fucked and clean things up. Especially when the alternative is "break half of git's tools by injecting revert + revert-revert".
If I am understanding you right, you seem to consider the act of rebasing itself to be "a massive pain in the arse".
In a culture where rebasing of feature branches is considered mandatory anyhow, rebasing a feature branch sideways onto a new unbroken master branch is nothing out of the ordinary and quick and easy.
The constructive thing in this case would be to introduce your own solution to the problem and argue for it, not bring the level down to day-care level and go all "wow, i'm just gonna avoid you now.".
Are you serious? I was being polite and posted a question asking for some constructive feedback. I got some of this, then Mithaldu went off the deep end and decided to start ridiculing me and telling me how he'd avoid the company I work at because of the questions I'm asking. And now you're saying that I'm the one crossing a line here? Seriously? I just re-read the guidelines; which did I break? The only one I can imagine is not flagging the comment, but that was because the software did not let me, since it was a reply to one of my posts.
I have been perfectly civil. Telling someone they are not being constructive and should grow up is not uncivil when it's in response to:
> Please tell me what company you work at so i can avoid it. There's no other way to say it other than that your ideas horrify me.
Be sure to read that again and let it sink in. This person wants to know what company I work at so he can avoid it because of a question I asked on the internet about git.
Further, while I think HN is inconsistent at best in how it applies its rules², I realize that dang does have a point and I could've dug further with questions, to see if your mind could be changed. However, the same applies to you just as much. Instead of accusing me of being a child³, you could just as well have asked why I recoiled in horror and thus rerailed the discussion in a constructive direction, like to3m did: https://news.ycombinator.com/item?id=9745547
--
² The rules are hilariously strict, which can be a good thing, but applied to only a small percentage of rule-breaking posts.
Sorry, but when you react the way you did, I simply lose any interest in discussing the matter with you any further. Frankly, it also makes you lose credibility in my book; if you cannot manage to have a polite conversation where you disagree with someone, it paints a picture of someone who yells and walks away whenever someone disagrees with them. A person like that may believe they are right, but if they can't properly discuss things, how would they know? So why should I trust them? I'm not saying that this interpretation is accurate or anything; that's just how your posts came off, and why I wasn't interested in further discussing the matter with you and reacted the way I did.
But very well, as long as we can keep things polite:
There exists a possibility you misinterpreted my intent. I am talking about the following scenario:
* Feature branch is written by a developer, has been tested (not as much as they'd like, because deadlines) and it is merged into master
* Master branch is made publicly available, CI builds from it, and builds wind up in production
* Some time passes; eventually a critical bug is uncovered in the code from the feature branch. This is bad and should have been caught earlier, but it wasn't, and shit happens.
* Work starts on a fix as soon as developer resources are available. In the meantime, the code from that feature branch should be "disabled" so the product remains functional.
So what has worked well for us is reverting the merge commit(s) where the feature branch is merged into master. This commit is given a descriptive commit message and we actually appreciate there being a mark in the revision history about this.
We have dozens of these projects for different customers, and less than a dozen developers. Rewriting the history of the master branch would result in chaos as every developer gets to merge their own feature branch into master.
Are you still recoiling in horror? If so, feel free to elaborate as to why and how you would do this better.
Sadly I don't have mounds of time right now to write bigger answers, but I first need to figure out what your knowledge is, because we're not quite on the same page yet.
> every developer gets to merge their own feature branch into master.
Are you sure you wanted to say merge? If the master branch is rewritten into a new branch, feature branches on it would need to be rebased normally.
Is there a reason that you're saying "every developer gets to"? Normally they'd only need to rebase when they wish to integrate back into master anyhow, which is likely to be conflict-free if they didn't rely on the feature, and also likely to be conflict-free if they wait until you've re-added the feature in a fixed state. And if they decide to rebase before you re-add it, then they'd have to deal with those conflicts exactly the same as if they'd merged before your revert-revert.
Do your developers do the SVN thing of continually merging master into their feature branch? Typically that is a bad habit which should be replaced with rebases of the feature branch.
Why do you care if history gets rewritten? Because you fear it would create more work, or because you consider history inviolable?
> we actually appreciate there being a mark in the revision history about this.
I think tags with descriptions would serve the same purpose. Do you see issues with that?
> Are you sure you wanted to say merge? If the master branch is rewritten into a new branch, feature branches on it would need to be rebased normally.
Yes. The way we currently do things is keeping things as simple as possible for everyone, even if this doesn't produce the prettiest history. A developer first creates their feature branch based off of master, commits several times to this feature branch, tests it and possibly asks for review from a coworker and then they merge this feature branch into master using --no-ff, always creating a merge commit.
It's possible there have been commits to master since they branched off of it. If so, they will resolve this in the merge. I do think it would be better if feature branches got rebased onto master prior to merging, so they can be tested in that state as well, but this is currently not how things are done (it is not up to me). Though that could create issues if there are multiple developers collaborating on a feature branch and they do not communicate properly.
The reason I said "every developer gets to" is that in larger projects (which also usually use Gerrit, which we do not) I've seen that there are usually a select few people in charge of merging feature branches into master, instead of everyone merging as they see fit.
My view on rewriting history is that it should not happen on shared branches, as this will surely lead to issues where one of the developers does not do the entire 'fetch rebase merge' dance correctly, or works directly on master. We also contract some work out to external agencies and their git usage is usually abysmal, and I am not in a position to correct or influence any of that.
> I think tags with descriptions would serve the same purpose. Do you see issues with that?
I'm not sure how the tags with descriptions would help with easily rolling back all the changes introduced through a feature branch, but maybe I'm missing something.
Merging feature branches that were based on a branch that has since been rewritten into the new one is nothing anybody would ever do or recommend. Rebasing is a requirement here, and truthfully an operation with less mental overhead, since it's actually just "copy all of those commits over to here, then put the branch marker at the end of the new copy".
It kind of sounds to me as if you're saying "I don't want my developers to have to learn rebase, because it's easier if they learn a little less." Is that a correct understanding? If so, then you're trading a little upfront work for a lot of cleanup work later on.
> My view on rewriting history is that it should not happen on shared branches
Depends on how shared they are and how big the fuck-up is. Everything is relative.
> as this will surely lead to issues where one of the developers does not do the entire 'fetch rebase merge' dance correctly
There should not be any merge involved there. See first paragraph.
> or works directly on master.
That developer gets a notice to get his shit together and gets fired if he persists. This is teamwork, not cowboy work.
> I'm not sure how the tags with descriptions would help with easily rolling back
They don't. They help with marking places where old broken branches were abandoned and marking places where their new counterparts were reinstated.
Lastly:
> it is not up to me
If it's not up to you, and they're not willing to listen, then the conversation is moot and your org is broken. You can't wedge one small part of a rebase-based workflow into a larger workflow that pretends git is just SVN with a different syntax.
I can tell you how to do everything correctly and how it will actually make things nice and smooth, but if doing everything is not an option, then just a little of it can't be applied to your situation.
> Merging feature branches that were based on a branch that has since been rewritten into the new one is nothing anybody would ever do or recommend. Rebasing is a requirement here, and truthfully an operation with less mental overhead, since it's actually just "copy all of those commits over to here, then put the branch marker at the end of the new copy".
I'm not even sure what you're trying to say here. In general we don't rewrite the history of shared branches, thus rebasing is certainly not a requirement.
> It kind of sounds to me as if you're saying "I don't want my developers to have to learn rebase, because it's easier if they learn a little less." Is that a correct understanding? If so, then you're trading a little upfront work for a lot of cleanup work later on.
They're not my developers. I'm one of the developers. I'm trying to educate my coworkers more in the ways of the git, but it's slow progress. There has been no cleanup work so far, mostly because we're not obsessive over having a "clean history", and are instead more than happy with a realistic one.
> Depends on how shared they are and how big the fuck-up is. Everything is relative.
It's not a fuck-up. Bugs happen; noticing and fixing them is progress. The revert commit references the issue tracker, and this itself is valuable history. Instead of the commits simply disappearing for a time.
> That developer gets a notice to get his shit together and gets fired if he persists. This is teamwork, not cowboy work.
Git is there to facilitate teamwork, not to get in the way.
Just to remind you of what the context of this discussion is:
You commented on an article that says, paraphrased: "Stop merging criss and cross y'all, it's dumb and counterproductive; rebase everything instead, it'll help you in the long run."
In that context, you asked "well, merge commits are easy to revert; if I can't merge, how do I revert a feature branch?"
I answered that, in the context given by the article on which you were commenting. That context is: "Rebasing all your things is mandatory."
As I said in my previous post, if you're not willing to even mentally adopt that context, then nothing I can say would be helpful.
Are you interested in doing that?
--
> we're not obsessive over having a "clean history", and are instead more than happy with a realistic one.
I deeply resent the implication. This kind of stance is what led me to the initial post you took offense to. What you perceive as obsession is experience and knowledge from people who have experienced both sides of the coin, along with their benefits and pains, and made a rational decision about which is more favourable in the long term.
You won't find me accusing you of obsession with fixed history just because you haven't had the (mis)fortune to go through the full range of experiences on this.
> Instead of the commits simply disappearing for a time.
Nothing disappears; that is specifically why I mentioned tags. If one abandons a branch and the history is valuable to keep around, you stick a tag on the end of that branch, give it a description pointing at the ticket, then put a similar tag on the end of the remade branch, and off you go.
> not to get in the way.
How is git getting in the way of teamwork in your opinion?
> It's not a fuck-up.
Bugs are fuck-ups. There are no two ways around that. Especially when they're bad enough that a feature branch would be removed wholesale, and not simply disabled with a flag.
Exactly. I love this process, and endeavour to drop all dev/develop branches from the repos I'm working in. Sensational headline aside, I've seen a lot less untangling when feature-branching off of master with a judicious use of tags.
While it's dated at this point, I've always felt that the GitHub flow [1] works best (for the projects I'm involved with, anyway).
We use a more extreme version of rebasing feature branches before merging into master: we squash the features into a single commit when merging to master. The reason is that we don't care about the (sometimes hundreds of) commits that made up a feature. What matters is that it works as designed and passes tests. If we merge a feature to master and then need to revert it, we will revert the whole feature.
This also allows us to keep merging master into feature branches (where there is only a single commit that might need to be manually merged) instead of rebasing feature branches on master (in which case it can be necessary to manually merge multiple intermediate commits).
What cleared up git merge --squash for me was a comment showing what it's roughly equivalent to.
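From memory, so treat this as a sketch rather than the original comment, the equivalence was something like:

    git checkout master
    git merge --squash feature/foo   # stage the branch's combined changes, no merge commit
    git commit                       # one ordinary, single-parent commit

    # ...which is roughly the same as:
    git checkout master
    git diff master...feature/foo | git apply --index
    git commit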
That loses a lot of information, though. If you want to bisect to find a particular bug, tracking it down to the merge is a good start, but I'd rather have the actual commit (from the maybe hundreds) that went into the merge. Sure, you can revert the feature, but what if you want to fix the bug?
I see GitFlow as a pragmatic workflow customized to cloud-based software. Master is auto-deployed, and Dev acts as insurance.
We're currently having lots of success with this:
* Always work in a feature branch.
* Pull master + rebase feature branch when done.
* Merge to master with --no-ff --edit and include a summary.
Rebasing feature branches keeps them readable and avoids continuous merges. Disabling fast-forward keeps the log for /master abstracted to feature level, while the details remain available in the graph.
Major releases are branched, minors (bugfixes) are tagged. Bugfixes are made in master and cherry-picked into the release where possible.
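For the release half, a sketch with made-up version numbers:

    git checkout -b release/1.x v1.0.0   # major releases get a branch
    # ...a bugfix commit <sha> lands on master first...
    git checkout release/1.x
    git cherry-pick -x <sha>             # -x records the original commit id in the message
    git tag v1.0.1                       # minor (bugfix) releases are just tags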
Currently our CI build only works on /master, but in the coming month it'll build all feature branches which have been pushed to the main repository.
This is very similar to how Perforce streams work, but it's distributed. If you really hate distributed version control and love GUIs then I can recommend Perforce.
That's the first time I heard of GitFlow. People seriously do software development this way? I find that hard to believe.
What the author describes is fairly close to what I've been using in a number of companies now for the last 9 years or so.
Whether to rebase is a personal preference. I tell developers to always rebase local work before pushing. Unobserved changes might as well not exist (if a tree falls in the forest and no one is there to hear it, does it make a sound?), so if you haven't pushed your work, rebase it. No one cares when you did the work.
As for feature branches, it depends. If the history is clean and there aren't too many at one time, we might merge without rebasing. But I still prefer to clean up the commit history and rebase. I don't understand the obsession with "true history". History is written by the victors; in this case, by the resulting work/code.
The whole point of hotfixes is that they are relative to old code and that they alter what is considered the current version of that old release. Which is important when (as is typical in business) you have customers who are on specific releases and either haven't paid for the new hotness, or haven't integrated with it and don't yet want it.
So at the absolute minimum you need one persistent branch per old release, if you ever hotfixed it and still have it deployed in the field. GitFlow falls over here, because it only has one master. But at least it does recognize the fact that repairing released code is different from pushing the unreleased state of the art forward.
I can count on zero hands the number of times I've needed to solve a problem by navigating the branch tree.
I've lost count of the number of times that two eternal branches and feature branches with pull requests (+ code review) has saved major flaws from getting to production.
The develop branch is perfect for automatically deploying our bleeding edge to our test server.
Although, if we move to a more continuous deployment approach, we may transition away from two eternal branches. But when GitFlow was first written about, continuous deployment really wasn't the trend that it is now.
I never used gitflow, so I could be wrong, but the main problem with logging seems to be this:
Gitflow thinks of branches as lanes. Git branches are actually labels. What's the difference? In the gitflow model every commit implicitly belongs to a branch (a lane). Git branches don't work that way. One could actually implement "lane" as additional commit metadata and tweak git log (and other git utilities) to always show lanes as straight lines in the graph.
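You could approximate that today with a commit-message trailer; the convention below is entirely made up:

    # record the "lane" when committing on a feature branch
    git commit -m "Fix parser null check" -m "Lane: feature/parser"

    # later, list only that lane's commits
    git log --grep="Lane: feature/parser"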
On the main project I'm working on, the reason we have develop/master is mainly for hotfixes.
We deploy once a week, but if we need to get something out the door quickly, we make a hotfix branch off of master, then merge it into both develop and master. This way, if we find something that needs to be fixed before the next release, but don't want to push half-done updates, we can seamlessly do it.
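Roughly (branch name hypothetical):

    git checkout -b hotfix/payment-bug master
    # ...commit the fix...
    git checkout master  && git merge --no-ff hotfix/payment-bug
    git checkout develop && git merge --no-ff hotfix/payment-bug
    git branch -d hotfix/payment-bug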
One thing that bothers me about GitFlow is that it mangles history with merges. Sometimes it becomes tricky to debug issues when history was created with GitFlow.
I would rather branch off of master, bring changes in via git am or rebasing when ready, then tag a release when it is ready to be released. If there is something wrong with master, the tagged releases serve as easy points to branch off of.
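Something like this (version numbers made up):

    git tag -a v2.3.0 -m "Release 2.3.0"   # when master is ready to ship
    # if v2.3.0 later needs a fix, branch straight off the tag:
    git checkout -b fix/v2.3.1 v2.3.0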
I think it depends on the project's structure and team discipline. I tend to prefer straight lines in the history where a single feature or a group of similar functions are linear.
I think it greatly depends on the size of your team: if you're alone, you have one branch; if you're two, you may have three branches, one for each of you plus master; if you're four, you start to use feature branches...
It might be fun to compute the number of branches needed as a function of the number of devs in your team.
Ever since I started getting involved in SCM stuff, I've been astounded at how much breath and emotion people are eager to waste defending their choice, or technique, or strategy, or whathaveyou.
SCM system discussions should be banned on HN, as pointless and heated as vi vs. Emacs discussions.
It's not really harmful just because it's too much overhead for small projects. If you have a huge project I'd assume that it's much harder to read history anyway, and then a more complex pattern of development is reasonable.
Even though I could frequently commit on feature branches, I usually don't. Hence, when I merge feature branches I don't have crazy messy histories that I feel it necessary to rewrite.... Works for me.
What about using merges for the feature branches but rebasing when pulling (git pull --rebase)? Is it that harmful to rewrite the history of your local changes?
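For concreteness, I mean something like:

    git config pull.rebase true       # plain `git pull` now rebases local commits instead of merging
    # or as a one-off:
    git pull --rebase origin master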
You can do it, but at GitLab we advise against it if you can avoid it. Many things become harder; for example, it is more work to link merge requests to issues, and you can't push a commit to help a person without them giving you access first.
If you are using the Integration-Manager workflow (which GitLab doesn't support as well as Bitbucket or GitHub), all the members of a team have read access to all the repositories and forks in the team namespace. That means the owner of a developer branch fork can always read the repo of another contributor and pull the changes.
Please let me know what you think we should improve to support that workflow better.
Anyway, I think my examples are still valid: it is harder to mention issues, and you can't push (write) commits to forks since you have only read permissions.
The main assumption in the Integration-Manager workflow is that code from repositories of other users is always pulled by the owner of the current repository as and when appropriate.
So if dev1 and dev2 are working on the same feature in 2 different forks of the main repository, dev1 has to pull commits from dev2 that are needed in his/her fork and dev2 has to do the same in his/her fork. Once the feature is complete, the merge request is created from one of the 2 forks.
Yes, it is harder to mention issues, but that can be done in the message of the merge commit. Since forks are essentially equivalent to branches in this workflow, I usually don't mind referring to issues in the individual commits themselves, which will link to the correct issue on getting merged to the main repository.
We do this with our team's projects hosted on Bitbucket, ymmv.
I personally find git flow to be a wonderfully elegant and simple way of handling a project in git. Not everything is perfect, but I consider git flow to be much like PEP8. It's almost always a good idea to do it the git flow way, unless you have a very specific and documented exception, in which case do that instead.
To me what matters more is the consistency.
Also, the attitude and tone of this article straight up stinks.