Yegge says that novices don't like his second example. But I've been paid to maintain compilers written in Lisp, by people who've hacked in Lisp for decades, and I don't like it either: the "compressed" code example is not awful, but it's not a good one.
An expert should make things clear, even if only to other experts.
Basically, that function is too big, and it uses too many mutable variables. I had to stare at the control flow for 'destructuring' and 'init' for over a minute before I convinced myself that it actually worked correctly. More often than not, a deeply nested Lisp function with mutable variables will contain subtle bugs. So when I see something like this, I need to slow way down and take nothing on trust.
Here are two ultra-compressed parsers that I like better than Yegge's:
https://gist.github.com/1068185 and http://book.realworldhaskell.org/read/using-parsec.html
To understand these, you'll need some specialized knowledge. But once you understand Yacc (or Parsec), this code is straightforward and clear. The structure of the code mimics the structure of the problem.
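Neither of those is reproduced here, but to make "the structure of the code mimics the structure of the problem" concrete, here's a hypothetical toy in Java (made-up grammar, nothing like the linked parsers): a recursive-descent evaluator where each grammar production is a single method, so the code's shape is the grammar's shape.

    // Grammar:  expr -> term (('+' | '-') term)* ;   term -> NUMBER
    final class ExprParser {
        private final String[] tokens;
        private int i;

        ExprParser(String input) { this.tokens = input.trim().split("\\s+"); }

        // expr -> term (('+' | '-') term)*
        int expr() {
            int value = term();
            while (i < tokens.length && (tokens[i].equals("+") || tokens[i].equals("-"))) {
                String op = tokens[i++];
                int rhs = term();
                value = op.equals("+") ? value + rhs : value - rhs;
            }
            return value;
        }

        // term -> NUMBER
        int term() {
            return Integer.parseInt(tokens[i++]);
        }
    }

    // new ExprParser("1 + 2 - 3").expr()  ==>  0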
OK, but you're criticizing his example from a functional perspective. You want to see something more Scheme-like, I guess (and frankly I'd agree). But that's not about metadata density, which is the point of his blog post.
The two are related, though - I comment my code very sparsely these days, but I can do this partly because I break my code down into small self-describing functions with meaningful names.
> Basically, that function is too big, and it uses too many mutable variables.
An interesting statement, that; particularly the part about mutable variables. Mutability implies state, and state seems to be something we, as programmers, often struggle with. Keeping those state transitions within our mental model seems to be the hard part.
I've been studying Clojure a lot over the past 12 months; its approach is quite different from the 9-to-5 Java I write. Rich Hickey's notion of disentangling state into value and identity has been so very helpful to me in dealing with the 'problem' of mutability and state transitions in how I write my own code.
Lisp, whatever the dialect, doesn't strike me as the sort of thing that most Java shops (particularly the large corporate variety) are going to swallow, but the ideas of Clojure... Perhaps I need to get out more, but it's the only language I've seen that proudly embraces a deeply functional paradigm without punting on State (or destroying my brain with monads).
I have no deep love for Java, but once Java 8 arrives with lambdas, all I'll really want is a good implementation of persistent data structures.
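To sketch what I mean by value over identity (a toy, not a real persistent-collection library): "adding" to a persistent structure returns a new value that shares structure with the old one, and the old one is never touched.

    // Toy persistent list: prepending shares the tail instead of copying it.
    final class PList<T> {
        final T head;
        final PList<T> tail;   // shared structure, never mutated

        private PList(T head, PList<T> tail) { this.head = head; this.tail = tail; }

        static <T> PList<T> cons(T head, PList<T> tail) { return new PList<>(head, tail); }

        public static void main(String[] args) {
            PList<String> v1 = cons("a", null);           // null stands in for the empty list
            PList<String> v2 = cons("b", v1);             // a "new" list; v1 is untouched
            System.out.println(v1.head + " " + v2.head);  // a b
        }
    }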
PS nothing against monads (or Haskell) - I just haven't gotten there in my Perlis journey.
My second thought was: "no, I just think I disagree with him".
Comments in the code are good. They help the next person who touches that particular part of the project understand why it's coded as it is and why it wouldn't be a good idea to code it differently. Commenting a counter increment with "increment the counter" is silly. Commenting it with "it's safe to increment it without taking locks, because the caller already took them" is useful.
Would you say that this code has been written by a two-year-old? Or rather that it's been written by a responsible adult that cared about the integrity of future coders' tonsures and wanted to save them some heavy head-scratching?
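For concreteness, a hypothetical sketch of the kind of comment I mean (the class and the locking scheme are invented):

    class HitCounter {
        private long hits;

        // No synchronization here on purpose: every caller already holds
        // statsLock (see the flush path), so incrementing without taking the
        // lock again is safe.  The "what" -- hits++ -- needs no comment at all.
        void recordHit() {
            hits++;
        }

        long hits() {
            return hits;
        }
    }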
Knuth would probably disagree with him also; Yegge's post is almost an anti-literate-programming manifesto, opposing the idea of interspersing your code with English explanations of what it does and why.
Of course, literate programming doesn't advocate writing completely trivial English explanations, so they agree on the counter++ case, but it sounds like Yegge opposes a large ratio of explanatory comments more generally.
I think that Yegge might have been somewhat inspired by a programming style that was widespread at Google, at least when I worked there: get as much code into each screenful as possible.
I've worked with a lot of different programmers over the years. Some are capable of "running" large pieces of code in their head, keeping track of state transitions. Some need a more structured approach so they can concentrate on fewer moving parts at a time. I can to some degree keep lots of code in my head, but I know that the chance of mistakes increases when I do. It is a useful skill to have, but an exhausting and error-prone way to work, so I prefer not to.
When I read my own code, I don't really read it as such. I scan it. Often I can look at just the silhouette of a paragraph, a method, or even a class and tell you what the code does. I tend to do "one thing per paragraph" -- which, when I read code, means that I can quickly skip past the code I am not interested in at the moment.
I often notice how other people actually read code, and it makes my skin crawl. I can't watch. It is so SLOW.
Yes, getting more code onto each screenful is great -- but not having to read the code will always be faster.
Put the comments above the routine, and try to keep the routine's statements/expressions/code to about one screenful (give or take; needs vary).
Make the routine/function so that it indeed does one thing, and you can tell what the inputs and results are. Should you ever need to debug, you can check the data and avoid the internals until you know that this is indeed the broken routine.
Contrast with the mega-routines that one of my former coworkers calls "stir the pot". Those are just broken by (lack of) design, and no commenting strategy can really fix.
This depends a bit on the type of comment. If we are talking about Java or something else that uses the method comment to generate documentation, the documentation comment should not concern itself with implementation detail. However, documenting all pre- and postconditions, as well as any side effects, is a good thing.
(It is important to know when you are documenting the API and when you are saying something about the underlying implementation -- and what you should say. Again, the goal is to anticipate what will help the users or maintainers who come after you to do the right thing.)
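As a hedged illustration (the Journal class is invented, not from any real codebase), a documentation comment along these lines says nothing about how the method works internally, only what callers may rely on:

    import java.io.IOException;
    import java.io.OutputStream;

    final class Journal {
        private final OutputStream out;
        private boolean closed;

        Journal(OutputStream out) { this.out = out; }

        /**
         * Appends {@code record} to the journal.
         *
         * <p>Precondition: the journal is open and {@code record} is non-null.
         * <p>Postcondition: the bytes of {@code record} have been handed to the
         * underlying stream, in order, after any previously appended records.
         * <p>Side effect: may block while the underlying stream flushes.
         *
         * @throws IllegalStateException if the journal has been closed
         * @throws IOException if the underlying stream fails
         */
        void append(byte[] record) throws IOException {
            if (closed) throw new IllegalStateException("journal is closed");
            out.write(record);
        }

        void close() throws IOException {
            closed = true;
            out.close();
        }
    }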
Occasionally you will need to make some comments regarding the implementation, and I usually make those comments at the start of a code paragraph. Usually to clarify intent when the intent can be misunderstood (to counter "I know what the code does, but not what it tries to accomplish").
Keeping methods short is a good thing, but I'm not dogmatic about it. Though I am allergic to long methods, occasionally there are methods that become even harder to read if you chop them up into lots of small bits. In that case, guard-based programming and making use of paragraphs in your code really do help.
I don't like it (the PostgreSQL example) because you can't see the big picture; you must scroll up and down until you understand it.
And humans tend to write ambiguous comments, so I just don't read them if I don't have to.
Comments that contain examples of input and output, the current state of variables, or things like that can be helpful.
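Something like this, maybe (a made-up helper, just to show the kind of comment I mean):

    final class Slugs {
        // Examples:
        //   toSlug("Hello, World!")          -> "hello-world"
        //   toSlug("  multiple   spaces  ")  -> "multiple-spaces"
        static String toSlug(String title) {
            return title.toLowerCase()
                        .replaceAll("[^a-z0-9]+", "-")   // runs of punctuation/spaces become one dash
                        .replaceAll("(^-|-$)", "");      // trim leading/trailing dashes
        }
    }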
So more compressed code is OK, because it displays more data in a more convenient form. The same goes for displaying other kinds of data besides code.
But I may also be a n00b, so take this with a grain of salt.
I find the PostgreSQL example great. The big picture is given to you by the function comment. Scrolling is just a side effect... you can get around it with a good IDE.
Yep, true, but I meant the "big picture" of that particular function; poor choice of words. But it would be cooler if you could somehow see the big picture of the whole project from one function, like in fractals or something. :-)
I'm pretty sure that if I were to start contributing to that codebase I'd be really happy that those comments were there. However, I'm also pretty sure that as I learned the code I'd be less happy about them and prefer to be able to see more of a function on a single screen. So maybe n00b-hood applies to codebases too.
I also think that some of those functions are too large even without their comments; exec_simple_query is gigantic.
It is also worth noting that comments have less contrast on GitHub and in most IDEs. I think it speaks to the fact that humans and computers actually care about the same parts of a code file.
I don't think this point can be emphasized enough. When you comment code, your primary focus should be on the "why".
In all but very specific circumstances, the "what" should be fairly obvious by reading the code, with small clues being sufficient to guide the reader.
However, without any comments then there is no way for the reader to understand why a particular approach was chosen. Worse, they may think it needs to be refactored, and then waste time pursuing that only to arrive at the all-too-familiar "oh, that's why they did it that way".
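A hedged example of the kind of "why" comment that heads off that wasted refactor (the scenario is invented):

    // Why a linear scan instead of a hash set: this array almost never holds
    // more than a handful of entries, and in profiling the set's hashing and
    // allocation overhead dominated on this hot path.  Please measure before
    // "cleaning this up" into a set.
    static boolean contains(int[] recentIds, int id) {
        for (int recentId : recentIds) {
            if (recentId == id) {
                return true;
            }
        }
        return false;
    }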
Read the elisp example after "In contrast, here's what my code tends to look like today:"
The elisp example is impossible to skim. No comments, no breaking up of independent operation blocks. I'd say this is terribly commented code.
I'd be annoyed to have to work with it in any quantity: understanding it completely requires reading all of the code, in detail, rather than being able to trust the comments to define the input/output and pre-conditions/post-conditions of the interior blocks of related code.
I'm not a "n00b", and personally, I'd define a "n00b" as someone who doesn't understand the importance of documenting invariants. Said "n00bs" usually have trouble reasoning through all of the otherwise implicit invariants of the code they are writing, and don't understand the importance of doing so.
The PostgreSQL code example above is fantastic. Steve Yegge's code is ugly as sin. PostgreSQL is also one of the most dependable/reliable/well-maintained pieces of software I've used.
Personally, as long as the tests worked, I'd just read the documentation string. (My gut reaction of happiness to see a doc string probably means that I should update my CV. 1/2 :-) )
Yes and that is exactly the point of comments: to make reading code as unnecessary as possible. The base assumption (and reality) should be that every function is working perfectly. People who criticize the existence of too many comments seem to want to read code too often.
I honestly thought the elisp example code looked pretty bad as well. I don't like the flag-style programming (in-for, continue, destructuring) nor do I like the meaningless names (tt, s, init, continue, js2-lb,..). That one function also has quite a lot of responsibilities (looping while you see commas, error handling, handling simple, destructured and in-for vars,..)
There's a middle ground I aspire to. All I want from code is clarity. Not just for me, but for anyone who has to touch the code in the future.
The first is so verbose in comments that you don't see the little code there is. And you couldn't see all the necessary code on the screen.
The second (now) is so dense that you have to read the code carefully and slowly to understand what it's doing.
I have the luxury of time when writing code: I can consider it and lay it out however I want, and comment it how I want. A person fighting a critical bug lacks this luxury. It's my job when I create code to help the person debugging it (even if it's me) to understand the code as quickly as possible, so that they can find and solve the problem as soon as possible.
Three things I always remember when I code and design software:
1) Debugging is harder than writing code, so if you've been clever in writing code you are by definition not clever enough to debug it.
2) UNIX philosophy... do one thing well.
3) Write both the code and comments in a Hemingway style. Use no obscure language, keep it in layman's terms... but say only what you need to say.
Further to your 3rd point, you can go further and apply Orwell's 6 rules of writing from his essay "Politics and the English Language". The last rule is the most important.
1. Never use a metaphor, simile, or other figure of speech which you are used to seeing in print.
2. Never use a long word where a short one will do.
3. If it is possible to cut a word out, always cut it out.
4. Never use the passive where you can use the active.
5. Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent.
6. Break any of these rules sooner than say anything outright barbarous.
1) Debugging is harder than writing code, so if you've been clever in writing code you are by definition not clever enough to debug it.
Citation needed. And I don't mean proof by appeal to authority because someone famous once said it.
Debugging is different to writing code - you can spend longer thinking about it, you can bring different tools to bear on it, you can focus your attention on smaller or larger pieces of the problem, you can stop thinking about any of the surrounding context which is irrelevant to debugging, instead of having to think about the context relevant for programming.
Is it measurably harder, or is it a less used and less practised skill so not learned as fluently? Is it that programming is seen as "default-success-unless-proved-broken" and debugging is seen as "a-failure-until-fixed", so there's a psychologically different feeling in approaching one than the other? Is it that programming usually happens earlier in a project and debugging later, so there's more stress and deadlines looming? Is it that debugging happens when a problem has been found and now you are acutely aware of what's being held up until the debugging is done?
This quote suggests there is code out there which has known bugs and the authors have tried hard to find them, with enough examples to work from, and never succeeded due specifically to limits of their cleverness. Is this anecdotally true for anyone here?
I don't know about the "not clever enough by definition" part, but I agree that debugging is harder than writing new code.
If you're debugging your own code, you're trying to find something out about it that you didn't understand when you wrote it. You're looking for one of an almost infinite variety of side effects or subtle misunderstandings of interfaces, and you have to find which one it is. There are usually more ways to go wrong than right. You then have to change your brain to add this understanding, which is difficult, while writing code is likely to take advantage of existing mental structures.
When you're debugging someone else's code, you have all the same problems, except you don't have the benefit of knowing what the writer was thinking at the time, that initial understanding.
So yes, you have more time to do it, but that's because it takes longer, and it's more a wrack-your-brains time than writing code.
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
(Brian W. Kernighan)
> "To a person fighting a critical bug they lack this luxury. It's my job when I create code to help the person debugging it (even if it's me) to understand the code as quick as possible so that they may find and solve the problem as soon as possible."
I am quite often in this situation - maintaining code written by others on a very tight schedule, weekends, evenings, trying to find and fix bugs and still make it to the next family commitment, etc. Wish more people understood this and considered the many future maintainers when writing their code.
Well, I'm not going to completely dismiss him as a "cranky old dinosaur", but I don't agree with him, and frankly, I find it insulting that he issues a blanket declaration that those with 5-10 years of experience are "crazy teenagers", especially after showing pretty poorly written representative "good" code of his own.
If I, a developer with 6 years experience, am a "crazy teenager", then he reminds me of a "senile old man". He has a few valid points, but overall, I think his message is unhelpful.
I've written the occasional verbose comment before. Why? Well, for context: I was on a 20-person team, working on a poorly designed 750 KLOC codebase for a medical X-ray application. Something like his counter-increment example could have made the difference between getting a shot right and needing to re-shoot (and re-radiate) a patient. When we made changes to critical pieces of the code (which were poorly designed before we ever touched it), we had to be exceedingly clear as to why.
While my personal commenting style is more along the lines of his second code example, I see the actual code in that function as pretty awful. It's long. It's confusing. The execution path through that code is not immediately clear. Sure, it's dense, but not in a good way.
So, while the author is free to rant, I think he should take a deep, hard look in the mirror before throwing out blanket criticisms of other developers the way he does.
English-language comments are usually full of prepositions, ambiguous terms, and stuff that was meaningful just to the author. Words about data in or out, upstream or downstream, are always useless because nobody else has any idea what point of view you're taking (data produced BY your code, or produced FOR your code?)
If you think you're making things clear to somebody coming after you, I'd guess you're wrong most of the time.
And then the code changes but the comment doesn't. Now it's downright misleading.
Also it's arrogant to imagine somebody who writes good code can also write good prose. Testing shows most folks are good at one or the other, right? Math or English, few can do both.
Having essentially the same code written in two places (code and comment) absolutely BEGS for them to get out of sync. It's why we write modules, subroutines, etc. - get it right once.
Comments are evil if they mention the code. They should be about constraints, the compiler, your wife, ANYTHING but the code.
I didn't say the comment had to be describing the code in terms of what it was doing, but it absolutely should describe why something was done. If that "why" changes, then the comments should be updated.
This isn't double documentation -- it's complementary documentation. Code is the what, comments are the why. If one does not match the other, then again, that's on the programmer.
Unless they are an idiot savant, anyone who can do math very well can learn to use English (or whatever their native language is) competently. Anyone who says otherwise is making excuses, usually for laziness.
Here's an example from our colleagues at Google (WebRTC):
//how many frames should have been received since the last
// update if too many have been dropped or there have been
// big delays won't allow this reduction may no longer need
// the send_ts_diff here
num_pkts_expected = (int)(((float)(arr_ts -
bwest_str->last_update_ts) * 1000.0f /(float) FS) /
(float)frame_length);
Once meant something to somebody. Now it's gibberish. Even the variable mentioned is no longer in the code.
This seems a bit related to some discussions I've been having lately. Bear with me.
Really good programmers act as multipliers of other people's productivity. If you can build on code that I write and my code makes you more productive, then that makes my work worth a lot more. If you have to work hard to understand my code, I have failed.
But in order for that to happen, not only does my code have to be good enough or do something useful: you have to be able to understand how to make correct use of it. Which means that I have to be able to communicate this to you.
This means that it is my job to express myself in a way that maximizes the probability that you will understand how to correctly leverage my code. This is an incredibly hard thing to do.
So no, I don't think it is good enough to demand that other people wisen up and develop a tolerance for terseness. This is a somewhat self-centered attitude where the programmer sees himself or herself as the most important. Which is great if you want to do solo projects -- but not useful if you are to allow other programmers to use your talent as an amplifier for their own work and ideas.
(Of course, I agree with Steve that comments that read like an entire narrative are excessive. But most programmers fail to document intent and non-obvious quirks. And also, adding vertical whitespace is useful to delineate "paragraphs" of code so one doesn't have to actually read the code, but one can scan the code vertically)
Should a team write for the least common denominator? And if so, exactly how compressed should they make the code? I think the question may be unanswerable.
I found this point quite interesting. It clearly matters whether the question has an answer and I suspect there's a way to determine that.
If we buy into the whole "good programmers are 10x more productive than average programmers", then we can build a mathematical model of team productivity. We have X number of great programmers, Y number of average programmers, and Z number of bad programmers, each of whose productivities are mathematically (in this case, linearly) related. At this point the question becomes an operations research problem which ought to have a clear answer, which might be different depending on your team's configuration. (Question: Are there published, peer-reviewed papers measuring programmer productivity in this manner?)
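One hedged way to make that operations-research framing concrete (the 10x figure and the style-dependence are assumptions, not measurements): pick the code style s that maximizes

    P(s) = X * p_great(s) + Y * p_avg(s) + Z * p_bad(s)

where p_great, p_avg, and p_bad are per-person productivities under style s, and the 10x claim pins p_great at roughly ten times p_bad for some baseline style. A denser style is only a win if it raises the first term by more than it lowers the other two.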
Suppose we don't believe we can actually reasonably compare one person's productivity to another. It then would seem reasonable to try to avoid the question altogether. The issue arises from different skill levels on a particular team, so if you could somehow guarantee people will have the same skill level (at least as it pertains to the specific project) then you no longer need to worry about how to write your code. I think this is what things like pair programming are trying to address - creating a team that is somehow aware of their average skill level and writes code in the appropriate style.
This would also explain the general frustration people have with pair programming - if you haven't internalized the team's style, you're likely to be either in way over your head or having to slow yourself down and teach instead of getting as much done as possible.
The narrative commenting style was particularly recognizable. I started out in Java ~8 years ago and only recently moved to Python. Whenever I used to write a particular algorithm in Java I would write comments in the narrative style described by the article. However, I noticed myself deleting large chunks of those comments as I revisited the code at a later point in time. Comment refactoring, so to speak. These days, in Python, I rarely write narratives, and when I do, I delete or shorten the narrative before I commit. Moral of the story: narratives can be useful to organize your brain, even with years of experience, but they should really be gone when the code (and your thought process) is finished.
> If it's a complicated algorithm, a veteran programmer wants to see the whole thing on the screen, which means reducing the number of blank lines and inline comments – especially comments that simply reiterate what the code is doing.
Don't text editors have, along with syntax highlighting, any "hide comments" and "compress whitespace especially blank lines" features?
Possibly, but those are actually kind of dangerous, because if you have the comments hidden and you are editing the code you probably aren't updating the comments. And stale comments are worse than no comments.
Absolutely true. But your assumption that people being able to read comments means that they'll update them seems awfully naive. Comments become lies over time. They always do.
I didn't mean to imply that and I completely agree with you. I'm not a huge commenter myself--my general rule is only comment what is weird or un-obvious and make the code readable with good function and variable names.
I had trouble getting to the conclusions that Steve has drawn here, mostly because I found the article all over the map. In the postscript, he talks about how difficult it was to write it and he actually cut stuff out.
I think his intended message would be greatly improved with greater focus. I think he was attempting to draw from disparate things to build a picture of programmers at various stages of growth, but his conclusions read more like a list of general best practices. It was not what I expected as conclusions to the article.
I would like to see a statically typed language with compile time checking of comments. If a comment references a function and the signature later changes, that comment should throw an error at compile time. It would also be nice if a comment, or parts of a comment, could be tethered to pieces of code, so that any modifications to that code will produce a warning indicating you should potentially update the comment.
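Java gets partway there today, for what it's worth: references inside doc comments are checked against real declarations when doclint is enabled (javadoc/javac -Xdoclint, JDK 8+), so a {@link} to a renamed method gets flagged at build time. A hedged sketch with invented names:

    final class RetryPolicy {
        static RetryPolicy exponential(int maxAttempts) { return new RetryPolicy(); }
    }

    final class Retrier {
        /**
         * Runs {@code task}, retrying according to {@link RetryPolicy#exponential(int)}.
         * If exponential() is renamed or its parameter list changes, doclint reports
         * the broken reference -- a rough, partial version of compile-checked comments.
         */
        static void runWithRetry(Runnable task) {
            task.run();   // retry loop elided; the checked reference above is the point
        }
    }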
I am a bit surprised not to see TDD or any of its siblings being pulled into the discussion.
To me, the better someone does TDD, the better the code communicates, because TDD makes you think about the right things when coding. It sets your priorities straight.
So the question of n00b or vet, for me, is just how well someone does TDD. There is also the great book by Kent Beck, Implementation Patterns, which is all about well-communicating code.
If we don't, your comment doesn't help, and therefore is nothing more than boasting about how you "get it".
So what is it about? Static typing? Acceptance of change in all things? Acceptance that code will always have flaws? Aging? Lisp code-is-data advocacy? How the difference between beginners and experienced people is as much slogging through grunt work as it is skill? How pragmatism wins over either strongly-for or strongly-against approaches?
A reference to Joyce, as younger Steve Yegge "begins to question and rebel against the Java and Ruby conventions with which he has been raised."? What?
The article is about being a pragmatic programmer. Know when to stop modeling, stop writing endless abstractions, and start writing code that does something. Don't model for the sake of modeling.
For me personally, stopping to comment on code breaks my concentration on coding. So I usually only do it after I've finished some task, and only at the function level. I don't see a point in explaining why I chose to use a while loop instead of a for loop. The next coder can change it if he/she dislikes the implementation. I would only have denser commenting on some code that I spent a lot of time optimizing for speed.
I have also found in full-time and contracting work that your time estimates (or, more precisely, what the business/client desires your time to be) don't account for commenting, which for me can take quite a lot of time if done correctly. Done correctly, a comment can convince me to change the code: if you can't easily explain what a function does, then it might not need to exist.
I find myself writing narratives. I think part of my problem is the desire to justify and explain why I'm doing something the way I'm doing it, so that if I ever have to come back and work with code I wrote a year earlier, I'll know what the hell I was thinking when writing it.
I still do that when I'm writing "hard" code. There, the process of exploring the problem space is happening as I'm typing, and putting it down in English helps me keep it all in my head while I translate it to code. But I find that once I'm done, most of that stuff gets deleted or trimmed substantially. It tends to be embarrassing to find it checked in later on.
I programmed in FORTRAN a lot in the 1970s and Lisp in the 1980s and I had the habit of starting to write new code by just writing entry points and then narrative comments like the ones in Steve's first example (but not so verbose). I would wait a while, do something else, then come back to my outline, and if the narrative outline still made sense, then I would add the code. (I was the only computer programmer in my large company with a private ocean view office, so the technique must have worked :-)
I stopped using that coding technique eventually, and Steve's article gives me a good perspective of that transition.
The problem with his 'noob' example is not only that the comments are overly verbose, but also that his variable names are generic and totally useless. 'counter' and 'pos' or 'ref' are so generic that you always have to look at the whole code before you know what's going on. Rename the 'counter' variable to 'bytesProcessed', and rename the 'pos' variable to 'startOfBuffer', and use a variable named 'currentByte' to walk through the buffer... With descriptive variable names many comments become unnecessary.
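A hedged before/after along those lines (the buffer-walking loop is invented):

    // Before: generic names force you to read the whole method to see what it does.
    static int count(byte[] buf, int pos) {
        int counter = 0;
        for (int i = pos; i < buf.length && buf[i] != 0; i++) {
            counter++;
        }
        return counter;
    }

    // After: the names do most of the comment's job.
    static int bytesBeforeTerminator(byte[] buf, int startOfBuffer) {
        int bytesProcessed = 0;
        for (int currentByte = startOfBuffer;
             currentByte < buf.length && buf[currentByte] != 0;
             currentByte++) {
            bytesProcessed++;
        }
        return bytesProcessed;
    }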
Any day of the week I'll take reading a comment for a few seconds to trying to get inside the mind of the original coder which can take minutes or hours or days.
I didn't really pay much attention to the points regarding commenting style, because it seemed like the real point here was the direction Java was going in the late 2000s (or, more importantly, the ideas being sold). In this context, the post comes together as very insightful and actually resonates with a lot of truth.
I disagree with Steve's comparison of over-commenting with static typing. The key difference, I think, is that comments are purely for the human, whereas static typing also means something to the compiler. It allows the programmer to offload some work by letting the compiler check it, which comments do not do.
In what way was it irrelevant? I think he would have been able to get his point across a lot more efficiently if he had made it shorter. That also happened to be the point of his article (but in a different context).
The only rule I have for code style is to write it like an English paper, i.e. to make it easier to read and understand. Yes, I do put blank lines here and there, just to group things together.
All the very best and most experienced programmers (wink) will know ways to write code that both amateurs and elites can enjoy. I personally (wink) do things like extract expressions to give them meaningful identifiers, include units in parameter names, and sprinkle short one-liners throughout to summarize blocks of code.
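Roughly this kind of thing (the file-archiving rule is made up):

    import java.io.File;

    final class Archiver {
        // Before:
        //   if (f.length() > 5 * 1024 * 1024
        //           && System.currentTimeMillis() - f.lastModified() > 86400000L) { ... }

        // After: extracted expressions with meaningful names, units in the names,
        // and a one-liner summarizing the block.
        static boolean shouldArchive(File f) {
            long maxSizeBytes = 5L * 1024 * 1024;
            long maxAgeMillis = 24L * 60 * 60 * 1000;
            boolean isLarge = f.length() > maxSizeBytes;
            boolean isStale = System.currentTimeMillis() - f.lastModified() > maxAgeMillis;
            return isLarge && isStale;   // big, old files get archived
        }
    }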