> Should we strive to satisfy the Shakespeare for Dummies demographic?
A thousand times, yes!
There is code out there so elegant and pure that, after painstakingly pulling apart its pieces and analyzing each choice of word like a Robert Frost poem, we're left feeling like we've seen a face of God.
When debugging in the middle of the night because an alarm went off or a customer is angry, T. S. Eliot is absolutely the last thing I want to see. When I want to maintain a large system quickly, I want to be treated like I'm 5. I don't want Shakespeare. I don't even want Shakespeare for Dummies. I want "Dick and Jane Process a Customer Order."
I absolutely agree with the author that when I say "This code is unreadable," I actually mean "I haven’t spent enough time trying to read," but time is something I don't have. My coworkers and I will likely read the same code many times. We only have so much time to spend reading code. Code that takes a lot of time to read is bad. We do not have time to keep studying the code until it is no longer unreadable.
We are more than willing to invest quite a lot of additional time writing the code to make the time it takes to read the code as low as conceivably possible. Heck, with a small handful of exceptions, I will happily trade quite a bit of performance for simplicity.
Sure, if you're Terje Mathisen, and you're writing Quake 3, and Polyhymnia or Velvel Kahan descend from Mt. Helicon singing of inverse square roots and the number 0x5f3759df, and you happen to be trying to optimize a 3D game engine, sure, write poetry. Ages and ages hence, we shall read of your algorithm by the fireside and slowly smile as a blog explains to us why it is brilliant. Sing on! But if you're writing a normal program that's gonna be maintained frequently by normal people, you call security. That muse does not have a badge. She does not work here. There is a clear "no children of Mnemosyne" sign in the lobby.
But, in the general case, who is this lowest common denominator programmer you are writing for? I can probably work out a few lines of terse Haskell more quickly than a page or two of boilerplate Java, because that's what I'm familiar with. Readable code is about pattern recognition, and everybody has a different set of patterns.
"Shakespeare for Dummies" works because English speakers tend to have a shared cultural familiarity that you can write for, but programmers typically don't.
That's not to say style isn't important. I'd say consistency is more important than any notion of simplicity, because once the reader has recognised a pattern they can re-recognise it quickly.
This. The problem with the Shakespeare argument is that the author compares code to artistic writing. But code should be compared to technical writing. When you write technical analysis or documentation, you are expected to create an easily understood document that is tailored for the reader. Even if you really enjoy Joyce, you usually do not do your technical writing in Finnegans Wake style. That would be a terrible practice.
It is similar with code. People are writing obscure code for various reasons (trying to break the language, code golfing, etc.), but when you write code for a project that is to be maintained in the future, you should use the "formal" style of coding. This also means using the best practices and patterns that other people are accustomed to so they can be more effective.
Yeah, we shouldn't compare code to literary text or poetry, since these texts are not malleable in the way we want code to be.
For example, Shakespeare often uses obsolete words or words which have changed meaning - but we can't replace these with more modern words or phrases, since that would break the meter or rhyme and destroy some of the double meanings. This is not a quality we want for code. We want code to be maintainable.
Perhaps this is the reason why pure FP is just as obscure as it is? Sure, it's technically more beautiful and pure and stuff, like a piece of art -- but a bunch of Java is easier to figure out, because it's plain as a cloth sack.
There's no reason "pure FP" should be fundamentally any more inscrutable than Java code; that assumption may just be a personal bias due to familiarity. It's definitely possible to go the other way too, with an object-oriented mess where a judicious application of functional principles would make for much clearer code.
Have you ever encountered code that's easy to change but needs to be changed very often? Is that strictly better than code that is hard to change but very unlikely to ever change?
A more concrete example: fixing null pointer exceptions is often easy, but code that avoids them using abstractions like maybes and sum types is harder to change yet requires less maintenance overall.
Lessening cognitive load while increasing easy manual work is not always better.
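To make the maybe/sum-type trade-off concrete, here is a minimal Python sketch. Python has no built-in Maybe type, so this uses `typing.Optional` as a rough stand-in; `Customer`, `CUSTOMERS`, `find_customer` and `email_domain` are hypothetical names invented for the illustration, not anything from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Customer:
    name: str
    email: Optional[str]  # None means "no email on file"


# Hypothetical in-memory store, only for illustration.
CUSTOMERS = {
    1: Customer("Ada", "ada@example.com"),
    2: Customer("Bob", None),
}


def find_customer(customer_id: int) -> Optional[Customer]:
    """Return the customer, or None when the id is unknown."""
    return CUSTOMERS.get(customer_id)


def email_domain(customer_id: int) -> Optional[str]:
    """Handle every 'missing' case up front instead of crashing later."""
    customer = find_customer(customer_id)
    if customer is None:
        return None
    if customer.email is None:
        return None
    return customer.email.split("@", 1)[1]
```

The cost is visible: changing what "missing" means touches every signature along the way. The payoff is that a type checker can flag callers that forget the `None` case, which is the "less maintenance overall" side of the trade.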
Well that code has already provided its value over 4 long years without bothering anyone, it doesn't hurt if someone has to struggle a bit understanding how it works now.
Have you read Dick and Jane? I have: 150 pages to say "Sally is upset because her toy sailboat got left outside in the rain, and she wants Dick to go get it". When I'm looking at a 3am-level bug I don't want to wade through that mess, and odds are that the more complex rendering I gave would serve me better. The complex rendering might have even "used the word storm above" and averted the entire 3am crisis.
Dick and Jane has a place for my 5-year-old. But we expect 5-year-olds to grow up to do better.
But we shouldn't where it's important (implying that code is important). Don't put the "open door" button next to the "set building on fire" button, just because all employees should know which button to press to open the door.
It reminds me of Pointing and Calling [1]. Of course you expect everybody to be able to do things without pointing and calling, but it improves safety and reduces errors, and that's more important than "but people should be able to do it without silly gestures", isn't it?
I’m sure I recall seeing a graph showing the development of a programmer over time, showing how their code starts out as simple but scarcely readable, gradually gets better, but then descends for a period into excessive abstraction and unreadability as they learn new and exciting aspects of the language and can’t resist the temptation to show off this abstruse knowledge, before climbing back into simple elegance and understandability as their self-confidence and assurance in the language gradually subsume the need to be ‘clever’ and gnostic in their coding.
I’m amused the author chose Shakespeare as an example of ‘difficult’ writing. Shakespeare is eminently readable at all levels: his skill was precisely that his compelling stories can be read for their exciting narrative by anyone, and equally can be mined for deeper meaning by those that are interested in unraveling his layers of metaphor and symbolism.
Shakespeare is difficult to read because many words have changed meaning over time. So unless you are very careful when reading, you will think you understand, but actually misunderstand a lot.
Some of the puns do not even make sense anymore because the pronunciation has changed. You basically have to be a specialized linguist to understand them all.
Code readability is just one of the goals we aim for while designing software; performance is another, stability is another, and flexibility and the ability to make changes is yet another.
While reading any code I tend to think about these tradeoffs and see if that hard-to-read code is actually hard for a reason or simply a mess because no one took care of it.
Writing dumb, easy-looking duplicate code is not optimal either. I don't see an issue with stable, fast and efficient code that's harder to understand yet works brilliantly. One can easily increase the readability of such code by adding comments, but one can't get performance, stability and flexibility by writing comments.
Optimising for dummies just because the code is easier to debug isn't a good idea. Oftentimes I have to debug easy-looking code only to find that it's not as easy in execution as it was in reading. Simple code often doesn't handle all the edge cases, abstraction-less code is harder to maintain and change, and easy code can't evolve and grow with business needs.
It's only when code is unreadable, buggy and slow that it starts to become a real mess.
Also a shout out to Rich Hickey's distinction between easy and simple: creating simple code requires a lot more effort and thought, and the result is not necessarily easy for dummies to just read and modify.
I think a lot of the time people think they have performance issues when they in fact don't, likewise with stability and flexibility.
In most cases, writing the readable and straightforward version first and only moving to the less readable but more X version (for any given value of X) after it's evident that you need to is the optimal solution.
It's the programming equivalent to buying cheap tools first and only buying the expensive version once the cheap one breaks: If it breaks you used it enough to warrant the expensive and more durable one, and if it didn't break you didn't have to spend more money than necessary.
I do see your point, but let me say a bit about performance. Performance matters a whole lot if your software is being used seriously all day long for getting things done in the real world.
If your software is going to be someone's day job it can't afford to be slow, because users will take note and complain loud and clear that it's wasting their time. Just imagine git taking 15 minutes to show diffs: would you accept that for an improvement in code readability and straightforwardness?
So for different industries there are different X's that matter a whole lot more than mere readability and developer convenience.
I absolutely agree that performance matters a whole lot, it's one of the most important considerations in software.
It's also true that many developers are not very good at spotting where and how to improve performance, and what trade-offs are appropriate, and that's what I meant.
Write the straightforward version first that's easy to read so that when you need to improve performance, it's easy to use a profiler to go in and rewrite the non-performant parts in a less readable but faster way.
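A minimal sketch of that workflow in Python, using the standard library's `cProfile` and `pstats`. The two `sum_of_squares_*` functions and the `profile` helper are made-up names for illustration; the point is only that the readable version exists first and the profiler decides whether the clever one is worth it.

```python
import cProfile
import io
import pstats


def sum_of_squares_simple(n):
    """Straightforward version: easy to read, written first."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_fast(n):
    """Closed form for 0^2 + 1^2 + ... + (n-1)^2: faster but less obvious."""
    return (n - 1) * n * (2 * n - 1) // 6


def profile(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()
```

Calling `profile(sum_of_squares_simple, 1_000_000)` and reading the report tells you whether this function is a hot spot at all. If it is not, the closed form never needs to be written, and if it is, the simple version stays around as a reference to test the fast one against.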
I’m missing a bit of prescriptiveness here. Here’s a formula I’ve arrived at after 15y of academic and professional programming - sharing here with the purpose of having it challenged by the reader:
* every abstraction has a cost - DO a quick cost/benefit analysis in your head, or even better in the code comments, when choosing for/against an abstraction.
* generally, start with KISS.
- a process begins with a verb and its design usually starts with being a function/procedure
- pieces of data that belong together across multiple processes should start out as an immutable object or struct. In Python I’m almost never using direct descendants of `object`; rather, I inherit either from `NamedTuple` (for final classes) or I decorate with `@dataclass(frozen=True)` for non-final ones.
* using the above, you’ll start seeing violations of DRY. DO use the rule-of-threes. If there’s a second repetition you’re either writing, or already anticipate based on the JIRA backlog - factor out the repeating code using an appropriate abstraction.
I’ve found that code written with these rules tends to be simple yet clean. These two properties can often be opposite: I’ve seen simple code that is constantly repeating itself and thus becomes unmaintainable, and I’ve also seen clean code that is the opposite of simple because of an abundance of abstractions (think FizzBuzzEnterprise).
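A minimal Python sketch of the starting point described in the formula above: a verb as a plain function, and data that belongs together as an immutable value. The order domain (`Money`, `OrderLine`, `line_total`) is a made-up example, not taken from any real codebase.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import NamedTuple


class Money(NamedTuple):
    """Data that travels together; a NamedTuple is immutable by construction."""
    amount_cents: int
    currency: str


@dataclass(frozen=True)
class OrderLine:
    """Frozen dataclass: also immutable, and still open to subclassing."""
    sku: str
    quantity: int
    unit_price: Money


def line_total(line: OrderLine) -> Money:
    """A process starts with a verb, so it starts life as a plain function."""
    total_cents = line.quantity * line.unit_price.amount_cents
    return Money(total_cents, line.unit_price.currency)


line = OrderLine("widget", 3, Money(250, "EUR"))
bigger_line = replace(line, quantity=5)  # "changing" means building a new value
```

Assigning to `line.quantity` raises `FrozenInstanceError`, so any code that wants a different order line has to build one explicitly with `dataclasses.replace`. That keeps data flow visible without introducing any abstraction beyond the two types.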
Yes, abstractions make symbolic sense for humans, but in a technical context they are most often problematic, because intuition and technical correctness rarely align. Not like they are on opposite sides, but intuition almost always has a lot of technical holes, and when you are coding, something always ends up surfacing from those holes... and it's never pleasant. In this sense, focusing on the data and simple rules is most likely to bring us closer to the true technical problem that we are trying to solve. Abstractions and organization and good names are still vital because we do need an intuitive vision of the system as a whole, but the technical processes need to be strictly correct, not just intuitive.
> When we observe this tendency in other (non-programming) contexts we may interpret it as laziness or short attention span. When we react this way to code we blame the code and the original programmer.
Strongly agreed. Particularly about a lot of examples of the so-called "clever code".
Programming is a profession. You can't expect to forever coast on what little knowledge landed you your first job. You're supposed to learn and improve over time. That means learning to understand more complex code, architectures and programming paradigms.
I can't read Haskell code at all. But I don't claim it's "clever" or "unreadable". I realize the code is probably fine, it's me who needs to pick up a book and learn the language.
There's rampant anti-intellectualism and (the bad kind of) laziness in programmer circles, thinly veiled as concern for efficiency ("less clever code = easier to debug", or "helps juniors contribute", as if the job of a junior wasn't - in big part - to be learning to become a senior); I find it just being penny wise, pound foolish. Lambda expressions[0], regular expressions, pattern matching, Lisp macros - this is not clever code, these are tools that can greatly reduce complexity and improve readability. They just require spending a few hours or days with a book, every now and then.
--
[0] - Yes, really; a few jobs back, just after we transitioned to Java 8, I was told by my boss to maybe refrain from using lambda expressions for the sake of "more junior" people. Right, because polluting code with anonymous classes is easier on juniors than `(event) -> { few lines; }` in some GUI event handler.
The problem of the article is the same as the problem of the general discussion on how to program better: quality of arguments. Instead of conducting scientific studies on code readability, programmers keep on sharing their opinions which they support with anecdotes or citations of Shakespeare. There is so much for the community to learn about...
There is a recent study (to be published at ESEM'20) on an empirical validation of "Cognitive Complexity" as a measure of source code understandability: https://arxiv.org/abs/2007.12520
The authors state in the conclusion: "The metric correlates with the time it takes a developer to understand source code, with a combination of time and correctness, and with subjective ratings of understandability."
One requires you to do 9 arithmetic operations in your head, perhaps more than once if you lose count; the other one doesn't.
You can compare code in terms of length, nesting, cyclomatic complexity, number of negations, number of inputs, etc. It's not just subjective bikeshedding.
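As a rough illustration that such comparisons can be mechanical, here is a small Python sketch that counts a few of these properties with the standard `ast` module. It is a crude approximation under my own assumptions (which node types count as branches or nesting), not the "Cognitive Complexity" metric from the study and not true cyclomatic complexity; `crude_metrics`, `max_nesting` and `SAMPLE` are illustrative names.

```python
import ast

# Assumption: these node types are what we count as branches / nesting.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With, ast.BoolOp)
NESTING_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With)


def max_nesting(node, depth=0):
    """Deepest level of nested control-flow blocks below node."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


def crude_metrics(source):
    """Return a few countable properties of a piece of Python source."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    return {
        "lines": len(source.strip().splitlines()),
        "branch_points": sum(isinstance(n, BRANCH_NODES) for n in nodes),
        "negations": sum(
            isinstance(n, ast.UnaryOp) and isinstance(n.op, ast.Not)
            for n in nodes
        ),
        "max_nesting": max_nesting(tree),
    }


SAMPLE = '''
def ship(order):
    if order is not None:
        for item in order.items:
            if not item.in_stock:
                return False
    return True
'''
```

Running `crude_metrics` on two candidate versions of the same function gives numbers to argue about instead of adjectives. Whether those numbers track real understandability is exactly what studies like the one linked above try to establish.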
1 is simpler in most cases because I trust the computer to get the answer, but I need to know all the explicit parts. 2 hides where it all came from for an answer which might be wrong.
Usually you give each constant a name and then operate on them symbolically.
If it's a simple enough operation you can skip that and just add a comment.
In this example, there are 9 arithmetic operations and there are clearly more readable ways of doing this. This is the point I am trying to make.
For the author of this article, using Roman numerals and using Arabic numerals are the same because they represent the same quantities. However, in practice, if you have 2 accountants and tell one of them to use Roman numerals, the one using Roman numerals will be less productive, make more mistakes and will likely be frustrated.
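On the suggestion upthread to name each constant and operate on the names: the original 9-operation example is not reproduced in this thread, so here is a much smaller hypothetical one in Python. All names (`cache_ttl_*`, `SECONDS_PER_MINUTE`, and so on) are invented for the illustration.

```python
# Version 1: explicit parts, but the reader redoes the arithmetic to learn
# what the number means.
cache_ttl_inline = 60 * 60 * 24 * 7

# Version 2: precomputed literal; short, but it hides where it came from,
# and nothing checks that it is right.
cache_ttl_literal = 604800

# Version 3: each constant gets a name and the computation is symbolic.
SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7

SECONDS_PER_WEEK = (
    SECONDS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY * DAYS_PER_WEEK
)
cache_ttl_named = SECONDS_PER_WEEK

# Version 4: for an operation this simple, a comment can be enough.
cache_ttl_commented = 60 * 60 * 24 * 7  # one week, in seconds
```

All four compute the same value; they differ only in how much the reader has to reconstruct. Versions 3 and 4 let the computer do the arithmetic while still saying what the result means.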
I think the point is that we develop in a stream of consciousness. We figure out what we need, pull it in, do what we need to do as it pops into our thoughts. When we're done it's time to start writing. Take that stream of consciousness and make it more logical. Easier to follow. Refined.
That's how authors write books. No one writes a first draft and declares the book finished. Every creative effort requires iteration to improve.
There is this quote from Clean Code by Uncle Bob that I simply love.
'Avoid mental mapping: In general programmers are pretty smart people. Smart people sometimes like to show off their smarts by demonstrating their mental juggling abilities. After all, if you can reliably remember that r is the lower-cased version of the url with the host and scheme removed, then you must clearly be very smart.'
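A small Python sketch of what the quote describes, under my own assumptions about what "the url with the host and scheme removed" means (path plus query string, lower-cased). The helper and both `is_admin_request_*` functions are hypothetical names for illustration only.

```python
from urllib.parse import urlsplit


def strip_scheme_and_host(url):
    """Return the lower-cased path and query string of url."""
    parts = urlsplit(url)
    remainder = parts.path
    if parts.query:
        remainder += "?" + parts.query
    return remainder.lower()


def is_admin_request_terse(u):
    # The reader has to remember what r stands for on every later line.
    r = strip_scheme_and_host(u)
    return r.startswith("/admin")


def is_admin_request_named(url):
    # The name carries the mapping, so nothing has to be held in your head.
    lowercased_path_and_query = strip_scheme_and_host(url)
    return lowercased_path_and_query.startswith("/admin")
```

Both functions behave identically. In a two-line function the terse version costs little, but the mental mapping gets more expensive with every extra line between the definition of `r` and its use.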
>By analogy, plenty of people find reading Homer, Shakespeare, or Nabokov difficult and challenging, but we don’t say “Macbeth is unreadable.” We understand that the problem lies with the reader.
Why does the responsibility Have to be solely on the reader? There's plenty of code out there that's unreadable because of the coder, how is this outside of the realm of possibility? Why is all the onus and bias on the reader?
For example, readable:
measurementOfLeftBottomSideOfBox = 200;
versus unreadable (an abbreviation of Left Bottom of Box):
LBB = 200;
Just like English, programming languages rely on the talent of the writer and on the abilities of the reader At the same Time.
The best code is code written by a talented programmer who can make the code readable to All people of All skill levels.
One thing people get confused about is readability and elegance. Obviously "measurementOfLeftBottomSideOfBox" is readable but not elegant. While "LBB" is certainly elegant but not readable. My philosophy is readability over elegance, but you will find many programmers are unaware of this dichotomy and have a strict subconscious aversion to writing something ugly like "measurementOfLeftBottomSideOfBox."
This aversion leads to more unreadability than necessary. It's some subconscious thing in our minds that makes us code this way but when you think about it.... there's no logical point in it at all. Aim to encode as much context as possible into your code because it's completely irrelevant how ugly the variable appears.
> The best code is code written by a talented programmer who can make the code readable to All people of All skill levels.
Unfortunately this is not true. Code is like language. You can make it readable for everyone by using the easiest to comprehend and least ambiguous language. But doing so decreases efficiency for both "advanced" writers and readers. "Advanced" grammar and vocabulary exist for a reason - they allow one to express thoughts more succinctly and concisely, and can decrease the time to understand something in great detail and context by orders of magnitude. But this requires knowledge and shared context between writer and reader.
It's the same for using a very efficient compression algorithm vs. plaintext. You cannot have the advantages of both at the same time. Pick your poison.
(I assume you understand "pick your poison", a non-native reader might not. It's so nice and concise, isn't it? :)
As another example, consider the Simple Wikipedia[1]. It undeniably serves a very important role. Would it be appropriate for it to replace the standard English Wikipedia?
Or, would it be appropriate for all scientific literature to be written in the style of Simple Wikipedia?
Simplicity and accessibility are undeniably important. But they shouldn't be your only goals.
Unfortunately for your argument, it actually is completely true.
You're thinking about it the wrong way. An analogy to English doesn't prove your point when the analogy is irrelevant.
Sure there's advanced vocabulary in English. But within that domain the English is still understandable. If the reader understands the domain he understands what is written and does not need to decipher or decode what is written. The domain expert just reads it and gets it, no deciphering needed.
This is not what happens with domain specific code and this is not how domain experts read code, therefore your analogy does not apply.
Reading code tends to be very very different from reading English and much much slower. Reading is actually an inaccurate term. The reader of code spends much of his time deciphering code, and any name that helps elucidate context and eliminate deciphering is a plus. This occurs EVEN for domain specific code.
What happens with domain specific code is that a programmer tends to make up abbreviations on the fly and ends up writing something that is not readable at first glance and the reader needs to decipher the code. For example let's say I'm a domain expert in robotics and I want to encode positioning of the robot. For elegance I use this:
xPosRelB = 23
which is short for x position relative to base.
versus
meters_west_from_base = 23
I can assure you 95% of programmers write the former rather than the latter, and it's definitely not for efficiency gains. You might lose like 1 nanosecond of efficiency reading the latter, but that only applies if you already know the meaning of xPosRelB. Seriously, if every single variable was written in the same way as the former, you LOSE efficiency from trying to decipher meaning from context.
The end result is that in order to figure out what xPosRelB is the reader always has to sort of dig a bit at the context. He has to see how it's used, where that variable comes from or in other words he needs to "decipher" it. This is super common in programming but not very common when reading English. Again your analogy does not apply. In short the second name in the example is just read and understood and is by far the better choice.
When you ask the average programmer why he wrote xPosRelB rather than meters_west_from_base, he'll tell you that meters_west_from_base is too long and too ugly. Programmers bitch and moan about stuff like this that doesn't even matter. I had one guy tell me that you shouldn't mix and match camelCase and snake_case because it just looks bad (there's a real reason why people don't mix them, and it's not aesthetics).
If I go meta and bring this topic up and ask the programmer why again.... then he brings up reading efficiency, exactly what you're doing here.
Readability and structure are what's important, not aesthetics. What matters is that someone can read your code rather than decipher it.
Length and prettiness contribute nothing to readability and barely dent reading efficiency.
Domain targeted programming is stuff like this:
gallonsOfCompoundV = 34
Compound V is the domain. There's no need to explain what compound V is in the variable name. This is not a big problem with coding for readability. The big problem and the problem I am addressing is this:
compVg = 34
Seriously. Someone once complained to me about the word "Of" in my variable name. My bad, I'm sorry that added 2 nanoseconds to your reading time with the word "of".
Of course the team may have conventions. For example my team prefixes the letter k to all constants. This stuff is fine and doesn't harm readability, but this is not what I'm talking about.
I can just as well argue that long rambling variable names are unreadable, because they obscure the macro-level structure of the code and the continuous repetition creates extra cognitive load ("is this really the same variable as the other one? It looks like the first three words are the same, but...") especially when one tries to keep track of several of these huge names and follow the dataflow.
Also, left and bottom together refers to a point, not a side; so I would be doubly perplexed upon encountering such a name.
>I can just as well argue that long rambling variable names are unreadable, because they obscure the macro-level structure of the code and the continuous repetition creates extra cognitive load ("is this really the same variable as the other one? It looks like the first three words are the same, but...") especially when one tries to keep track of several of these huge names and follow the dataflow.
This problem you describe occurs in the English language. Formal documentation of code, written in English, tends to be far more verbose in its explanation of the code than the code itself.
Yet we don't complain about English? Why?
What if I take all the crazy shortcuts we use in programming names and use them in documentation and daily communication? Would it make my communication seem less like rambling and more clear? Would it help lessen the obscurity of the macro level structure of my point? Will removing the continuous use of grammatical repetition with words like "the" or "and" lessen the large cognitive load on your brain so you can understand my English?
No it likely won't. In fact it will make me LESS understandable.
The dichotomy here illustrates a bias within human nature. For some reason people perceive a certain level of verbosity in code to be bad but not so in English. Given the fact that even among programmers English is much more readable than code I would say that there is definitely a huge unrealized mass delusion going on here.
Have you heard of literate programming by Donald Knuth? It's literally about taking every line of code and upping the verbosity by 10x by replacing it with a macro of an English paragraph. Literally an extreme version of the verbose variable naming I'm describing.
If you can see validity in literate programming then basically my variable naming is a tiny tiny step in that direction.
I'm pretty sure now that you are a native English speaker. I am not and I do complain. But if one only knows one language then I can only quote Ludwig Wittgenstein: "The limits of my language mean the limits of my world". (he refers to thoughts)
> If you can see validity in literate programming than basically my variable naming is a tiny tiny step in that direction.
No need for that, because history has shown that people like English keywords, but that's about it. Just look at SQL - many people complain about the verbose syntax, which comes from trying to make it read like English "for business people" (who don't use it anyways).
Check discussion here: https://news.ycombinator.com/item?id=24730713
But SQL is not a general purpose programming language - let's look at one of these: COBOL. It is literally what you want: programs look like text, which comes from trying to make it read like English "for business people" (who don't use it anyways). And guess what? No one who actually uses it really likes the syntax.
Maybe you should start with Mathematicians and ask them to stop naming their variables "x" or "f" and use English instead of symbols. How about that? :)
>I can only quote Lutwid Wittgenstein: "The limits of my language mean the limits of my world"
Except I don't see how a quotation proves anything or how this is relevant. First off, is Ludwig correct? Can you think of things outside of language? Can you think in pictures? If so then I would say he's wrong.
Second off, what does this have to do with bad naming in a programming language? Nothing.
>No need for that, because history has shown that people like English keywords, but that's about it.
I'm not arguing for English or English keywords. I'm arguing for phrasing in naming things, this is exactly what literate programming is arguing for. I may have used the word "English" because HN is an English forum but you can easily replace that with "Any relevant language"
The article is talking about composability not readability which is a separate problem. SQL is in fact quite a readable language. Many people can read a query and instantly derive meaning. Either way I am not talking about programming language syntax. I am talking about naming within the bounds of the syntax.
>let's look at one of these: COBOL. It is literally what you want: programs look like text, which comes from trying to make it read like English "for business people" (who don't use it anyways). And guess what? No one who actually uses it really likes the syntax.
It's not about whether or not the writer likes the syntax. Clearly program writers have a bias for elegance over readability. It's about whether someone can read your program and completely understand it in one pass. Clearly COBOL was first off, not guaranteed to be readable and second off had many many other issues outside of readability. Readability was definitely not the main issue with COBOL.
>Maybe you should start with Mathematicians and ask them to stop naming their variables "x" or "f" and use English instead of symbols. How about that? :)
All mathematicians are expected to write paragraphs in "English" or some other relevant language outside of equations. Many math papers and math books have more written English than equations.
Whenever a mathematician uses the letter "x" or "f" they define the meaning with an English phrase or paragraph. This is expected for clarity in math and in English but for some odd reason not expected for programmers. The mathematician can only get away with an "x" or an "f" because he has an accompanying English explanation defining those variables right next to the equations.
Programmers are like mathematicians writing conceptual papers using only equations and math symbols without the English explanations. We have the luxury to insert some English into the naming of primitives so we should take advantage of that lest all our programs devolve into primitives with single letter names.
Sure we can comment our code. But people tend not to do that to the degree that it's done to explain things in a mathematical research paper.
> It's about whether someone can read your program and completely understand it in one pass
That's not the only thing. It's also how fast they can read/understand the program code.
> All mathematicians are expected to write paragraphs in "English" or some other relevant language outside of equations.
Sure, we do that for software also - we document things outside of the code. For example in a wiki, a README.md or sometimes even just above the code as comment/docs. But that's not what I was talking about, right?
> Whenever a mathematician uses the letter "x" or "f" they define the meaning with an english phrase or paragraph.
So you are fine if a developer does e.g. `c := (some super complex expression) // find relevant customer` and then goes on to use just `c` everywhere in their code instead of `relevantCustomerBlaBlaBla`?
>That's not the only thing. It's also how fast they can read/understand the program code.
What are you doing? Speed reading? Even 2xing the reading time with verbosity is not a factor if lack of verbosity forces the reader to switch context multiple times in order to elucidate meaning.
>Sure, we do that for software also - we document things outside of the code. For example in a wiki, a README.md or sometimes even just above the code as comment/docs. But that's not what I was talking about, right?
Major difference. In math it's required. In programming it's recommended and rarely done. When it is done, it's rarely done in a form that's as complete or as detailed as math. You get an outline at best. Programmers are expected to read code without documentation because program writers are too lazy to write documentation.
So what I'm saying is put your documentation into the code itself. Self documenting code with readable names as opposed to elegant ones.
>So you are fine if a developer does e.g. `c := (some super complex expression) // find relevant customer` and then goes on to use just `c` everywhere in their code instead of `relevantCustomerBlaBlaBla`?
relevantCustomerMetric is way better than c everywhere. Autocomplete should alleviate the pain of typing. Not every programmer arrives at the line where the variable was introduced, so they can't see the comment.
> The dichotomy here illustrates a bias within human nature. For some reason people perceive a certain level of verbosity in code to be bad but not so in english. Given the fact that even among programmers English is much more readable than code I would say that there is definitely a huge unrealized mass delusion going on here.
People definitely perceive overly verbose English as bad. "Legalese" is essentially a pejorative for verbose language, and advice like "never use a long word where a short one will do" is common.
No you misunderstood me. I'm saying the normal level of verbosity in english when applied to code is considered to be over verbose. I'm not even talking about areas where there is domain expertise. There's no need to spell out the meaning of the word decoherence in quantum mechanics for the language of english or python when communicating with domain experts. But in python someone tends to use the words
decoAB = True
over
decoherenceOfTwoParticlesAandB = True
but in English they use.
The decoherence of two particles, particle A and particle B.
and not.
The deco of A B.
It's opposite logic for what is essentially an attempt to communicate the same concept.
Two languages where different levels of verbosity are considered to be normal. For programming be less explicit, use shorter names that are prettier but less clear, for English be absolutely clear, use correct grammar.... Why? Why do we feel programming languages need to be less verbose and the english language needs to have a higher level of verbosity to be normal?
Regardless of the why behind our biases, what is clear is that it is a bias, because two conflicting levels of verbosity indicate irrationality.
When you analyze it logically, it's by far easier to read english than it is to read a programming language even by a programming expert. Therefore the verbosity level of english must be the level of verbosity that is baseline for humans to have the most clarity in a single read.
This means that in general variable names are not verbose enough in programming.
> My philosophy is readability over elegance, but you will find many programmers are unaware of this dichotomy and have a strict subconscious aversion to writing something ugly like "measurementOfLeftBottomSideOfBox."
I strongly believe that naming like "measurementOfLeftBottomSideOfBox" is not that helpful or readable.
A name like that implies that there are measurements for each side of this box, so following that naming scheme we would have at least:
Look at how much useless text we have here. "measurementOf" and "SideOfBox" add nothing but clutter to the naming, and writing out practically the same thing 4 times suggests we could abstract this into a data structure.
I know I'm being overly pedantic in this case, but I think the sentiment behind this type of naming commits a few sins:
1) It's overly verbose. More than 3 words is a warning sign to me.
2) It's specific rather than generic. For instance if I name a function "sortSheepByHoofSize", it implies the reader knows what hoofs are, cares about them and knows how to measure them. Whereas when naming it "sortSheep" or perhaps even just "sort", it's immediately understandable on a surface level to practically everyone.
3) Following on from 2), this type of naming lacks context. We should leverage the context of surrounding code and abstractions to make naming understandable, instead of trying to pack all the meaning into one name. Oftentimes there's repeated information in names that could be inferred from context instead.
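A rough Python sketch of points 1) to 3), in case it helps. The names `BoxSideLengths`, `left_bottom` and `right_bottom` and the values are my own assumptions for illustration, not anything from the code under discussion.

```python
from dataclasses import dataclass

# Flat style: every name repeats "measurement_of_" and "_side_of_box".
measurement_of_left_bottom_side_of_box = 26
measurement_of_right_bottom_side_of_box = 10


# Structured style: the type name carries the shared context once,
# so the field names only need to say which side they refer to.
@dataclass
class BoxSideLengths:
    left_bottom: int
    right_bottom: int


box = BoxSideLengths(left_bottom=26, right_bottom=10)

# Use sites stay short, but "box" plus the type still say what this is.
total_bottom_length = box.left_bottom + box.right_bottom
```

Whether this counts as "better" is exactly what the thread is arguing about. The sketch only shows where the repeated words go when they move from the names into the surrounding structure.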
>Look at how much useless text we have here. "measurementOf" and "SideOfBox" add nothing but clutter to the naming, and writing out practically the same thing 4 times suggests we could abstract this into a data structure.
First off this "clutter" exists in the English language itself yet I hear no one complaining. I don't communicate with other people using shortcuts and context-aware abbreviations like you're suggesting. I literally say "here are the measurements of the box" both in written documentation and out loud. What black magic says that this is so wrong to do in code?
There is nothing wrong with the above code. Clutter doesn't harm readability, it just harms aesthetics. And useless? Are you sure? Even if it was useless, what harm does it do?
Now you could argue that the clutter itself can hurt reading efficiency. But honestly think about it. That's like 5 seconds of extra reading out of your life. It's not a big deal.
Most programmers just have this version of OCD. I get it the struct looks really ugly, but there is nothing logically wrong with it.
I mean you could make it more elegant like this:
struct Box {
x
y
z
t
}
But this could lead to all kinds of other issues. For example because I didn't label anything with "measurement" now the reader can mistake the values for the "positioning" of the box as opposed to "measurements" of the box.
You made a common mistake here in your naming in assuming that "measurement" was a useless prefix. It's not... but that's beside the point because every program writer can make that mistake. That's why when you add a bit more clarity to your naming you have a larger chance to avoid this mistake at a small cost of adding some ugliness to your code.
>1) It's overly verbose. More than 3 words is a warning sign to me.
Warning sign of what? It's a similar warning sign that your brain fires off when you're alone in the dark in the woods. There's nothing to fear logically but your brain kicks off warnings regardless. Same with this, your brain kicks off some sort of warning but when you work it out logically there's Nothing. Verbose code is not bad just like verbose documentation is not bad.
>2) It's specific rather than generic. For instance if I name a function "sortSheepByHoofSize", it implies the reader knows what hoofs are, cares about them and knows how to measure them. Whereas when naming it "sortSheep" or perhaps even just "sort", it's immediately understandable on a surface level to practically everyone.
Data referring to a specific concept or a generic concept is a structural decision made by you. I chose data referring to a specific concept. This happens in code. Not everything is generic and for specific things there's nothing wrong with using very specific names.
>3) Following on from 2), this type of naming lacks context. We should leverage the context of surrounding code and abstractions to make naming understandable, instead of trying to pack all the meaning into one name. Oftentimes there's repeated information in names that could be inferred from context instead.
>>We should leverage the context of surrounding code and abstractions to make naming understandable, instead of trying to pack all the meaning into one name.
There's no downside to packing more info into a name. Sure it can get to a point where it's unreasonable but in general there's nothing wrong with me calling something a measurement when that's what it is... This is an aesthetic issue that programmers react to due to human bias, but there's nothing intrinsically wrong with prefixing measurement onto box. There's nothing wrong with repeated information either.
By prefixing measurement onto the Box struct I let everyone know that these are measurements on the box and not the position of the box. It's ugly but ugliness has nothing to do with readability or structure.
This is the bias programmers need to get rid of.
Find beauty in the structure of your code and find readability in the naming. Don't make the mistake of trying to find beauty in naming. Nobody wants to decipher code with poetic naming.
Have you guys heard of literate programming by Donald Knuth? He's taking what I'm talking about to extreme heights.
x, y, z, t is not the suggested alternative. leftBottom, rightBottom, was clearly the intended alternative. A "Length" suffix could be added to make it clear it's not coordinates, but I think it's better to encode this into the struct naming:
In code working with these objects it's likely both obvious and necessary to know that the code works with dimensions. I.e., being reminded of this each time you encounter a variable is just noise.
The second only needs a glance; the first you have to read/scan multiple times to dig out which delta it is.
If it's important to distinguish between measurements and true values it would be better to encode this in the struct variable name. Having distinct types for measurements and true value would be a hassle no?
I agree it is a better name. My version was deliberately designed to hurt your eyes to show you that the pain is just psychological. There is no intrinsic difference between your naming versus mine other than mine takes a millisecond longer to read the longer names. My version is quite ugly as well, but ugly naming has no negative effect on your code. It's all about understandable naming.
Not every concept can fit into a beautiful name as you did here. What if the box was a very specific box out of many boxes and I needed to specify the details?
DimensionsOfBlueBoxFromRoom253 = { ... }
>The second only needs a glance; the first you have to read/scan multiple times to dig out which delta it is.
That's only because you already know what Dim means. Many many times I see abbreviations that are unknown. For example:
TranBox.Dow
Would it be clear to you that Tran means translation and Dow means down?
>Code with longer names are harder to scan.
Same with english. It's a small price to pay but people tend to enjoy reading english more than programming abbreviations and shortcuts. You spend an extra second reading a line, but you gain much more clarity about the intention of the programmer. Where with an abbreviated name you can often be unsure of what the programmer meant and you'd have to dig into the surrounding context.
trueBox.bottomLeft - measuredBox.bottomLeft
See? Already I don't even know what you mean by true. True can mean anything. Could it be you're referring to a box with the word true printed on it? A layman will not understand the difference between a truebox and a measuredbox. The below is infinitely better:
Perhaps it's ugly and offensive to your aesthetic tastes, but there's zero ambiguity here. In fact I would use snake case for even more clarity:
Box_with_Estimated_dimensionS
That variable name is not elegant but there is literally zero way I can misunderstand the meaning. Note how I combined snake case with alternating camel case above. This really pisses some people off, but when you think about it mixing camel case with snake case is just aesthetic bs that has nothing to do with the ultimate goal of your programming style: Readability. Also note the capital S at the end of the name. Believe it or not the capital S has ZERO effect on the quality, readability and even the verbosity of your source code yet this is what programmers will bitch and moan about the most.
>The second one is just exhausting to me at least.
Exhausting like the english language is exhausting? Exhausting like commenting your code is exhausting? People don't complain about the verbosity of English and all I'm suggesting is bringing the verbosity of programming a bit closer to English so that the understandability of your code is ALSO closer to english.
> My version was deliberately designed to hurt your eyes to show you that the pain is just psychological. There is no intrinsic difference between your naming versus mine other than mine takes a millisecond longer to read the longer names. My version is quite ugly as well, but ugly naming has no negative effect on your code. It's all about understandable naming.
..let's jump back a bit to something else you said:
> First off this "clutter" exists in the English language itself yet I hear no one complaining. I don't communicate with other people using shortcuts and context-aware abbreviations like you're suggesting.
This level of clutter doesn't exist in the English language, because we do use the contextual shorthand GP is leaning towards all over the place. I'm fairly sure you do the same, but possibly aren't aware of doing it because it comes so easily in natural language.
For example, using this paragraph from the Wikipedia page on Romeo and Juliet:
> Juliet visits Friar Laurence for help, and he offers her a potion that will put her into a deathlike coma or catalepsy for "two and forty hours". The Friar promises to send a messenger to inform Romeo of the plan so that he can rejoin her when she awakens. On the night before the wedding, she takes the drug and, when discovered apparently dead, she is laid in the family crypt.
And changing it to use your insistence on descriptive names with "cluttered English" instead of contextual shorthand, it becomes:
> Juliet Capulet visits Friar Laurence for help, and Friar Laurence offers Juliet Capulet a potion that will put Juliet Capulet into a deathlike coma or catalepsy for "two and forty hours". Friar Laurence promises to send a [independent] messenger to inform Romeo Montague of the plan so that Romeo Montague can rejoin Juliet Capulet when Juliet Capulet awakens. On the night before the wedding [between Romeo Montague and Juliet Capulet], Juliet Capulet takes the coma drug and, when discovered and apparently dead, Juliet Capulet is laid in the Capulet family crypt.
(parts in [] I'm unsure about because it's been so long since I've read/seen the play)
Long identifiers are exhausting to read, even with native speakers, because of the heavy repetition that has to be read and discarded with every single use. It's why we don't speak or write like in that second example.
Man of course there are english passages that are too verbose to read. Everyone knows this. This is not what I'm talking about. I am saying the level of NORMAL verbosity in the english language is already illogically considered to be too verbose for programming. There is an irrational dichotomy here and you can't see it.
You took shakespeare and upgraded it to be more verbose than normal. That example does not disprove my point because you misunderstood.
Let's start with a normal example. My original Box example. Step by step. It is your opinion that the Box example I created is waaay too verbose. My claim is that IF I translated my Box example into the English language it would not be considered verbose by the average english speaker. Take this english phrase:
The measurement of the left bottom side of the box is 26 centimeters. The right upper side of the box is 10 centimeters.
We can both agree the level of clutter above is Normal for the English language. I can shortcut it though if you want.
Box: x is 26, y is 10.
The second example is the shorthand we tend to use in programming. We think it's fine in programming but it's not fine in English.
struct Box {
x = 26,
y = 10
}
In English, in 99% of all cases, we prefer way more verbose syntax than in programming, even just by virtue of grammatical words like "the", "and" or "of". My argument is to move programming more in the direction of normal english verbosity. That's all. Add more meaning to your variable names, spell out the purpose of the name.
I'm not advocating insane levels of obvious detail here. I'm obviously not saying we do this:
variableThatCanBeAddedToOtherNumbers = 22
which is what your example is accusing me of.
I'm saying fill in the variable name with necessary details that the reader would need to know. Fill in details in your variable name that you would put in documentation to help the reader understand. This is what they call self-documenting code. Additionally there is no need to cut details for elegance, it is pointless to use an abbreviation if that abbreviation has a probability to be misinterpreted or confusing.
UpperLeftSideInCentimeters is infinitely better than x as x can lead to a ton of confusion. Either way nobody will call something "x" in english they'd choose the more verbose and informative way despite the "exhaustion" in reading. On average nobody complains about English so the same can be said if we did it in programming.
Additionally programming will never approach the verbosity of English. I'm just saying programming needs to be a bit more verbose and not excessively more verbose than English itself.
> My version was deliberately designed to hurt your eyes to show you that the pain is just psychological
Are you saying that psychological pain is not real? :D
> Where with an abbreviated name you can often be unsure of what the programmer meant and you'd have to dig into the surrounding context.
But this is context you'll usually need to acquire to work with the code anyway. (Abbreviations can definitely be overdone though.)
I agree that code should strive to require little context to understand, but there is a trade-off. In some sense: if the name encodes the whole context there's no need for the name in the first place:
I began to write the above as a volume function of the box we've been talking about, but realized I don't really know what kind of box it is. Then I realized it might be a 2d box, not a 3d box? But then the names don't really make sense? (I guess measurementOfBottomFrontSideOfBox, etc. was omitted for brevity)
So even this crucial information is missing from the names and requires context. Which isn't necessarily that bad, since in any real scenario it would likely be included in the minimum-viable-context.
You want to encode as much context into the name as possible without breaking abstraction. Meaning that the definition of the function should not be part of the name. Only the API of the function and context should be encoded into the name.
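A small Python sketch of that distinction, under my own assumptions (a 3d box, centimeters, and both function names are made up for illustration):

```python
def volume_of_box_in_cubic_centimeters(width_cm, height_cm, depth_cm):
    """The name states what comes back and in which unit: API and context."""
    return width_cm * height_cm * depth_cm


def multiply_width_by_height_by_depth(width_cm, height_cm, depth_cm):
    """The name restates the body: it leaks the definition and says
    nothing about what the result means."""
    return width_cm * height_cm * depth_cm


# Same result, but only the first call tells the reader what it is for.
volume = volume_of_box_in_cubic_centimeters(2, 3, 4)
product = multiply_width_by_height_by_depth(2, 3, 4)
```

If the body of the second function ever changes, its name becomes a lie. The first name survives any reimplementation that still returns a volume in cubic centimeters.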
Well, BCC actually seems quite the total opposite of elegance, according to my own sense of elegance. Elegance is far more subjective. Take a panel of equally experienced programmers, and ask them to assess some code on its readability and elegance. I would expect the latter to vary far more greatly.
That said, I would find the other example an ominous sign about the code architecture. I would be more at ease finding a line like:
box.edge(3).span.set(200)
Or any syntactic variation of such statement.
It's not the variable name per se, it's what it conveys about the abstractions used in the code base.
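For what it's worth, a line like that implies a handful of small abstractions behind it. A minimal Python sketch of what they might look like, assuming hypothetical `Box`, `Edge` and `Span` classes and zero-based edge indices (none of this comes from a real code base):

```python
class Span:
    """Length of one edge, wrapped so it can be read and updated in place."""

    def __init__(self, length):
        self._length = length

    def set(self, length):
        self._length = length

    def get(self):
        return self._length


class Edge:
    """One edge of the box; only its span is modelled here."""

    def __init__(self, length):
        self.span = Span(length)


class Box:
    """A box described purely by the lengths of its edges."""

    def __init__(self, edge_lengths):
        self._edges = [Edge(length) for length in edge_lengths]

    def edge(self, index):
        return self._edges[index]


box = Box([100, 100, 100, 100])
box.edge(3).span.set(200)
```

The call site is short, but three classes now have to exist to support it. That is the trade being discussed: the words leave the variable name and reappear as structure.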
>That said, I would find the other example an ominous sign about the code architecture. I would be more at ease finding a line like:
That's just your bias kicking in. I didn't define the requirements of the program. If I defined the requirements of the program to be ONLY printing out a number that represents the left edge of a box when the measurement is inputted into the console, then defining an entire box struct is in itself a code "architecture" problem.
Is your brain telling you that it should be structured like so:?
Vacuum.light.speed = 9999999999
Likely no, because it doesn't make sense to place light as a parameter of vacuum... showing that a big long phrase in a variable name is not in itself a "smell" that needs to be "re architected."
You see it is not the grammar of the phrase that is making you want to restructure the code but the contextual meaning. This is one of those optical illusion like triggers in our brain, you want to organize something in a certain way but my post didn't give you a logical reason to do so. I never defined that an entire Box was provided as input.
Of course, context modulates everything. Here the context is an informal discussion to throw generalities on our own feelings on what makes a code readable. Sure, for more specific cases backed with some contextual data, it might be overkill to have a too sophisticated code hierarchy.
For your new example, I would find any of the following more fine:
maximum_celerity = 300 # Mm/s
class Physics { public const speed c = 299792458; }
So, I agree, length of identifiers alone is not enough to trigger a "this would be more nicely structured as…", and context matters. "speedOfLightInVacuumMetersPerSecond" is readable, and might be good enough for some specific contexts. But I wouldn’t take that as a desirable practice example.
Just because context matters, doesn’t mean that "all generalities are absolute evil" (a statement that condemns itself).
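Side by side in Python, the two styles for the speed of light look roughly like this. It's a sketch: the `Physics` namespace mirrors the one-liner above, while `light_travel_time_seconds` is a hypothetical helper added only to show a use site.

```python
from typing import Final

# Style 1: the name spells out the quantity and the unit.
SPEED_OF_LIGHT_IN_VACUUM_METERS_PER_SECOND: Final[int] = 299_792_458


# Style 2: a short conventional symbol inside a namespace,
# with the unit carried by a comment instead of the name.
class Physics:
    c: Final[int] = 299_792_458  # m/s


def light_travel_time_seconds(distance_meters):
    """Hypothetical helper: time for light to cover a distance in vacuum."""
    return distance_meters / Physics.c


one_second = light_travel_time_seconds(299_792_458)
```

Both constants hold the same value. The difference is whether the unit travels with the name to every use site or stays behind at the definition.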
What's wrong with flat data? Why is nested data better than flat data? If you think about it, there's really nothing wrong with either.
The only reason you would want to structure your data like that is if you want to think about the Box as a whole from a more abstract level.
But there's nothing wrong with writing an entire app that focuses on the measurement of the left bottom side of a box. If that's the requirement, then there's no need to define the measurement as part of a higher order type "Box" if you're not doing anything with the other measurements.
Nothing wrong per se, but talking about readability, we see the world in categories and we communicate abstractions in words. I didn't say flat data is wrong, but that word in particular being too long smells like a bit of structure is needed. It's just another dimension to consider: how we abstract things in the world and model them in software matters.
But if the structure is unused and my program only deals with a single integer does it make sense to modify the structure of the program for readability?
No. If my program only modifies the left bottom side of the box it doesn't need the whole box defined. In this case and many cases similar to this, it's totally fine encoding more verbosity into the naming of the integer.
The "smell" you're sensing here is only called a smell because you can't logically explain it. Your brain is firing off a false alarm.
I can easily make a variable name that can trigger the false alarm but you won't be able to come up with a reasonable nested structure to contain it.
What are you gonna do here? I have a string that represents a color of the sky at a certain day on a certain time. What structure can reasonably contain this concept and ONLY this concept. Unless you want to define a structure called Sky the long name is the only possibility.
It's all context. If you have such a specific need, the app context is probably already understood and there is no need for such big names. Perhaps the app is called 'colors of Thursday 6pm', then you just hold a variable for sky.
Same for box side, if your app only deals with a side, then is boxSide, you wouldn't say upperBottleTap, because all bottles have the tap on the upper side, and there is only one, so context is everything.
Change english to Korean and all the variable names to the Korean translation and read your reply again. To someone who is fluent in Korean there is no problem, but to the rest of us English speakers (who don't know Korean) the long names with weird symbols are even less readable than lbb.
I have in my day job some code we got from Korea. I'm told by those who have spent the time to understand it that it is good code. But since it was written in Korean it is a level harder to figure out.
I use the word "english" but really my argument is language agnostic. Choose the language relative to the audience is my motto.
>I have in my day job some code we got from Korea. I'm told by those who have spent the time to understand it that it is good code. But since it was written in Korean it is a level harder to figure out.
Exactly this is my point. Abbreviations, lack of verbosity in naming and shortcuts might as well be Korean for the reader. It really doesn't matter how good the code is.... the understandability of your naming is by far more important.