Programming notation is the way it is primarily because of the fallacy that programming is mathematics. In reality, writing software is also very much like, well, writing.
Programming is much more like mathematics than writing. In fact, I'd argue that the only thing programming shares with literature is that programs are expressed as character strings rather than in more specialized notation. Literature can be ambiguous and can rely on the reader to infer what the author meant. Programs must not (and indeed, in many cases cannot) be ambiguous.
Larry Wall is a notable (partial) exception to the norm, appropriate considering he studied linguistics back in the day. Perl is what you might call a semi-naturalistic programming language.
That is Perl's greatest strength and its greatest weakness. It's very hard to write Perl that doesn't run. The interpreter will go to great lengths to parse your program in the most generous way possible (if you let it). However, most of the time, if you've written something ambiguous, it's wrong, and it should be flagged as such. This has led to a host of add-ons (like use strict) which attempt to make Perl less generous.
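To make that concrete, here's a rough sketch (the variable names are made up) of the kind of generosity I mean, and how use strict withdraws it:

    #!/usr/bin/perl
    # Without strictures, a misspelled variable is quietly treated as a
    # brand-new, undefined global -- the program still runs, it just
    # prints the wrong thing.
    $total = 42;
    print "total is $tota1\n";   # typo: prints "total is " and carries on

    # With strictures, the same typo refuses to compile at all:
    #
    #   use strict;
    #   use warnings;
    #   my $total = 42;
    #   print "total is $tota1\n";  # error: Global symbol "$tota1"
    #                               # requires explicit package name

Whether you see the first behaviour as generous or as a trap is pretty much this whole argument in miniature.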
Another thing Perl mimics in natural language is implicit reference. $_ is like the pronoun “it”, a default thing, the current subject of discussion, which in many situations can be assumed. Programming languages use explicit reference almost exclusively. In order to perform a series of operations on a value, the programmer must explicitly name that value for every operation.
Explicit references are unambiguous. Has the author ever read a legal document? In legalese, just as in programming, all nouns save for the most common ones are defined before they are used, and for the same reason: it is best to be unambiguous when attempting to communicate in a formal manner.
In fact, when I'm dealing with Perl, I absolutely hate the use of $_. It adds massively to the state that I have to carry in my head when I'm reading the code, because it's so easy to miss cases in which $_ gets modified.
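Here's a small sketch of both halves of that; munge() is an invented helper, not something from the article:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # The pleasant side of implicit reference: many built-ins default to
    # $_, so you never have to name the value you're operating on.
    my @shouted = map { uc } qw(alpha beta gamma);  # uc reads $_ implicitly
    print "$_\n" for @shouted;                      # print writes $_ implicitly

    # The unpleasant side: $_ is effectively global, so a helper that
    # forgets to localize it silently clobbers the caller's "it".
    my @words = qw(one two three);
    for (@words) {
        munge();
        print "$_\n";   # surprise: prints ONE, TWO, THREE -- and @words
    }                   # itself has been rewritten, because for() aliases $_

    sub munge {
        $_ = uc $_;     # clobbers the caller's $_; it should work on its own lexical copy
    }

That second loop is exactly the kind of state I mean having to carry in my head.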
Naturalistic programming languages will never be pretty. They are not minimal, or elegant, or simple, but despite all that, they are intuitive and useful. More importantly, they meet users’ expectations about how language is supposed to work.
I don't care about how "expressive" a language is. I care about how easy it is to write 1) correct and 2) maintainable programs in that language. These two criteria militate for minimal syntax (less syntax means less syntax to screw up) and a level of formalism that catches errors as close as possible to the point where they occur.
Erroneous Errors
I'm not even sure where the author is trying to go with this section. We've gone from talking about programming language design to... compiler error messages? The language is not the compiler and the compiler is not the language. Most programming languages have more than one compiler or interpreter. These compilers will give differing error messages. As another comment demonstrates, Clang gives better error messages than gcc. Is C somehow a better language if you use it with Clang? Is it somehow a worse language if you use it with gcc? No! The language remains the same, regardless of what you use to compile it.
Terrible Typography
Programmers prefer monospace fonts because there are fewer variables, and therefore, fewer things to mess up. With a monospace font, I don't have to worry about kerning. I don't have to try to guess how far my code is indented. If I use spaces for indentation, I don't even have to worry about my code looking different on different computers. Wherever I go, my code should look exactly, unambiguously, the same. For me, that concern trumps readability a thousand times over. I'll take an ugly font that's unambiguous over an ambiguous pretty font any day.
Impossible Input Methods
APL tried to have funny characters as operators[1]. It was a bad idea. I don't want to have to switch my keyboard around to match the programming language. I don't want to have to type ALT+<code> to get an operator. It's much easier if programming languages use standard characters that are guaranteed to be present on almost all keyboards. Indeed, the C preprocessor defines trigraphs[2] for keyboards that don't have characters like '{' and '#'.
Even worse, when I'm reading code, I don't want it to be peppered with blank boxes just because I don't have the particular font for this programming language installed.
Perhaps the biggest problem with programming language design is that, because it is so bad, people are afraid to use tools that can help them.
Either that, or because many of these ideas have been tried and rejected when hard experience proved that they didn't enhance productivity very much and hurt maintainability immensely.
Well said! I don't think I found an argument in that article that I would agree with. The mention of Perl as a great example is probably the worst. The purpose of a programming language is to express things as precisely as possible, both to a computer and to a human. Perl's extreme context-sensitivity is what I dislike most about it.
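For anyone who hasn't had the pleasure, a quick sketch of the kind of context-sensitivity I mean (the values are arbitrary):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @primes = (2, 3, 5, 7);

    # The same expression means different things depending on the
    # context the surrounding code imposes on it.
    my @copy    = @primes;    # list context:   (2, 3, 5, 7)
    my $count   = @primes;    # scalar context: 4 (the element count)
    my ($first) = @primes;    # list context:   $first is 2

    print "count=$count, first=$first\n";

    # Built-ins change behaviour with context, too:
    print scalar reverse("perl"), "\n";       # "lrep"  -- reverses the string
    print join(",", reverse(1, 2, 3)), "\n";  # "3,2,1" -- reverses the list

What @primes means depends entirely on what surrounds it, which is exactly the sort of thing a reader has to keep in their head.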
Regarding Unicode support, I can see some advantages, but current editors don't support Unicode input well. Emacs has an input mode where you can type \alpha and it replaces it with the appropriate Unicode character, but I haven't seen that supported in any other editor. For some of the more obscure symbols, it can also be difficult to find out how to input them at all.
I'm wondering whether that article is actually meant seriously -- some of these claims are just so absurd.
Yeah, I was reading the article and saw the mentions of Perl and was like -_-.
I think the legal writing analogy is interesting. Legal writing and programming are closer to each other than either is to math or literature.
There are pleasant, non-awful ways of doing characters in programming languages that don't involve memorizing the hex codes for Unicode characters or weird keyboards. For example, the Fortress language has three representations: a pretty, LaTeX-ish image form with operator symbols and the like, a plain Unicode form, and an ASCII form; I am under the impression that you type it up in ASCII, and then it gets prettified for printing. As I type this I'm actually procrastinating from writing some Agda code, and Agda also uses Unicode in its source (it is admittedly heavily tied to its Emacs mode, and inputting characters is done by typing the LaTeX character entity, which the mode converts to the Unicode equivalent). Agda is still ASCII-tolerant as well; for example, it understands both → and its ASCII equivalent -> as being an arrow.
The issue is, a lot of languages don't necessarily need this. In a close cousin of Haskell like Agda, it makes sense, because it's very much like writing mathematics on a page, so using Greek letters and operator symbols is expected and will be understood by Agda programmers, especially if there's an assumption that it will be widely printed or read, as Fortress would be. But I honestly can't look at C++ and say, "Oh, it would be almost infinitely better if everyone updated their compilers so I could overload the × operator! That's exactly what I want—to be forced to use a Unicode terminal font so I can read source code over SSH!"
[1] http://www.wickensonline.co.uk/apl/unicomp-apl-top.jpg
[2] http://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C