I am always happy to see people make their own languages and write about it.
For this article in particular, I disagreed with the first two parts:
1. Premise: Simplifying assumption: There are two kinds of languages, static and dynamic. Somehow, it's just good to know this, and nothing is said of which of these the Duck language falls into.
2. Syntax: Wait, no.
There is an implicit premise here: that if you make a language, it's probably going to be a general-purpose one with the expected control-flow structures, and because of that, what we care about is how those control-flow structures look, i.e. whether there should be curly braces around blocks and such.
A premise I would prefer is: When you make a language, you want to solve a problem. Imagine a language as an overgrown library. You don't start a tutorial on making libraries with what a good library interface looks like. You start by saying what problem you want to solve.
Another premise I would prefer is: Most problems are domain-specific, so most language-shaped solutions to problems are DSLs.
index, value := with
    i := 0
    count := len(haystack)
loop
    while i < count else not_found
    elem := haystack[i]
    if is_special(elem) then continue
    if match(elem, needle) then break found(elem)
    // could also have been
    // while not match(elem, needle) else found(elem)
    // until match(elem, needle) then found(elem)
always
    i = i + 1
on not_found
    -1, _
on found(result)
    i, result // "i" is in scope here, but not "elem"
...
That, I believe, is one of the most generic forms of a loop, with a pre-header for declaring loop-local variables, multiple exit points (that accept parameters!), flexible condition checking ("while/until" being syntax sugar for "if ... then break ...", and "continue" being a "goto" to the "always" block), and expanded space for the increment statement; and AFAIK no modern language supports it.
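For comparison, Python's for/else gets part of the way there. A sketch, with the function and predicate names borrowed from the pseudocode above (this is my approximation, not anything from the article):

```python
def find(haystack, needle, is_special, match):
    # Python's for/else covers the "not_found" exit: the else clause
    # runs only when the loop finishes without hitting a break.
    # The parameterized "found(elem)" exit still has to be simulated
    # with an ordinary variable.
    result = None
    for i, elem in enumerate(haystack):
        if is_special(elem):
            continue                 # the pseudocode's "continue"
        if match(elem, needle):
            result = elem
            break                    # the "break found(elem)" exit
    else:
        return -1, None              # the "on not_found" arm
    return i, result                 # the "on found(result)" arm
```

Note that there is no equivalent of the "always" block; the increment is folded into enumerate.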
I mean, if you are making a new language, that's your chance to not stick with the old and tried stuff but try something more adventurous, like this loop device from Knuth's old paper.
I think the reason you won't find this in many languages is because people gravitate towards fewer constructs rather than more, and this syntax can be expressed using more general purpose semantics without any additional cost.
The syntax debate is usually something like: what does this buy us, and do we have the budget for it? IMO that loop syntax is not sufficiently general (it cannot express recursion, for example), nor does it make code easier to read: you now have what is essentially a few if/else blocks split across four different code blocks, and I would reject this in a code review and ask for a simpler loop. It also doesn't really prevent a user from making mistakes.
Extremely generic control flow constructs have also been shunned a bit. There's a lot of "call/cc considered harmful" content on the internet from a few years back, and I predict we'll see the same thing about new-fangled control flow constructs like algebraic effects.
while cond1
    stmts1
elif cond2
    stmts2
elif cond3
    stmts3
syntax which is essentially a few if/else blocks pulled up to the enclosing loop level. Makes writing e.g.
while a > b
    a := a - b
elif b > a
    b := b - a
somewhat more concise. And Rust's "while let" loop easily could have allowed several clauses instead of a single one.
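In current Python, that hypothetical multi-clause loop has to be desugared by hand. A sketch of the equivalent, using the subtraction example above (assuming positive integer inputs):

```python
def subtract_gcd(a, b):
    # Hand-desugared "while a > b ... elif b > a ..." loop:
    # re-test every clause each iteration, and exit the loop
    # only when no clause applies.
    while True:
        if a > b:
            a = a - b
        elif b > a:
            b = b - a
        else:
            break
    return a
```

The hypothetical syntax saves exactly the `while True` wrapper and the final `else: break` arm.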
And yes, you can express all of that with either "while true: ..." with early breaks/returns, or with "while <OR'ing lots of Boolean flags>: ..." and post-processing those flags to determine why the loop ended... but why not have explicitly named, limited continuations? IMO it would simplify dealing with "there may be many reasons why the loop ended" kinds of iterations. Putting every loop into its own (anonymous) function that returns an ad-hoc sum type doesn't sound like a much better solution to me, to be fair.
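For concreteness, the "loop in its own function returning an ad-hoc sum type" alternative mentioned above might look like this in Python, with tagged tuples standing in for the sum type (all names here are hypothetical):

```python
def search(haystack, pred):
    # Each exit reason becomes a tagged tuple -- the "ad-hoc sum type".
    for i, elem in enumerate(haystack):
        if pred(elem):
            return ("found", i, elem)
    return ("not_found",)

# The caller then dispatches on the tag instead of on named exit points.
outcome = search([2, 4, 6], lambda e: e > 3)
if outcome[0] == "found":
    _, index, value = outcome
```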
From what I can tell, the only things here not supported by CL's LOOP are "continue" and named exit points with parameters, though I fail to see the usefulness of the latter when you could just use return.
Yes, especially if you can make an inner (embedded) function just for that, one that won't spill into the global function namespace. Ada is good in that respect.
A friend recently told me that DSLs (stand-alone) and eDSLs (embedded in an existing language) have the main difference that DSLs are overgrown configuration files, and eDSLs are overgrown libraries.
I'd say: When you want to call your DSL from more than one programming environment, it is good to have an abstract representation that transcends one particular syntax-tree definition. A library is good until then.
I think there are definitely wrong answers to syntax if you want people other than PL nerds to use your language. Agreed that a library shouldn't start its tutorial talking about how nice its interface is. But at a certain point, you should probably come around to a syntax that is fairly standard. There's a reason a lot of UI libraries standardized around JSX.
Basically, syntax might be your last thought, but that doesn't mean you should neglect it.
Syntax is the bike-shedding of programming language design.
Imagine you're going to write a book, and before you think of the subject, you're rabbit-holing whether or not to use the Oxford comma. This is literally the significance of syntax. Of course, it matters. But not as much as what your book is about.
Thank you! This is something I've encountered in every tutorial on language design. I've been trying to build a DSL for years, but I'm stuck because everything I read isn't suitable for my use case. Part of me wants to just say "fuck it, I'll learn how to build a GPL, then hopefully by the end of it I'll have enough on-hand knowledge to transfer over". But I really would rather not spend that much effort.
It's not in the bibliography at the bottom, so I thought I'd give a shout out to Crafting Interpreters, which is a very accessible tome on language design and development:
Yet another compiler how-to that pretends to be about design. There is no usage design as such, not a word on semantics; even the word "semantics" is not mentioned in the whole text.
The only design here might be the verbatim copy of a compiler architecture that gets explained over and over. Then again, some people might call architecture "design" and a syntax compiler a "programming language".
I agree with you that the lack of discussion of semantics in almost all programming-language creation tutorials/books/videos is disheartening.
However, I often wonder why it's left out. Charitably, I can assume that authors want to present the mechanical implementation and development of a language and its developer experience. To do so, they use typical (and often simplified) strictly imperative or strictly functional semantics, hoping the audience has an implicit understanding of what the PL constructs 'mean'. Or, with a lot less charity, I can assume that programming language semantics (particularly novel or niche semantics) are hard to understand and explain, and the authors are not competent enough to talk about them, so they just don't bother.
Where do you find yourself falling in explaining the lack of semantics in these types of tutorials on PL design?
That's something I've been looking for for a long while, and the only place I find it is usually in interviews with creators of successful programming languages who touch on the topic tangentially. I've seen no book on the ergonomics of programming language design.
Some common resources I have already identified are the talk Simple Made Easy[1] and the famous "everything is an X"[2], which was here on HN a few days ago. Also, a common problem I see is that many scholars and intellectual people confuse what "semantics" means. For example, they might conflate "semantics" with "meaning" and not recognize as a semantic problem the collapsing of two different variables into a single one (e.g. the classic "0 means unlimited", as some old C programs did).
If you do find a resource, just let me know. But I hope there is more on the topic than applying a standard math model just for the sake of uniformity and trying to make everything "as easy as possible" based on a gut feeling.
There are lots of resources out there about making a parser and interpreter and calling that a “language”. While those two components are necessary, alone they don’t seem sufficient in 2022 to call something a programming language; the scope of what a language is has expanded to include the ecosystem and community, which are often aspects cited these days for using or not using many languages.
Question: does anyone know of resources that discuss the rest of programming language design?
- designing a standard library
- building tooling like debuggers and package managers
- interfacing with other languages
- managing and growing a community
- evolving the language over time as it grows
- integrating with IDEs and building language server support
- how to communicate the language to others so they understand the value proposition.
- how to choose the right set of features to support. How to discern a good set from a bad set, how to make the language stand out and unique.
- also maybe issues of how to stay motivated and focused. It takes years to build a language, how does one stay committed to it over that time?
I understand why there might not be many resources that cover these angles, as not many language projects get past the parser stage, but I’m still curious what others have found.
Sorted by most downloaded, the first page of results includes the history of Erlang, Haskell, C, C++, Smalltalk, Fortran, AppleScript, Lisp, Pascal, Prolog...
Many of the papers are written by the people who designed, developed, and maintained the programming language, its ecosystem and community.
What I find fascinating are their retrospective thoughts on what didn't work well, regrets of what turned out to be bad design decisions. The authors also often reflect on how far programming in general has evolved, what aspects and features have become expected, like package manager, etc.
In any substantial language, the "lexer" and "parser" represent less than 1% of the work of implementing the language. In the Ecstasy (xtclang) project, writing the lexer and parser (and then bootstrapping them both in the Ecstasy language) took less than 4 person weeks, out of 750+ (and counting) person weeks spent implementing the language.
After reading the title I thought that this would be an article about a programming language named I, in the style of B, C, D, J, K, et cetera. "Part I" would be much clearer.
Apparently "the certificate is only valid for the following names: *.github.com, github.com" so this is likely a "github pages" page hiding behind a proper domain name.
I wonder if it would be practical for any web server that receives a TLS request for a domain it doesn't know about (but which has nonetheless arrived at the server) to make a Let's Encrypt request for that domain and respond to the original request in time.
I don't believe it can respond to a request live, but it can dynamically request a cert for any new domain and have it presumably in seconds, so I don't think it's out of the realm of possibility. This would only be for a subdomain of a parent domain you already own and can prove via an ACME challenge.
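For what it's worth, this exists today: Caddy's "on-demand TLS" obtains a certificate during the TLS handshake for names it hasn't seen before (holding the handshake open while issuance completes), gated by an "ask" endpoint so it won't issue for arbitrary domains. A minimal Caddyfile sketch, where the ask URL is a placeholder for your own allow-list service:

```
{
	on_demand_tls {
		# Caddy queries this endpoint before issuing; respond 200
		# only for domains you actually control or serve.
		ask http://localhost:9123/allowed
	}
}

https:// {
	tls {
		on_demand
	}
	respond "hello"
}
```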