I am always happy to see people make their own languages and write about it.
For this article in particular, I disagreed with the first two parts:
1. Premise: Simplifying assumption: There are two kinds of languages, static and dynamic. Somehow, it's just good to know this, and nothing is said of which of these the Duck language falls into.
2. Syntax: Wait, no.
There is an implicit premise here: that if you make a language, it's probably going to be a general-purpose one with the expected control-flow structures, and because of that, what we care about is how those control-flow structures look, i.e. whether there should be curly braces around blocks and such.
A premise I would prefer is: When you make a language, you want to solve a problem. Imagine a language as an overgrown library. You don't start a tutorial on making libraries with what a good library interface looks like. You start by saying what problem you want to solve.
Another premise I would prefer is: Most problems are domain-specific, so most language-shaped solutions to problems are DSLs.
index, value := with
    i := 0
    count := len(haystack)
loop
    while i < count else not_found
    elem := haystack[i]
    if is_special(elem) then continue
    if match(elem, needle) then break found(elem)
    // could also have been
    // while not match(elem, needle) else found(elem)
    // until match(elem, needle) then found(elem)
always
    i = i + 1
on not_found
    -1, _
on found(result)
    i, result // "i" is in scope here, but not "elem"
...
That, I believe, is one of the most generic forms of a loop, with a pre-header for declaring loop-local variables, multiple exit points (that accept parameters!), flexible condition checking ("while/until" being syntax sugar for "if ... then break ...", and "continue" being a "goto" to the "always" block), and expanded space for the increment statement; and AFAIK no modern language supports it.
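For comparison, Python's for/else gets part of the way there. A sketch, with the function and predicate names borrowed from the pseudocode above (this is my approximation, not anything from the article):

```python
def find(haystack, needle, is_special, match):
    # Python's for/else covers the "not_found" exit: the else clause
    # runs only when the loop finishes without hitting a break.
    # The parameterized "found(elem)" exit still has to be simulated
    # with an ordinary variable.
    result = None
    for i, elem in enumerate(haystack):
        if is_special(elem):
            continue                 # the pseudocode's "continue"
        if match(elem, needle):
            result = elem
            break                    # the "break found(elem)" exit
    else:
        return -1, None              # the "on not_found" arm
    return i, result                 # the "on found(result)" arm
```

Note that there is no equivalent of the "always" block; the increment is folded into enumerate.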
I mean, if you are making a new language, that's your chance to not stick with the old and tried stuff but try something more adventurous, like this loop device from Knuth's old paper.
I think the reason you won't find this in many languages is because people gravitate towards fewer constructs rather than more, and this syntax can be expressed using more general purpose semantics without any additional cost.
The syntax debate is usually something like: what does this buy us, and do we have the budget for it? IMO that loop syntax is not sufficiently general (it cannot express recursion, for example), nor does it make code easier to read: you now have what is essentially a few if/else blocks split across four different code blocks, and I would reject this in a code review and ask for a simpler loop. It also doesn't really prevent a user from making mistakes.
Extremely generic control flow constructs have also been shunned a bit. There's a lot of "call/cc considered harmful" content on the internet from a few years back, and I predict we'll see the same thing about new-fangled control flow constructs like algebraic effects.
while cond1
    stmts1
elif cond2
    stmts2
elif cond3
    stmts3
syntax which is essentially a few if/else blocks pulled up to the enclosing loop level. Makes writing e.g.
while a > b
    a := a - b
elif b > a
    b := b - a
somewhat more concise. And Rust's "while let" loop easily could have allowed several clauses instead of a single one.
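In current Python, that hypothetical multi-clause loop has to be desugared by hand. A sketch of the equivalent, using the subtraction example above (assuming positive integer inputs):

```python
def subtract_gcd(a, b):
    # Hand-desugared "while a > b ... elif b > a ..." loop:
    # re-test every clause each iteration, and exit the loop
    # only when no clause applies.
    while True:
        if a > b:
            a = a - b
        elif b > a:
            b = b - a
        else:
            break
    return a
```

The hypothetical syntax saves exactly the `while True` wrapper and the final `else: break` arm.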
And yes, you can express all of that with either "while true: ..." with early breaks/returns, or with "while <OR'ing lots of Boolean flags>: ..." and post-processing those flags to determine why the loop ended... but why not have explicitly named, limited continuations? IMO it would simplify dealing with "there may be many reasons why the loop ended" kinds of iterations. Putting every loop into its own (anonymous) function that returns an ad-hoc sum type doesn't sound like a much better solution to me, to be fair.
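For concreteness, the "loop in its own function returning an ad-hoc sum type" alternative mentioned above might look like this in Python, with tagged tuples standing in for the sum type (all names here are hypothetical):

```python
def search(haystack, pred):
    # Each exit reason becomes a tagged tuple -- the "ad-hoc sum type".
    for i, elem in enumerate(haystack):
        if pred(elem):
            return ("found", i, elem)
    return ("not_found",)

# The caller then dispatches on the tag instead of on named exit points.
outcome = search([2, 4, 6], lambda e: e > 3)
if outcome[0] == "found":
    _, index, value = outcome
```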
From what I can tell, the only things here not supported by CL's LOOP are "continue" and named exit points with parameters, though I fail to see the usefulness of the latter when you could just use return.
Yes, especially if you can make an inner (embedded) function just for that, one that won't spill into the global function namespace. Ada is good in that respect.
A friend recently told me that DSLs (stand-alone) and eDSLs (embedded in an existing language) have the main difference that DSLs are overgrown configuration files, and eDSLs are overgrown libraries.
I'd say: When you want to call your DSL from more than one programming environment, it is good to have an abstract representation that transcends one particular syntax-tree definition. A library is good until then.
I think there are definitely wrong answers to syntax if you want people other than PL nerds to use your language. Agreed that a library shouldn't start its tutorial talking about how nice its interface is. But at a certain point, you should probably come around to a syntax that is fairly standard. There's a reason a lot of UI libraries standardized around JSX.
Basically, syntax might be your last thought, but that doesn't mean you should neglect it.
Syntax is the bike-shedding of programming language design.
Imagine you're going to write a book, and before you think of the subject, you're rabbit-holing whether or not to use the Oxford comma. This is literally the significance of syntax. Of course, it matters. But not as much as what your book is about.
Thank you! This is something I've encountered in every tutorial on language design. I've been trying to build a DSL for years, but I'm stuck because everything I read isn't suitable for my use case. Part of me wants to just say "fuck it, I'll learn how to build a GPL, then hopefully by the end of it I'll have enough on-hand knowledge to transfer over". But I really would rather not spend that much effort.
It's not in the bibliography at the bottom, so I thought I'd give a shout out to Crafting Interpreters, which is a very accessible tome on language design and development:
Yet another compiler how-to that pretends to be about design. There is no usage design as such, not a word on semantics; even the word "semantics" is not mentioned in the whole text.
The only design here might be the verbatim copy of a compiler architecture that gets explained over and over. Then again, some people might call architecture "design" and a syntax compiler a "programming language".
I agree with you that the lack of discussion of semantics in almost all programming-language creation tutorials/books/videos is disheartening.
However, I often wonder why it's left out. Charitably, I can assume that authors want to present the mechanical implementation and development of a language and its developer experience. To do so, they use typical (and often simplified) strictly imperative or strictly functional semantics, hoping the audience has an implicit understanding of what the PL constructs 'mean'. Or, with a lot less charity, I can assume that programming language semantics (particularly novel or niche semantics) are hard to understand and explain, and the authors are not competent enough to talk about them, so they just don't bother.
Where do you find yourself falling in explaining the lack of semantics in these types of tutorials on PL design?
That's something I've been looking for for a long while, and the only place I find it is usually in interviews with creators of successful programming languages who touch on the topic tangentially. I've seen no book on the ergonomics of programming language design.
Some common resources I have already identified are the talk Simple Made Easy[1] and the famous "everything is an X"[2], which was here on HN a few days ago. Also, a common problem I see is that many scholars and intellectual people confuse what "semantics" means. For example, they might conflate "semantics" with "meaning" and not recognize as a semantic problem the collapsing of two different variables into a single one (e.g. the classic "0 means unlimited", as some old C programs did).
If you do find a resource, just let me know. But I hope there is more on the topic than applying a standard math model just for the sake of uniformity and trying to make everything "as easy as possible" based on a gut feeling.
There are lots of resources out there about making a parser and interpreter and calling that a “language”. While those two components are necessary, alone they don’t seem sufficient in 2022 to call something a programming language; the scope of what a language is has expanded to include the ecosystem and community, which are often aspects cited these days for using or not using many languages.
Question: does anyone know of resources that discuss the rest of programming language design?
- designing a standard library
- building tooling like debuggers and package managers
- interfacing with other languages
- managing and growing a community
- evolving the language over time as it grows
- integrating with IDEs and building language server support
- how to communicate the language to others so they understand the value proposition.
- how to choose the right set of features to support. How to discern a good set from a bad set, how to make the language stand out and unique.
- also maybe issues of how to stay motivated and focused. It takes years to build a language, how does one stay committed to it over that time?
I understand why there might not be many resources that cover these angles, as not many language projects get past the parser stage, but I’m still curious what others have found.
Sorted by most downloaded, the first page of results includes the history of Erlang, Haskell, C, C++, Smalltalk, Fortran, AppleScript, Lisp, Pascal, Prolog...
Many of the papers are written by the people who designed, developed, and maintained the programming language, its ecosystem and community.
What I find fascinating are their retrospective thoughts on what didn't work well, regrets of what turned out to be bad design decisions. The authors also often reflect on how far programming in general has evolved, what aspects and features have become expected, like package manager, etc.
In any substantial language, the "lexer" and "parser" represent less than 1% of the work of implementing the language. In the Ecstasy (xtclang) project, writing the lexer and parser (and then bootstrapping them both in the Ecstasy language) took less than 4 person weeks, out of 750+ (and counting) person weeks spent implementing the language.
After reading the title I thought that this would be an article about a programming language named I, in the style of B, C, D, J, K, et cetera. "Part I" would be much clearer.
Apparently "the certificate is only valid for the following names: *.github.com, github.com" so this is likely a "github pages" page hiding behind a proper domain name.
I wonder if it would be practical for any web server that receives a TLS request for a domain it doesn't know about (but which has nonetheless arrived at the server) to make a Let's Encrypt request for that domain and respond to the original request in time.
I don't believe it can respond to a request live, but it can dynamically request a cert for any new domain and have it presumably in seconds, so I don't think it's out of the realm of possibility. This would only be for a subdomain of a parent domain you already own and can prove via an ACME challenge.
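For what it's worth, this exists today: Caddy's "on-demand TLS" obtains a certificate during the TLS handshake for names it hasn't seen before (holding the handshake open while issuance completes), gated by an "ask" endpoint so it won't issue for arbitrary domains. A minimal Caddyfile sketch, where the ask URL is a placeholder for your own allow-list service:

```
{
	on_demand_tls {
		# Caddy queries this endpoint before issuing; respond 200
		# only for domains you actually control or serve.
		ask http://localhost:9123/allowed
	}
}

https:// {
	tls {
		on_demand
	}
	respond "hello"
}
```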