Uniform Structured Syntax, Metaprogramming and Run-Time Compilation

discreteevent · on June 15, 2019

This is very impressive in only the way that lisp can be. But part of why it works is because SQL is a well defined interface. There was a comment on ltu a good while back:

------------

"I would say that a system that allowed other metathings to be done in the ordinary course of programming (like changing what inheritance means, or what is an instance) is a bad design. (I believe that systems should allow these things, but the design should be such that there are clear fences that have to be crossed when serious extensions are made.)"

The fact Kay realized this fine point of design in the late '60s (according to him) is why he is a Turing Award Winner.

I know Lisp programmers who even today don't understand this point - their code is succinct but the API has a massively unnecessary learning curve due to unclear boundaries.

Sometimes my coworkers object to me paying extraordinary attention to detail about what the boundaries are. However, if we don't pay attention to boundaries, we may as well all be Netron Fusion COBOL programmers munging VSAM records and EDI data formats. By Z-Bo at Wed, 2009-04-15

http://lambda-the-ultimate.org/node/3265#comment-48165

neilv · on June 15, 2019

I think more analogous to the examples Kay was giving would be for a single program to quietly change the rules for function application, or to quietly change the behavior of CLOS.

I think of an embedded DSL, based on a macro like it is here, as a bit different than that, because it's not doing anything quietly -- it applies only to the parenthesized extent of the code that begins with your macro name:

  (my-dsl-macro my dsl fills the rest of the parentheses)

People who don't already know what `my-dsl-macro` is see the documentation or code that says it's a macro. Even were it a normal function, people reading the code should probably know what the function does, so maybe they have to glance at where the IDE automatically displayed the syntax or documentation for them, anyway.

Ideally, your editor will also color the macro names differently, especially for readers who don't know what's a macro or a function. (Rust adds an exclamation mark to the macro name for a use, which is a good idea when you don't already know what's macros, but maybe a bit annoying to have all those exclamation marks when you do know that, say, `println` is a macro.) But even if you don't know the name is a macro, you'll be clued in if the text within it doesn't look like your top-level language, which it often doesn't (e.g., SQL in s-expressions doesn't look like base CL).

A key is to use DSLs judiciously -- for improved readability (for your base language programmer, or domain experts), maintainability, and/or performance. Maybe not for, say, a convenience for the sake of minor code terseness improvement only.

Of course, SQL is a whopper of a language, and perhaps overkill if you invented it as a DSL for this particular application. (More suspicious would be to invent your own relational query language that seems gratuitously different than SQL, when SQL already existed.)

As for changes more like I think Kay was talking about, in the Racket (Lisp family) universe, outside of a macro, normally you wouldn't do that, but when you have a good reason -- say, you want to prototype a lazy language, or implement a specialized language for a GPU programming backend, or a DSL for your domain experts -- you can. In that case, you'd have a `#lang` line at the very top of the file that tells you what different language this is, instead of `#lang racket`. You might make that produce modules that can interoperate with modules of other `#lang`s, but you're not doing anything sneaky that breaks the language of those other modules.

quelltext · on June 16, 2019

You still have the problem that the `my-dsl-macro` is a black box.

That macro can do anything it wants to the sub-AST. The problem is not about hygiene re: variables but about the meaning of symbols. A function application within it that doesn't obviously have anything to do with the DSL is also subject to the macro's will to have its meaning changed. This can make composition difficult. Who guarantees that if you combine `my-dsl-1-macro` with `my-dsl-2-macro` or simply with general purpose code, that they don't interfere in odd ways?

Granted, this is not a big concern for a well-designed and widely used macro. However, the bottom line is that the inherent freedom of the abstraction allows for issues to arise whose debugging require a deep understanding of the involved macros and their implementations.

You don't really have the same problem if there is only procedural abstraction. The boundaries are clearer there. Even if you do something like the Interpreter Pattern or Haskell style AST-constructing EDSLs, you have guarantees that whatever AST is constructed cannot inspect whatever non-DSL code you combined with it.

comma_at · on June 16, 2019

> Who guarantees that if you combine `my-dsl-1-macro` with `my-dsl-2-macro` or simply with general purpose code, that they don't interfere in odd ways?

Nobody does. But that's where the power lies. You can't have unrestricted power with restricted features. Lisp gives you unrestricted power. It's up to you to use it correctly, for w/e definition of correct your context makes sense.

quelltext · on June 18, 2019

Yes, but the argument was rather along the lines of "here's why I think (some people think) we shouldn't have unrestricted features" or "I don't want ... because...". Languages don't necessarily need macros (many don't have them) so arguing on the basis that the dangers of macros are an inevitable cost for having them is correct but it works under the assumption that macros are indeed desired by everybody.

It's a bit like arguing about pointers and memory safety.

I do think there is a lot of value to look into these things and research alternatives to macros. For instance the problem I have described could be addressed by a new macro-like feature that has to respect certain boundaries.

neilv · on June 16, 2019

True, `my-dsl-1-macro` potentially does anything inside its syntactic parentheses extent, including possibility manipulating syntax within it that someone just glancing at the code might assume is a code snippet that's pasted in verbatim, including what looks like a use of `my-dsl-2-macro`.

The verbatim paste seems like the usual case for macros.

This kind of potential expectation violation isn't specific to macros. For example, some functions could potentially mutate arguments when, at a casual glance, the expectation would be that that it doesn't mutate. (Even with an explicitly const argument, the language usually permits mutating something referenced by that immediate const, so we're back to expectations again.)

I think this goes back to conventions and judiciousness. And perhaps also IDE-supported easily-accessible documentation, and good naming.

quelltext · on June 18, 2019

Sure, the point I was trying to make is that macros add yet another layer of potential violation of expectation. It's hard (for me) to say whether their dangers outweigh their benefits (my guess is probably not), but it is something worthy of consideration to understand where anti-macro opinions come from.

neilv · on June 18, 2019

That concern sounds reasonable. It doesn't help that there's wide variation in the nature and quality of macro/DSL/minilanguage/metaprogramming stuff people see.

Personally, I'm very comfortable working with reasonable programmers using macros in Racket. And most CL programmers are very sharp, and could be trusted to use restraint with their more dynamic tools (especially if you're talking about work, rather than creative personal side projects).

For that matter, some non-macro/template overriding features of C++ pack more astonishment than syntax extension in Racket. And I wonder how much some suspicion of a good macro system is due to prior experience with languages like C++.

agumonkey · on June 16, 2019

I've never wrote commonlisp for work, but what I've read from people doing so is that they're not fools. All those I've seen had a clear notion of when and how much to abuse lisp power. Ugly is ugly and rare are seasoned lispers that are still confused on this.

m-felleisen · on June 16, 2019

Does it matter that McCarthy won the Turing Award before Alan Kay? What a silly point.

Is it awful that functions are black boxes? That they are bundled in libraries?

Does it matter that for loops don't leave a stack trace? How can poor programmers debug them w/o a stack trace?

Can you imagine that overriding a method completely changes what a bundle of methods does in a class?

It is so sad that every time a new idea comes out, the "resistance" (programmers who often lack experience with it or experienced an inferior implementation of the idea) are the loudest to complain and hold sw dev back.

barrkel · on June 15, 2019

This is talking about evaluating dynamic expressions over (semi-?)-structured row-oriented data, for the purpose of filtering.

It contrasts a tree interpreter in C++ with a JITted dynamically generated Lisp expression, with some hand-waving away of what the equivalent JIT in C++ would be, seemingly dismissing it as taking too long (is that what "unpause cosmic time" alludes to? I'm not sure).

The tree interpreter is a little unorthodox - it isn't how I'd write an AST-walking interpreter - and other interpreter techniques like generating linear programs for a simple virtual machine aren't considered. These can be pretty fast, especially with some use of implementation-specific computed goto, available in gcc and clang. It would get rid of the author's worries about recursion and lack of TCO, increase locality and decrease cache usage.

But of course there's not much need to write such an interpreter. Why not use a JIT framework for C++? Depending on the library, it wouldn't be a whole lot more complex than a traversal of the AST.

And the next question is, if the problem is querying plain-text databases, why not use Apache Impala? It's written in C++, and uses LLVM to compile SQL expressions into native code, and can evaluate filters (but not just filters, the full power of SQL) over CSV text.

Maybe Impala and its dependencies is too big, but if that's the case then your data is small and a simple interpreter would be plenty fast enough.

comma_at · on June 16, 2019

Unless you provide one of

- a simple VM implementation - code using a JIT framework - using Apache Impala on this particular dataset

which performs on par and didn't take astonishingly long to write then these are just vague claims and hypotheses.

barrkel · on June 17, 2019

Simple VM is about 30 minutes work.

A coding kata of mine is to implement an expression parser, evaluator, compiler and mini VM (a loop over an array with a switch, it's not hard at all) in 1 to 3 hours in new languages.

firethief · on June 16, 2019

A lot of people have experience with simple VMs. There's nothing super unique about this problem that requires every claim about it to be accompanied by a proof by construction.

comma_at · on June 17, 2019

Either you missed the context or are pulling a straw man. Building a simple VM is not unique, but can you build one that will be competitive with the other's in a reasonable timeframe?

firethief · on June 17, 2019

Yes, I could. Sorry, I wasn't trying to fight a straw man. I sincerely thought you were saying you wouldn't accept estimates of the project's difficulty in lieu of proof.

m00natic · on June 18, 2019

Implementing a "simple virtual machine" for a particular task is greenspunning[1]. Introducing a JIT library is work, complexity, debug and portability issues. LLVM compiles much slower to native code as the Clasp[2] guys notice comparing to SBCL. And in the end, any of these would be at most "competitive" to the simple Common Lisp implementation in speed (which is even portable across the different implementations).

[1] https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

[2] https://github.com/clasp-developers/clasp

jessaustin · on June 15, 2019

TFA seems like a challenge to C++ fans, not to sketch out a better solution, but to actually run a better solution against the same data.

Veedrac · on June 15, 2019

I'm not sure if I'm missing something, or if this is meant to be allegorical for some other, more difficult problem, but the argument here seems very strange. For sure, C++'s string handling is an awful sight, but the jump to DSLs seems unmotivated. This issue can be handled with simple, traditional helper functions.

    select(recordS5, [](cxr, subcode, commercial_name, date_disc) {
        return cxr.like("YY|XX") &&
               ...etc;
    });

gumby · on June 15, 2019

A typical Lisp program of any size is an implementation of a library or language which reflects the natural way your domain would typically be expressed, which is then used to describe your solution.

I was so steeped in this model that I was surprised when I first heard the (relatively recent) expression “DSL”. I mean that’s one of the main values from abstraction.

m00natic · on June 18, 2019

It's allegorical in the sense that this is general Lisp technique useful not just in this case. The DSL is targeted at (non programmer) end users and supposed to be fired through a REPL. Wouldn't want to make them write C++ with lambdas, semi-columns and whatever syntax traps of the latest standard.

Veedrac · on June 18, 2019

That doesn't support the post's argument, that it's not about aesthetics, but about qualitatively simpler solutions.

> Often times I hear the claim that (programming language) syntax doesn't matter or if it does, it's only to provide some subjective readability aesthetics. It's somewhat surprising to me how large majority of supposedly rational people like programmers/computer scientists would throw out objectivity with such confidence. So let me provide a simple real world case where uniform syntax enables out of the box solution which is qualitatively simpler.

That making non-programmers use complex syntax and type semicolons is bad is fair, but it's a rather different claim than the post's.

m00natic · on June 18, 2019

Well, dynamic queries by end users is the main goal here. Your static helper functions are completely unusable in that context. Analysing a query and generating code at runtime is easy and idiomatic with uniform syntax (and accompanying language support) and the claim is that solution is not only speedier but implementation in CL is qualitatively simpler than the alternatives.

Veedrac · on June 19, 2019

The point my first comment made is that you don't need to analyse a query at runtime, you just need to provide functions. I will agree again that Lisp makes REPLs easier to write than does C++, but a CL with Lua syntax could still, just as easily, provide functions and expose a REPL. It's the same solution, just without the unjustified AST transformations.

m00natic · on June 19, 2019

How can you provide a REPL language without analyzing it at runtime? Write Lua random syntax in the REPL? Not a great improvement over C++. Not to mention that you'll probably use something like `eval` which is not compilation thus inferior.

By the way, even your original example - I can't see how it can work honestly. How can you identify fields through (lambda) parameter names only (no mention of types either)? Probably the least boilerplate-heavy solution would be stringly typed.

Veedrac · on June 19, 2019

A Lua REPL is hardly worse than a Lisp one.

> something like `eval` which is not compilation thus inferior

I honestly don't know what that means. Turning text into code is compilation; there is no difference between the two in that regard, except perhaps that in the Lisp DSL case it's more manual.

> How can you identify fields through (lambda) parameter names only (no mention of types either)?

Not familiar enough with Lua, but in Python you just use keyword arguments.

jaytaylor · on June 15, 2019

This looks super interesting, but very difficult to read on mobile with Chrome (Android, Samsung S10e):

https://i.imgur.com/bJw95FB.jpg

Rotating sideways helped to a degree, but in hopes the author sees this thread..

Update:

Sadly, Firefox isn't any better:

https://i.imgur.com/ONswSNb.jpg

I guess this means this instance can't chalked up to "Chrome is becoming the new IE6".

grin

TeMPOraL · on June 15, 2019

True.

Fortunately, there's a workaround - "request desktop site" in Firefox (and Chrome) menu.

jaytaylor · on June 15, 2019

Oh, wow, this is excellent, and fixes it!

I never knew about this option. Thank you, sincerely, TeMPOral!

TeMPOraL · on June 15, 2019

You're welcome. It's a life-saver, in cases like this, or when a mobile site disables the ability to pinch-zoom.

jan_g · on June 15, 2019

Pinch zooming can also be force-enabled for all sites in the browser's accessibility settings.

alanbernstein · on June 15, 2019

I agree that it's hard to read on mobile. I disagree that the problem is with the author. Some content is just not well suited for reading on a mobile device. I have a tendency to open links on my phone, then let them stay there for a week until I have a chance to open them on my laptop, where 95% of the things I want to do on a computer are easier and faster.

Here, my main issue is the code sections overflowing, the table of contents doesn't bother me. What would you suggest the author do about that?

saagarjha · on June 15, 2019

I disagree that it’s the content and not the website styling that makes this unreadable on mobile. A couple of small changes, namely reducing the margins and not having the code run off the side, would make this a lot nicer.

nudq · on June 16, 2019

If I remember correctly, Postgres was initially developed in Lisp, but then rewritten. Was that a mistake, or evidence against the thesis of this article?

_19qg · on June 16, 2019

Not really, the original Postgres was developed in a mix of 17000 lines of Lisp and 63000 lines of C. This was difficult to develop/debug at that time. Probably still would be.

It had a 'gigantic' memory footprint of 4 MB - the all-in-C version only used 1 MB. The Lisp version was also slower and they didn't use features like GC...

http://db.cs.berkeley.edu/papers/ERL-M90-34.pdf

nudq · on June 16, 2019

> By a process of elimination, we decided to try writing POSTGRES in LISP. We expected that it would be especially easy to write the optimizer and inference engine in LISP, since both are mostly tree processing modules. Moreover, we were seduced by AI claims of high programmer productivity for applications written in LISP.

Yes, that's what I remembered, they started out using LISP.

> Our feeling is that the use of LISP has been a terrible mistake for several reasons.

"Terrible mistake" is pretty unambiguous language.

_19qg · on June 16, 2019

> Yes, that's what I remembered, they started out using LISP.

Only for the optimizer and the inference engine.

The authors also had no prior experience developing an application in a hybrid of C and Lisp.

> "Terrible mistake" is pretty unambiguous language.

4MB memory footprint was a terrible mistake at that time.

nudq · on June 16, 2019

They obviously started writing Postgres in LISP, because "we soon realized that parts of the system were more easily coded in C" wouldn't make sense if writing a hybrid had been the initial plan.

They tried going all LISP at first, and failed. Was it them, or was it LISP?

_19qg · on June 18, 2019

Since they had no experience in Lisp programming, they chose the wrong language just for 'doing something different'.

Writing a database in a performant way isn't something for a Lisp newbie.

"By the time Version 1 was operational, it contained about 17000 lines of LISP and about 63000 lines of C".

Version 1 was written in a mix of C and Lisp.

That's also not surprising, since that would have been a common approach for some technical reasons. But it's a bit difficult to do - again, especially as a newbie.

> Was it them, or was it LISP?

Their lack of experience, their approach, the LISP implementation they were using, the hardware constraints (4 MB footprint was not acceptable to them), ... A conservative approach using a lower-level systems programming language like C was a good choice at that time and they were much more successful with that approach.

There were/are a bunch of databases written in Lisp and even in a mix of C and Lisp: Statice (Symbolics), Zeitgeist (TI), Orion/Itasca, AllegroStore (Franz), ... But they were written by Lisp experts.

firethief · on June 16, 2019

Well they were using LISP without GC because they couldn't afford stop-the-world pauses and concurrent GC didn't exist yet. If it was LISP, the several aspects of LISP they describe having trouble with are all obsolete considerations.

GorgeRonde · on June 16, 2019

It is not that simple