Hacker News

I like it, but I'm not sure it's worth the trouble of further expanding an ever-ballooning extended family of SQL syntaxes.

Of course, SQL has numerous issues, both in an absolute sense and relative to what we've come to expect. And the obvious course of action in each individual case is to add syntax to support the missing feature.

But as you keep adding syntax, SQL gets more and more complex, both in the individual variants and due to the complexities of multiple variants with different syntax support at different times.

A transpiling approach makes more sense to me. I'd like the SQL implementers to focus on source maps and other mechanisms to better support plugging in external, alternate syntaxes (is a standardized mechanism too much to ask for?).

Then individual projects/people/efforts can choose the SQL syntax variant that works for them, and it can be one that evolves separate from the host db.



Plain transpiling won’t work: when you write a SQL query, you have to know which engine will run it, because every engine handles some things worse than others.

For example, the same result could be fetched using a correlated subquery, a common table expression, a temporary table, a view, or (very hackily) aggregation. Which one you'd pick depends on whether you're using SQL Server, Postgres, MySQL, or SQLite, because they don't all execute each form quickly. And you need to know the version, too.
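To make the equivalence concrete, here is a small sketch (using SQLite via Python, with an invented `orders` table) showing two of those formulations returning the same rows. Which one runs faster on a large table depends on the engine and version, which is exactly the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        (1, 'alice', 50), (2, 'alice', 70), (3, 'bob', 30), (4, 'bob', 90);
""")

# Formulation 1: correlated subquery -- each row re-checks its customer's max.
correlated = conn.execute("""
    SELECT id, customer, amount FROM orders o
    WHERE amount = (SELECT MAX(amount) FROM orders WHERE customer = o.customer)
    ORDER BY id
""").fetchall()

# Formulation 2: common table expression -- compute the maxima once, then join.
cte = conn.execute("""
    WITH maxima AS (SELECT customer, MAX(amount) AS m FROM orders GROUP BY customer)
    SELECT o.id, o.customer, o.amount FROM orders o
    JOIN maxima ON o.customer = maxima.customer AND o.amount = maxima.m
    ORDER BY o.id
""").fetchall()

# Same rows either way; which form is fast is an engine-specific question.
assert correlated == cte == [(2, 'alice', 70), (4, 'bob', 90)]
```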

This might be solvable if the language were a brand-new, higher-level one and the compiler knew the intricacies of each engine, though.


Yes, any such transpiler would certainly need to be engine aware (for multiple reasons). Right off the bat, the output needs to be in a syntax the engine accepts, and there are all kinds of differences from engine to engine, version to version.

And, as you say, if the language contains higher-level abstractions, it might need to work quite differently depending on the target engine.

My thought is to avoid those higher-level abstractions as much as possible and just expose the capabilities of the engine as directly as possible, albeit with a different syntax. In my experience, developers who are willing and able to write SQL are fine with targeting a specific engine and optimizing for it. (Those who aren't get someone else to do it, or live with the sad consequences of dealing with an ORM.)

To summarize:

Normal Approach: you pick an engine, and get the syntax that comes with it. You need to know what the engine does well and doesn't do well. You write SQL accordingly, using the syntax the engine accepts.

Transpiler Approach: you pick an engine, and independently choose a syntax. You still need to know what the engine does well and doesn't do well. You still write SQL accordingly, but using the syntax of the language you chose.


I don’t know if the transpiler approach really helps when you still have to be aware of the underlying database engine. I’m fine with choosing something like PostgreSQL and writing code optimized for its specific SQL dialect.

But I’m not sure I want to deal with a transpiler that still suffers from a leaky abstraction. It starts to feel a lot like using an ORM. ORMs promise database-agnostic syntax across different engines, but that promise falls apart when you need to optimize a query and suddenly realize you must account for engine-specific features—like PostgreSQL’s BRIN indexes.


To me, abstracting away the engine is not a goal. (Generally, it's pointless -- the various engines work in different ways and offer different features. While there's a bunch of stuff that's widely common, that's also the stuff you don't really need to abstract.)

The point of transpiling is to allow you to pick a syntax independent of the engine. You still have to pick the engine and will deal with its peculiarities.

Today's post is about pipe syntax, which is syntax. Yesterday's was about trailing commas, which is syntax. I think there's an appetite for this.
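A toy illustration of "it's just syntax": a few lines of string rewriting turn a tiny pipe-syntax subset into classic SELECT-FROM-WHERE. The operator subset and the function name here are invented for illustration; real pipe syntax (as in GoogleSQL) supports many more operators.

```python
# Toy sketch: rewrite a tiny subset of pipe syntax (FROM |> WHERE |> SELECT)
# into a classic SELECT statement. This is purely syntactic -- no engine
# knowledge involved, which is the point.
def pipes_to_sql(query: str) -> str:
    stages = [s.strip() for s in query.split("|>")]
    select, from_, where = "*", None, []
    for stage in stages:
        keyword, _, rest = stage.partition(" ")
        kw = keyword.upper()
        if kw == "FROM":
            from_ = rest
        elif kw == "WHERE":
            where.append(rest)
        elif kw == "SELECT":
            select = rest
        else:
            raise ValueError(f"unsupported pipe operator: {keyword}")
    sql = f"SELECT {select} FROM {from_}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql

q = "FROM orders |> WHERE amount > 40 |> WHERE customer = 'alice' |> SELECT id, amount"
print(pipes_to_sql(q))
# SELECT id, amount FROM orders WHERE amount > 40 AND customer = 'alice'
```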


One example is LookML, which is used to build semantic data models in an analytic layer: https://cloud.google.com/looker/docs/what-is-lookml


LookML does seem to have invested a lot in compilation to different SQL backends, generally using the best syntax for each.

Unfortunately the frontend is tightly tied to the Looker BI stuff, and it can't really express arbitrary queries without a lot of hassle.

It's unclear what Google is doing with Looker. It would be interesting to imagine what LookML would be like as an independent SQL toolchain.


>LookML does seem to have invested a lot in compilation to different SQL backends, generally using the best syntax for each.

To some degree, yes. Yet far and away, users of Looker use engines like Redshift, BigQuery, and Snowflake because they’re extremely effective at the types of queries that Looker sends at them — not because Looker spends a huge number of hours optimizing for each engine (that’s not to say none is done); these dbs are great at analytical queries.

Looker in its earlier days (early/mid 2010s) took a bet on analytical database engines getting better as opposed to other technologies; for example, Tableau had its own datastore and largely did not “push queries down to the database” for execution. In the end, BigQuery was radically faster than SparkSQL and was compelling for customers, for example; it was not that Looker spent a ton of time optimizing BigQuery as opposed to SparkSQL.

Source: I was an early engineer at Looker


The creator of LookML is actually working on that with an OSS project called Malloy.

https://www.malloydata.dev/


It's also not just that -- it depends on the data.

A database can change how it decides to execute a query based on whether a table has 100 rows or 10,000,000 rows.
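This is observable even in SQLite: before the engine has statistics, an index lookup looks cheap; once `ANALYZE` reveals the data is skewed, the planner may change its mind for the very same query. The table and index names below are invented for illustration, and the post-`ANALYZE` plan is engine- and version-dependent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("CREATE INDEX idx_a ON t(a)")

def plan(sql):
    # The 'detail' column is the last field of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# With no statistics, SQLite assumes the index on `a` is selective.
before = plan("SELECT * FROM t WHERE a = 1")
assert any("idx_a" in detail for detail in before)

# Load skewed data (every row has a = 1) and gather statistics.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "x")] * 1000)
conn.execute("ANALYZE")

# Same query, same schema -- but the plan may now prefer a full table scan,
# because the statistics say the index matches every row.
after = plan("SELECT * FROM t WHERE a = 1")
print(before, after)
```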


Yeah, that's true. Also, you, as the query writer, know how many rows the table has and will have, and sometimes you decide what to do based on that information too.


We transpile from Python to SQL using Calcite. Calcite works well as an engine that knows enough SQL dialects to be practical.


+1 for Calcite. Any more details on what you’re doing? Are you using a Python library on top of Calcite? Something else?


We run Calcite in the backend. The Python library sends the DSL as JSON to the backend, which returns SQL for the target engine.
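The shape of that pipeline might look something like the following sketch. Everything here is invented for illustration (the DSL field names, the `render_sql` function, the `LIMIT` vs `TOP` handling); the real system uses Calcite's dialect support rather than hand-written rules like these.

```python
import json

# Hypothetical backend step: accept a small DSL as JSON and render SQL for
# a named target engine. Only SELECT/FROM/WHERE/limit are sketched here.
def render_sql(dsl_json: str, engine: str) -> str:
    dsl = json.loads(dsl_json)
    cols = ", ".join(dsl["select"])
    sql = f"SELECT {cols} FROM {dsl['from']}"
    if "where" in dsl:
        sql += " WHERE " + dsl["where"]
    if "limit" in dsl:
        n = dsl["limit"]
        if engine == "sqlserver":
            # Older SQL Server style spells row limits as TOP, not LIMIT.
            sql = sql.replace("SELECT", f"SELECT TOP {n}", 1)
        else:
            sql += f" LIMIT {n}"
    return sql

query = json.dumps({"select": ["id"], "from": "orders",
                    "where": "amount > 40", "limit": 10})
print(render_sql(query, "postgres"))   # SELECT id FROM orders WHERE amount > 40 LIMIT 10
print(render_sql(query, "sqlserver"))  # SELECT TOP 10 id FROM orders WHERE amount > 40
```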


It seems like this is solved by just having the transpiler aware of the target system.


This is nonsense and the person who was advocating for implementation-aware transpilation is completely wrong.

It is the role of a database engine implementation to service useful queries. Sometimes those queries are very complex; and yes, pipe syntax allows users to more easily write queries that can challenge database performance.

Yet this totally misses the point. Technologies like LookML have long allowed for extremely useful yet complicated queries to be executed by a database engine (often against one that is exceptional at complex analytical queries, like BigQuery) with high performance.

We should never handicap a SQL user’s ability to do what they want in the most ergonomic way. Transpilation largely allows for this — and I am 100% certain that implementations that allow for pipe syntax will effectively merely transform an AST with pipes to an existing kind of AST (this is, of course, effectively transpilation).

It is the job of the database engine, not the one querying (within reason) to performantly execute a query — and ditto for a pipe->non-pipe SQL transpiler. If you disagree with this, you are ignoring the fact that BigQuery, Snowflake, Redshift, and other optimized analytical databases are collectively worth billions, largely for their ability to performantly service crazy yet useful queries.


There are a lot of transpilers, and they don't get much adoption because the people implementing them are not the people implementing the database.

The database and the query language are tightly coupled. The latter determines the kinds of optimizations that the former can do, and a transpiled language means that optimization info needs to be passed through in weird and inefficient ways; it also means that obvious optimizations aren't possible, because the people building the language are not building the db.

If you've ever tried using query hints, or deeply inspected a distributed query plan, you'll know exactly what I mean.


> A transpiling approach makes more sense to me.

That's the solution we've been using, oft referred to as an ORM, to work around the lack of this syntax. But then you had to live in a world with an almost-SQL, but not really, language that doesn't always map perfectly to SQL, and all the pain that goes along with that.

Now, instead of that song and dance, you can compose queries with plain SQL and simple string concatenation. That is a huge win, even if it's troubling that it has taken so long (and is still not widely supported).
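To illustrate why concatenation suddenly works: with pipe syntax, each operator is a self-contained suffix, so you never have to splice a clause into the middle of a SELECT statement. The helper names below are invented, and the resulting query string is only accepted by engines that actually support pipe syntax (e.g. BigQuery).

```python
# Sketch: composing a pipe-syntax query by appending operators. Each step
# produces a complete, valid query -- no clause-splicing required.
def base_query(table: str) -> str:
    return f"FROM {table}"

def add_filter(query: str, condition: str) -> str:
    return f"{query} |> WHERE {condition}"

def add_projection(query: str, columns: str) -> str:
    return f"{query} |> SELECT {columns}"

q = base_query("orders")
q = add_filter(q, "amount > 40")
q = add_filter(q, "customer = 'alice'")   # filters stack by simple appending
q = add_projection(q, "id, amount")
print(q)
# FROM orders |> WHERE amount > 40 |> WHERE customer = 'alice' |> SELECT id, amount
```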


Generally, ORMs try to present a higher-level abstraction of your data and data model than the engine does natively.

That makes it the ORM's responsibility to implement the abstraction in a good way. Problems come in when the "good" way depends on factors the ORM doesn't consider or expose well.

I actually mean transpiler in a mainly syntactic sense, and would want it to avoid higher-level abstractions (after all, as you point out, ORMs have that ground covered).


While that is closer to an ORM in the classical sense, the modern usage — and why I said "oft referred to" — is more accurately a query builder, but it gets called an ORM regardless. The reason people reach for this type of tool is that there hasn't been a good solution to this in SQL proper, necessitating another language. Until now.



