You want to be concrete? OK, literally every real-world SQL setup I've seen, whe...

alex_lav · on June 29, 2023

> - Has no way to deploy a specific historical version of the SQL (e.g. deploy this git tag)

Can you clarify this? Do you mean the version of a specific schema/query?

> - Has no automated checking that the SQL behaves as expected (e.g. this query with this test data should produce this output; if it doesn't, the git tag will not be created). Even checking that the SQL is syntactically valid is rare.

This is part of the purpose of transactions [0]

> - Has no reuse of expressions smaller than a view/table

Incorrect [1][2][3]

> - Has not even basic type checking, e.g. something that alerts if you are combining an expression that might produce null with an expression that does not handle null well

Incorrect [4] [5]

> - Has not even basic library/dependency management

Incorrect [6]

> - Has no practical structured values (since none of the infrastructure that would make using ad-hoc temporary tables safe exists). CTEs are an improvement but still thoroughly noncompositional since you can only put them in one place.

Can you clarify? How is a transaction + CTE/subquery not sufficient?

> Let me guess, "these are solved problems" but no details of what you do to solve them, because none of the solutions actually work.

Replace "none of the solutions actually work" with "I don't know how to use them", and yes you're correct.

I'm happy to answer more of your questions in good faith, but I'm not really interested in debating a person with their head in the sand. Please let me know if I can help you, if you're actually interested. But also, as I've said, it's also fine to not want/not like this type of stuff. Different strokes and all that.

[0] https://www.tutorialspoint.com/sql/sql-transactions.htm

[1] https://www.postgresql.org/docs/current/sql-createfunction.h...

[2] https://www.postgresql.org/docs/current/sql-createprocedure....

[3] https://www.postgresql.org/docs/current/datatype-enum.html

[4] https://www.postgresql.org/docs/current/functions-conditiona...

[5] https://www.postgresqltutorial.com/postgresql-tutorial/postg...

[6] https://www.postgresql.org/download/products/6-postgresql-ex...

lmm · on June 29, 2023

> Can you clarify this? Do you mean the version of a specific schema/query?

I mean being able to version a system implemented in "advanced SQL" the same way I'd version one implemented in an applications language. Make some changes, keep them in VCS, at some point decide to do a release and deploy. A few days later, discover some issue with the changed logic, roll back to the previous release of my "code" (without affecting the data, so not just restoring a database backup).

> This is part of the purpose of transactions [0]

Right, but the actual workflow tooling around how to use them for this is missing, and for whatever reason the culture that would build and standardise it seems to be missing too. Like, in most ecosystems there's a standardised test-edit cycle that everyone understands; you make your changes and then you run npm test or cargo test or whatever, and you get some assurance that your changes were correct, and that same tooling is also in control of your release workflow and will prevent or at least warn you if you try to do a release where your tests are failing.

> Incorrect [1][2][3]

Postgresql functions with composite values (and more generally the fact that composite values exist at all) sound like exactly what I was looking for, so that would be a big improvement if I could use them (although for the record they're not a standard SQL feature; MySQL functions can only return a scala value, so there's a major missing middle between functions and views). But even then they have the same problem as temporary tables/views of existing "globally", in the place you'd expect data rather than code to be, and deployment/use tooling being inadequate to use them safely (or at least widely perceived as such). Like, everywhere I've worked has had a de facto rule of "no DDL on the production database except for deliberate long-term changes to the schema that have gone through a review process", and I don't think that's unreasonable (maybe you do?).

> Incorrect [4] [5]

Those functions exist, I'm talking about having tooling that can tell you where you need to use them. Even in something like Python you have linters that will catch basic mistakes, and they're integrated into the workflow tooling so that you won't accidentally release without running them.

> Incorrect [6]

A list of libraries != library/dependency management.

> Can you clarify? How is a transaction + CTE/subquery not sufficient?

If I want to pull out an expression that appears in two arbitrary points in my query, and use it as a CTE, that requires a lot more thought than it would in most languages, because it goes differently depending on whether it was in the SELECT or the WHERE or the GROUP BY or.... The grammar is just somehow less consistent than most programming languages, and the scoping is more confusing, I think because it's kind of lexically backwards (like, you have to declare things to be able to use them, but a lot of the time the declaration goes after the usage. But not always!).

alex_lav · on June 29, 2023

> I mean being able to version a system implemented in "advanced SQL" the same way I'd version one implemented in an applications language. Make some changes, keep them in VCS, at some point decide to do a release and deploy.

It's tough to follow these posts in part because SQL is a programming language used to interract with a database. It seems like you want to treat the SQL you write synonymously with the system you're writing it against, which is...I don't know...confusing? SQL is declarative, you say make it so and the underlying engine makes it so. I've had trouble parsing whether you dislike _writing SQL to interract with a database_ or _the way RDMS systems manage data_. The only "logic" that exists in most databases is surrounding constraints, which, if there's a bug in your constraint, you tell your database to update its schema, and it does so. You're free to store your SQL queries however you'd like, just like any other programming language.

> A few days later, discover some issue with the changed logic, roll back to the previous release of my "code" (without affecting the data, so not just restoring a database backup).

As is the nature of this conversation, this is just inherently not how these systems are intended to work. Asking for "new data but old shape" is legitimately ridiculous.

> Right, but the actual workflow tooling around how to use them for this is missing, and for whatever reason the culture that would build and standardise it seems to be missing too.

I guess I'm confused as to what "workflow tooling" you're looking for? Every RDMS supports multiple databases and schemas, which enables you to create and run test configurations with test data in real-world environments. A transaction enables you to test real queries on real data without risk.

> Like, in most ecosystems there's a standardised test-edit cycle that everyone understands; you make your changes and then you run npm test or cargo test or whatever, and you get some assurance that your changes were correct, and that same tooling is also in control of your release workflow and will prevent or at least warn you if you try to do a release where your tests are failing.

Sure, and in database systems this is the same. Craft a query out of a transaction, when it does what you want, commit it (either via a transaction or to normal source control). Everything you're asking for literally already exists, exactly how you're asking for it.

> Those functions exist, I'm talking about having tooling that can tell you where you need to use them. Even in something like Python you have linters that will catch basic mistakes, and they're integrated into the workflow tooling so that you won't accidentally release without running them.

The more I read what you write, the more it seems like you think you're unable to write SQL in a file and execute it against a database? Which...you can do...it happens all the time. I didn't bring it up because it's so obvious and common I thought there was some other misunderstanding.

> A list of libraries != library/dependency management.

I'm not sure what else you want? Your RDMS of choice maintains a list similar to a package.json containing dependencies and their versions, and you can interract with either that list directly, or with RDMS-specific commands similar to "npm install react". Again, what you're asking for literally exists exactly how you're asking for it.

> If I want to pull out an expression that appears in two arbitrary points in my query, and use it as a CTE, that requires a lot more thought than it would in most languages, because it goes differently depending on whether it was in the SELECT or the WHERE or the GROUP BY or....

If you open a java file, are you allowed to write arbitrary code wherever you want? No, that would be ridiculous. Enforcing some shape or structure on source is common and expected.

> The grammar is just somehow less consistent than most programming languages, and the scoping is more confusing, I think because it's kind of lexically backwards (like, you have to declare things to be able to use them, but a lot of the time the declaration goes after the usage. But not always!).

I'm happy to meet you in the middle (really, one millimeter from where I currently stand) and acknowledge that it can be at times frustrating to modify a SELECT portion of a query prior to getting to the FROM clause, which reduces the ability for editors to assist in typeahead (because as you've said, you haven't gotten to the FROM yet).

lmm · on July 7, 2023

> The only "logic" that exists in most databases is surrounding constraints, which, if there's a bug in your constraint, you tell your database to update its schema, and it does so.

Well, complex operations on data - even if those operations are just selection and aggregation - require logic. So either that logic lives in SQL, or it lives in application code.

> I guess I'm confused as to what "workflow tooling" you're looking for? Every RDMS supports multiple databases and schemas, which enables you to create and run test configurations with test data in real-world environments. A transaction enables you to test real queries on real data without risk.

So where is the equivalent of npm/cargo/maven? And where is the unit testing framework? Transactions are low-level functionality that you could build this tooling on top of - but as far as I can see no-one has, or at least not to the extent that it's standardized and accepted in the community. Where in RDBMS-land can I check out an existing project, make a small edit to one of the queries, and then run that project's tests to confirm I haven't broken it? A few projects have some support tools for doing this, but they're inevitably ad-hoc and unpolished.

> The more I read what you write, the more it seems like you think you're unable to write SQL in a file and execute it against a database? Which...you can do...it happens all the time.

I want a project with a bit more structure than a single file. And I want to share and reuse pieces between multiple projects rather than writing everything from scratch every time. Again, that's the low-level functionality, but where is the workflow tooling that actually builds on that to let you do day-to-day things in a standardised way?

> I'm not sure what else you want? Your RDMS of choice maintains a list similar to a package.json containing dependencies and their versions, and you can interract with either that list directly, or with RDMS-specific commands similar to "npm install react".

What? Where? You linked to a list for postgresql that literally has 12 packages available, total (I'm pretty sure I've published more packages than that in Maven central myself).

> If you open a java file, are you allowed to write arbitrary code wherever you want?

Not quite, but I can select any subexpression of an expression almost anywhere and pull it out into either a local variable or a function - usually by doing nothing more than hitting a key combo in my IDE. I haven't found anything like that for SQL.