> The proper way to build a data layer is one query at a time, as a string, with...

kentm · on June 27, 2023

Charitably, they probably meant the process by which the parameterized query string itself is built, in the case where you need a dynamic query structure.

hinkley · on June 28, 2023

I swear to god I'm going to write a SQL library some day that doesn't take strings as arguments. Although LINQ kinda already does what I'm thinking about, so maybe I just wait for someone to port it.

intrasight · on June 28, 2023

The updated SqlQuery function of .NET 8 is pretty cool. It takes a string but safely interpolates the parameters.

https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-cor...

https://timdeschryver.dev/blog/you-can-now-return-unmapped-t...

"With the new SqlQuery method, you can now write your own SQL queries and map the result to any type that you want.

This is useful for tasks that are not directly related to the day-to-day business of your application. For example, this can be used to import data, migrate data, or to retrieve data from a legacy system."

throwaway2990 · on June 28, 2023

Sooo it’s dapper with string interpolation override to make queries parameterized so you don’t need to manually put the parameters on as 2nd arguments?

intrasight · on June 28, 2023

It's not just interpolation. The interpolation handler rewrites the query and makes it safe from SQL injection.

There's an add-on to Dapper that does similar:

https://github.com/mishael-o/Dapper.SimpleSqlBuilder

useerup · on June 28, 2023

> Sooo it’s dapper with string interpolation override to make queries parameterized so you don’t need to manually put the parameters on as 2nd arguments?

Pretty much. C# does some really cool innovations around string interpolation which proves very useful for logging, SQL query etc.

You can create your own string interpolation handler, which is what the new SQL query does and what several log libraries do. So basically you can use interpolation safely and remain assured that you do not introduce SQL injection errors. Also, this way string interpolation does not prevent query plan caching, as normal string interpolation would do.

mkleczek · on June 28, 2023

Not sure what language you are talking about. For Java there is an awesome https://www.jooq.org/doc/3.19/manual-single-page/

Having said that: I stopped using it as writing plain SQL seems much more effective.

jokab · on June 28, 2023

If you are fine with search and replace to refactor your code then ok.

funcDropShadow · on June 28, 2023

Or you could use an IDE that supports refactoring SQL, like Intellij IDEA.

nhumrich · on June 28, 2023

You can do it in typescript/es6 using tagged template literals. I have done it before and it was awesome. Seriously wish more languages supported custom variable interpolation for it.

codethief · on June 28, 2023

Personally, I really hope that https://github.com/PRQL/prql will see some language support (similarly to JSX/TSX).

JoelSanchez · on June 29, 2023

https://github.com/seancorfield/honeysql

missingdays · on June 28, 2023

Exposed exists

geraldwhen · on June 28, 2023

You can’t parameterize a table name and possibly also column name in Postgres.

hinkley · on June 28, 2023

Those parts are perfectly fine as string constants.

nhumrich · on June 28, 2023

Depends on how dynamic your table names are

hinkley · on June 28, 2023

APIs get weird when you gimp the common path in order to support relatively obscure use cases.

The common things should be easy.

javajosh · on June 27, 2023

Details. But sure. In truth the best way to do the data layer is to use stored procs with a generated binding at the application layer. This is absolutely safe from injection, and is wicked fast as well.

FridgeSeal · on June 27, 2023

Having been on the team that inherited a project full on stored procedures (more than once): no thank you, not ever again.

Opaque, difficult to debug, high risk to invoke, difficult to version and version-control properly. I’m sure they could be done “less worse”, but I doubt they can be done well.

noisy_boy · on June 28, 2023

Thanks for echoing my pain. I am paying for the bad decisions taken by a bloke who only knew SQL and used that hammer to bang on every nail he could find. Everytime someone asks why the result of the stored proc is the way it is, I need to manually run the steps one by one until I find the point where an expected row failed a condition and dropped off. And have extra fun wrapping those statements with declarations incase you are heavily using declared variables in those statements. To top it off, if you are troubleshooting in an environment where you don't have insert/update permissions, you day just became amazing. Fuck you to those who use stored procs without strong justification (like high volume data processing on database side where it totally makes sense).

By version control, I assume you mean the inability to use different versions simultaneously without hacks, unlike a library built using Java where you get to pick and choose the version you want to use. Because of you mean version control of stored proc sources, that would be just like a normal git asset.

Canada · on June 28, 2023

I like stored procedures, or I guess to be more specific functions in postgres, but they have to be done in a really specific way:

1) It's all or nothing. If there's functions then they better feel like a complete high level API, enforcing the consistency of data, perfectly handling locking, etc. If there's tons of complex SQL in the app and then also a some functions then it's better to have no functions at all.

2) They need to have a consistent interface, such as always returning an error flag, error message if applicable. Functions must never surprise the caller. It either succeeds and the caller commits, or fail and the caller rolls back and knows why.

3) No monkey business with triggers. They can maintain updated_at columns, or maybe create some audit trails, but anything beyond that will make behavior surprising to callers and that is the worst.

As for version control it needs to be maintained as a separate application, which the server depends on having certain versions of. Even if you don't use functions you have to worry about schema versions and you can't always run two versions at the same time or roll back schema changes easily as a practical matter.

afavour · on June 28, 2023

Holy crap yes. This thread has triggered a long lost memory from over a decade ago where everything, absolutely everything was done through stored procedure. Such a wild, wild waste of time.

westurner · on June 28, 2023

Stored procedure: https://en.wikipedia.org/wiki/Stored_procedure

Are there SQL migration tools like Alembic and South that also version and sync Stored Procedures? (Alembic on "Replaceable objects" and schema: https://alembic.sqlalchemy.org/en/latest/cookbook.html#repla... , alembic_utils: https://github.com/olirice/alembic_utils )

From "What ORMs have taught me: just learn SQL (2014)" https://news.ycombinator.com/item?id=15949610 :

> ORMs:

> - Are maintainable by a team. "Oh, because that seemed faster at the time."

> - Are unit tested: eventually we end up creating at least structs or objects anyway, and then that needs to be the same everywhere, and then the abstraction is wrong because "everything should just be functional like SQL" until we need to decide what you called "the_initializer2".

> - Can make it very easy to create maintainable test fixtures which raise exceptions when the schema has changed but the test data hasn't.

ORMs may help catch incomplete migrations; for example when every reference to a renamed column hasn't been updated, or worse when foreign key relations changed breakingly.

django_rest_assured generates many tests for the models (SQL)/views/API views in a very DRY way from ORM schema and view/controller/route registrations, but is not as fast as e.g. FastAPI w/ SQLalchemy or django-ninja.

lyu07282 · on June 28, 2023

I would add impossible to test to that list. And the postgraphile people think it's the new hot stuff smh

re-thc · on June 28, 2023

> Opaque, difficult to debug, high risk to invoke, difficult to version and version-control properly.

Traditionally, maybe. With databases like Neon (postgresql) and Planetscale (mysql) supporting easy branching / snapshot it's at least made this a lot nicer.

FridgeSeal · on June 28, 2023

That might be true, but that’s like saying “now that my mechanic can repair my car for free, it’s way less worry to replace my radiator by removing it with a chainsaw”.

The solution to stored procs being awful isn’t making a whole branch just to see what it does, it’s fixing the problems with stored procedures at the root level.

grugagag · on June 28, 2023

Though im on the fence with stored procs and have seen complicated messes that make it depressingly no fun to work with I also had very good experiences with systems based around stored procs, not in an abusive dogmatic way. The usual answer I’m going to invoke here is that it depends how the tool is used. Any tool, methodology, philosophy can be borked in various ways, that experience traumatize people that they prefer to move on. Of course some tools are clearly worse or less useful than others but you can find some consensus if you know where to look.

mmcclure · on June 27, 2023

I (softly) disagree with stored procs being definitively the "best way to do the data layer." Stored procs are an extremely powerful tool to have in your belt, but lord have mercy do they have their own tradeoffs.

xarope · on June 28, 2023

As usual there's a time and place.

What I've found is that devs tend to be able to write "OKish" SQLs, but fall over when trying to optimize them, so the DB side can then take over and write optimized stored procs, which can have a separate cadence of code updates and version control compared to the backend team.

JohnMakin · on June 28, 2023

some of the worst issues I’ve ever dealt with had one of these as a root cause

paulryanrogers · on June 27, 2023

Stored procedures often increase CPU load of the DB instances, deploys can be more challenging, and security is only better if one takes care how the procedure is authorized or delegated.

They can save some network round trips, and unify interfaces if there are a lot of different stacks sharing one data store. Though it's not universally better.

JohnBooty · on June 28, 2023

    Stored procedures often increase CPU load of the DB instances

This is of course true if you're doing a bunch of computational work in your stored procedure (as opposed to just doing pure CRUD stuff) but I'm struggling to think of a real world situation where this might be the case.

Can you name some examples you've encountered? (I'm not doubting you, I'm just curious)

paulryanrogers · on June 28, 2023

I did some complex expiration calculations in a stored function, it cut down on round trips and kept the app simpler. Though it did cause some modestly higher CPU load.

I've also seen MVs cause CPU spikes where there is a lot of SELECT massaging going on. Even without the MVs themselves certain functions like REGEXP stuff can impact performance. Language of the PL matters too, if you've got choices like Python.

JohnBooty · on June 28, 2023

Sorry, MVs? I'm probably going to feel like an idiot but I'm not recognizing this acronym.

Also thanks for sharing your example, much appreciated

noisy_boy · on June 28, 2023

Materialised Views.

JohnBooty · on June 28, 2023

Ah! Thanks. I'm not sure why my brain wasn't making the connection.

Materialized views are almost criminally underused. I feel most people don't know about or understand them. They can be a very effective "caching layer" for one thing. I have used them to great effect.

A lot of times I see people pulling things out of the database and caching them in Redis. When in fact a materialized view would accomplish the same thing with less effort.

noisy_boy · on June 28, 2023

> Materialized views are almost criminally underused. I feel most people don't know about or understand them. They can be a very effective "caching layer" for one thing. I have used them to great effect.

They are great if the frequency of changes isn't very high and the underlying tables are read-heavy and running it directly against the tables would be expensive (e.g. complex joins).

> A lot of times I see people pulling things out of the database and caching them in Redis. When in fact a materialized view would accomplish the same thing with less effort.

Typically Redis is used in medium to high volume of many smallish but repetitive queries where you want to store some key bits of information (not entire records at which point going to database might be simpler) that are sought often and you don't want to hit the database again and again for them - a bit different from the materialized view scenario.

webprofusion · on June 28, 2023

False. I've developed enormous systems fully using stored procs for data layer and it's absolutely not the "best way to do the data layer". Next you'll be saying triggers are a good idea.

Yes they can be fast, and they're controlled by the DBA, that's why you'd choose them. Injection is not an issue in modern data access layers, that's a 1999 problem.

edandersen · on June 28, 2023

If your default stance in 2023 is that you should start with stored procedures, you are doing something very wrong and you are a risk to the survival of your business.

grugagag · on June 28, 2023

Really depends on what you need, depends on size, team structure, budget, and so many other things.

JohnBooty · on June 28, 2023

I'd say that any kind of absolutist "default stance" is suspect.

preseinger · on June 28, 2023

an application (client) should be free to query a database (server) directly

a stored procedure is an implicit dependency between client and server, fine as an optimization, but definitely not what you want to do by default

grugagag · on June 28, 2023

For CRUD stuff if you don’t care about execution plan cache and other optimizations it could save you some time sure to query directly. For chunks of code that transactions/procesing large data directly on the server I’d reach out for stored procedures without thinking too much

preseinger · on June 28, 2023

it's not like stored procedures are inherently faster than normal queries, right?

as long as you're not doing anything dumb like connection-per-request, query caching should work the same

JohnBooty · on June 28, 2023

    it's not like stored procedures are inherently 
    faster than normal queries, right?

They're doing the same amount of "work" with regards to finding/creating/updating/deleting rows.

But you (potentially) avoid shuffling all of that data back and forth between your DB server and your app.

This can be an orders-of-magnitude benefit if we are describing a multistep process that would involve lots of round trips, and/or involves a lot of rows that would have to be piped over the network from the DB server to the app.

Suppose that when I create an order, I want to perform some other actions. I want to update inventory levels, calculate the user's new rewards balance, blah blah blah. I could do all of that in a single stored procedure without bouncing all of that data back and forth between DB and client in a multistep process. That could matter a lot in terms of scalability, because now maybe I only have to hold that transaction lock for 20ms instead of 200ms while I make all of those round trips.

There are a lot of obvious downsides to using stored procedures, but they can be very effective as well. I would not use them as a default choice for most things but they can be a valuable optimization.

preseinger · on June 28, 2023

huh?

a single request returns a single result set to the client, whether it's a stored procedure or a direct query

and any stored procedure can be equivalently expressed as a direct query, right?

grugagag · on June 28, 2023

For example you can look up millions of rows then manipulate some data, aggregate some other data and in the end return a result set without shuffling back and forth client/server.

preseinger · on June 28, 2023

you can do that equally well in a stored procedure and a single query

like, you can write a query which does all of these transforms in sequence, and returns the final result set

the data that goes between client and server is only that final result set, it's not like the client receives each intermediate step's results and sends them back again?

setr · on June 28, 2023

If you’re going to mix multiple queries with procedural logic — eg running query A vs B depending on whatever conditions based on query C, then a stored proc saves you the round trips versus doing it in your app code. That’s all he’s saying.

preseinger · on June 28, 2023

it doesn't! whatever code you put into the stored proc you can equally well put into a query, and the round-trip costs would be equivalent

a stored proc is just a query saved on the db server, nothing more

if you destructure a stored proc to multiple individual queries, ok, sure, but who would do that?

JohnBooty · on June 28, 2023

It seems you don't have a lot of experience or understanding regarding stored procedures.

Obviously if you just take a single query and turn it into a stored procedure then yes, the round-trip cost is the same. This seems to be where your knowledge ends. Perhaps we can expand that a bit.

Let's look at a more involved example with procedural logic. This would be many, many round trips.

https://www.red-gate.com/simple-talk/databases/sql-server/t-...

I'm not exactly endorsing that example. Personally, I would almost never choose to put so much of my application logic into a stored procedure, at least not as a first choice. This is just an example of what's possible and not something I am endorsing as a general purpose best practice.

With that caveat in mind, what's shown there is going to be pretty performant compared to a bunch of round trips. Especially if you consider something like that might need to be wrapped in a transaction that is going to block other operations.

Also, while you may be balking at that primitive T-SQL, remember that you can write stored procedures in modern languages like Python.

https://docs.snowflake.com/en/sql-reference/stored-procedure...

preseinger · on June 28, 2023

good lord man, the condescension is so thick and rich, it's like i'm reading an eclair, and the eclair is insulting me based on its own misunderstanding of the topic of conversation

powerful stuff

JohnBooty · on June 28, 2023

That'll happen when you're publicly and confidently wrong. If you scroll up, you'll find any number of posts where folks pointed out your misconceptions in a more kindly fashion. When we factor in the fact that you're wrong (as opposed to my post which is correct, informative, and cites examples) I think many would say your incorrect assertions are a lot ruder and less HN-worthy.

Sometimes folks know more about a given thing than you do. That is okay. The goal is to learn. I am sure you know more than I do about zillions of things. In fact, that is why I come here. People here know things.

preseinger · on June 28, 2023

haha, man, the absolute _chutzpah_ you need to make (incorrect) accusations like this, even as an anonymous person on the internet, is really breathtaking

i hope you reflect on this interaction at some point

JohnBooty · on June 28, 2023

You seem to be firm in your objectively wrong belief that:

    a stored proc is just a query saved on the db server, nothing more

Absolutely not. They can contain procedural logic as well. You can do a wide range of things in a stored proc that are far beyond what can be done with a query. Again.... I provided some links with examples. You don't need to believe me.

    i hope you reflect on this interaction at some point

Wow.

bandrami · on June 28, 2023

If you're just writing a single access app for a DB I agree it doesn't really matter much. Where SPs really help is when you're maintaining a DB that multiple projects from multiple teams access and you can present them a single high-level API to use, so that Accounting and Fulfillment aren't reinventing the same wheels over and over. So it's more about stack management than anything else.

funcDropShadow · on June 28, 2023

Not, if you consider inserts, updates and deletes in addition to selects.

Not, if you want to spread the queries over multiple transactions.

wernercd · on June 28, 2023

My first thought is to scream in horror as Bobby Tables says hi to "string interpolation". complaining about evil ORMs and then just asking to get taken to the cleaners...

FridgeSeal · on June 28, 2023

I really feel like a good chunk of the discussions around ORM/not ORM and such are because the core method of interacting with our databases sucks.

Passing strings around is an awful interface, and maybe if we explored the space to fix that, a good chunk of these subsequent problems would be obviated.

IceSentry · on June 28, 2023

Some libraries can use string interpolation to create parameterized queries automatically.

krzyk · on June 28, 2023

Proper string interpolation is not (see JEP 430).

kdazzle · on June 28, 2023

Man, you know what you can really just mostly ignore when you’re using an ORM?

Injection

If i’m building an app or an API, I will ORM till I die (as long as it’s Django). If I need anything so much as a groupby, though, i will drop right into SQL or create an aggregate table that my ORM can just do a basic SELECT from

jkaplowitz · on June 28, 2023

If you’re just worried about injection, you just need to use bound parameters instead of string interpolation. Boom, avoided at the database driver level even with plain sql. I admit though that some cases of string interpolation can be harder to catch in your code than when using ORMs.

masklinn · on June 28, 2023

Ah yes, the git gut method of software engineering, an endless font of teaching opportunities from being a bottomless well of fuckups.

Especially unhelpful when the dynamic sql is legit.

Capricorn2481 · on June 28, 2023

Humans make mistakes and we can't rely on them not to make mistakes in software development, yes. But preventing SQL injection is not really an "oopsie.". It's a fairly easy thing to spot and prevent, and none of the code I've ever reviewed had a SQL injection bug in it because string interpolation looks completely different from parametrized queries.

I have no problems with ORMs, but you make it sound like it's as easy for a SQL injection bug to go into production code as a memory bug in C. You'd have to actively be fucking up at that point, and if a Junior dev does it, it's a 5 minute talk and they'll never do it again.

jkaplowitz · on June 28, 2023

I didn’t say you had to be magic about it in the way of git gut, no.

One can for example hook up a static analyzer to your GitHub and configure it to complain if there are any sql calls that can’t be automatically verified as free from interpolation of potentially unsafe data unless those dynamic calls are marked with a comment annotation as having been individually verified by a human as definitely not interpolating potentially unsafe data. I suspect such an analyzer may already exist as a SaaS offering.

If nobody lies about those annotations, a static analyzer can handle the common cases. If you have a human faking such annotations, whichever motivation causes them to do that would create other security issues even with the ORM.

And the analyzer could even handle some kinds of dynamic sql depending on how aware it is of your programming language and type system and what markings you’re willing to apply to functions and/or data with potentially unsafe and/or safe origins. Being vague only in the service of generality here since the specifics vary so much by company.

If there is no SaaS service for this, maybe one ought to exist.