Charitably, they probably meant the process by which the parameterized query string itself is built, in the case where you need a dynamic query structure.
I swear to god I'm going to write a SQL library some day that doesn't take strings as arguments. Although LINQ kinda already does what I'm thinking about, so maybe I just wait for someone to port it.
"With the new SqlQuery method, you can now write your own SQL queries and map the result to any type that you want.
This is useful for tasks that are not directly related to the day-to-day business of your application. For example, this can be used to import data, migrate data, or to retrieve data from a legacy system."
Sooo it’s dapper with string interpolation override to make queries parameterized so you don’t need to manually put the parameters on as 2nd arguments?
> Sooo it’s dapper with string interpolation override to make queries parameterized so you don’t need to manually put the parameters on as 2nd arguments?
Pretty much. C# does some really cool innovations around string interpolation which proves very useful for logging, SQL query etc.
You can create your own string interpolation handler, which is what the new SQL query does and what several log libraries do. So basically you can use interpolation safely and remain assured that you do not introduce SQL injection errors. Also, this way string interpolation does not prevent query plan caching, as normal string interpolation would do.
You can do it in typescript/es6 using tagged template literals. I have done it before and it was awesome. Seriously wish more languages supported custom variable interpolation for it.
Details. But sure. In truth the best way to do the data layer is to use stored procs with a generated binding at the application layer. This is absolutely safe from injection, and is wicked fast as well.
Having been on the team that inherited a project full on stored procedures (more than once): no thank you, not ever again.
Opaque, difficult to debug, high risk to invoke, difficult to version and version-control properly. I’m sure they could be done “less worse”, but I doubt they can be done well.
Thanks for echoing my pain. I am paying for the bad decisions taken by a bloke who only knew SQL and used that hammer to bang on every nail he could find. Everytime someone asks why the result of the stored proc is the way it is, I need to manually run the steps one by one until I find the point where an expected row failed a condition and dropped off. And have extra fun wrapping those statements with declarations incase you are heavily using declared variables in those statements. To top it off, if you are troubleshooting in an environment where you don't have insert/update permissions, you day just became amazing. Fuck you to those who use stored procs without strong justification (like high volume data processing on database side where it totally makes sense).
By version control, I assume you mean the inability to use different versions simultaneously without hacks, unlike a library built using Java where you get to pick and choose the version you want to use. Because of you mean version control of stored proc sources, that would be just like a normal git asset.
I like stored procedures, or I guess to be more specific functions in postgres, but they have to be done in a really specific way:
1) It's all or nothing. If there's functions then they better feel like a complete high level API, enforcing the consistency of data, perfectly handling locking, etc. If there's tons of complex SQL in the app and then also a some functions then it's better to have no functions at all.
2) They need to have a consistent interface, such as always returning an error flag, error message if applicable. Functions must never surprise the caller. It either succeeds and the caller commits, or fail and the caller rolls back and knows why.
3) No monkey business with triggers. They can maintain updated_at columns, or maybe create some audit trails, but anything beyond that will make behavior surprising to callers and that is the worst.
As for version control it needs to be maintained as a separate application, which the server depends on having certain versions of. Even if you don't use functions you have to worry about schema versions and you can't always run two versions at the same time or roll back schema changes easily as a practical matter.
Holy crap yes. This thread has triggered a long lost memory from over a decade ago where everything, absolutely everything was done through stored procedure. Such a wild, wild waste of time.
> - Are maintainable by a team. "Oh, because that seemed faster at the time."
> - Are unit tested: eventually we end up creating at least structs or objects anyway, and then that needs to be the same everywhere, and then the abstraction is wrong because "everything should just be functional like SQL" until we need to decide what you called "the_initializer2".
> - Can make it very easy to create maintainable test fixtures which raise exceptions when the schema has changed but the test data hasn't.
ORMs may help catch incomplete migrations; for example when every reference to a renamed column hasn't been updated, or worse when foreign key relations changed breakingly.
django_rest_assured generates many tests for the models (SQL)/views/API views in a very DRY way from ORM schema and view/controller/route registrations, but is not as fast as e.g. FastAPI w/ SQLalchemy or django-ninja.
> Opaque, difficult to debug, high risk to invoke, difficult to version and version-control properly.
Traditionally, maybe. With databases like Neon (postgresql) and Planetscale (mysql) supporting easy branching / snapshot it's at least made this a lot nicer.
That might be true, but that’s like saying “now that my mechanic can repair my car for free, it’s way less worry to replace my radiator by removing it with a chainsaw”.
The solution to stored procs being awful isn’t making a whole branch just to see what it does, it’s fixing the problems with stored procedures at the root level.
Though im on the fence with stored procs and have seen complicated messes that make it depressingly no fun to work with I also had very good experiences with systems based around stored procs, not in an abusive dogmatic way. The usual answer I’m going to invoke here is that it depends how the tool is used. Any tool, methodology, philosophy can be borked in various ways, that experience traumatize people that they prefer to move on. Of course some tools are clearly worse or less useful than others but you can find some consensus if you know where to look.
I (softly) disagree with stored procs being definitively the "best way to do the data layer." Stored procs are an extremely powerful tool to have in your belt, but lord have mercy do they have their own tradeoffs.
What I've found is that devs tend to be able to write "OKish" SQLs, but fall over when trying to optimize them, so the DB side can then take over and write optimized stored procs, which can have a separate cadence of code updates and version control compared to the backend team.
Stored procedures often increase CPU load of the DB instances, deploys can be more challenging, and security is only better if one takes care how the procedure is authorized or delegated.
They can save some network round trips, and unify interfaces if there are a lot of different stacks sharing one data store. Though it's not universally better.
Stored procedures often increase CPU load of the DB instances
This is of course true if you're doing a bunch of computational work in your stored procedure (as opposed to just doing pure CRUD stuff) but I'm struggling to think of a real world situation where this might be the case.
Can you name some examples you've encountered? (I'm not doubting you, I'm just curious)
I did some complex expiration calculations in a stored function, it cut down on round trips and kept the app simpler. Though it did cause some modestly higher CPU load.
I've also seen MVs cause CPU spikes where there is a lot of SELECT massaging going on. Even without the MVs themselves certain functions like REGEXP stuff can impact performance. Language of the PL matters too, if you've got choices like Python.
Ah! Thanks. I'm not sure why my brain wasn't making the connection.
Materialized views are almost criminally underused. I feel most people don't know about or understand them. They can be a very effective "caching layer" for one thing. I have used them to great effect.
A lot of times I see people pulling things out of the database and caching them in Redis. When in fact a materialized view would accomplish the same thing with less effort.
> Materialized views are almost criminally underused. I feel most people don't know about or understand them. They can be a very effective "caching layer" for one thing. I have used them to great effect.
They are great if the frequency of changes isn't very high and the underlying tables are read-heavy and running it directly against the tables would be expensive (e.g. complex joins).
> A lot of times I see people pulling things out of the database and caching them in Redis. When in fact a materialized view would accomplish the same thing with less effort.
Typically Redis is used in medium to high volume of many smallish but repetitive queries where you want to store some key bits of information (not entire records at which point going to database might be simpler) that are sought often and you don't want to hit the database again and again for them - a bit different from the materialized view scenario.
False. I've developed enormous systems fully using stored procs for data layer and it's absolutely not the "best way to do the data layer". Next you'll be saying triggers are a good idea.
Yes they can be fast, and they're controlled by the DBA, that's why you'd choose them. Injection is not an issue in modern data access layers, that's a 1999 problem.
If your default stance in 2023 is that you should start with stored procedures, you are doing something very wrong and you are a risk to the survival of your business.
For CRUD stuff if you don’t care about execution plan cache and other optimizations it could save you some time sure to query directly. For chunks of code that transactions/procesing large data directly on the server I’d reach out for stored procedures without thinking too much
it's not like stored procedures are inherently
faster than normal queries, right?
They're doing the same amount of "work" with regards to finding/creating/updating/deleting rows.
But you (potentially) avoid shuffling all of that data back and forth between your DB server and your app.
This can be an orders-of-magnitude benefit if we are describing a multistep process that would involve lots of round trips, and/or involves a lot of rows that would have to be piped over the network from the DB server to the app.
Suppose that when I create an order, I want to perform some other actions. I want to update inventory levels, calculate the user's new rewards balance, blah blah blah. I could do all of that in a single stored procedure without bouncing all of that data back and forth between DB and client in a multistep process. That could matter a lot in terms of scalability, because now maybe I only have to hold that transaction lock for 20ms instead of 200ms while I make all of those round trips.
There are a lot of obvious downsides to using stored procedures, but they can be very effective as well. I would not use them as a default choice for most things but they can be a valuable optimization.
For example you can look up millions of rows then manipulate some data, aggregate some other data and in the end return a result set without shuffling back and forth client/server.
you can do that equally well in a stored procedure and a single query
like, you can write a query which does all of these transforms in sequence, and returns the final result set
the data that goes between client and server is only that final result set, it's not like the client receives each intermediate step's results and sends them back again?
If you’re going to mix multiple queries with procedural logic — eg running query A vs B depending on whatever conditions based on query C, then a stored proc saves you the round trips versus doing it in your app code. That’s all he’s saying.
It seems you don't have a lot of experience or understanding regarding stored procedures.
Obviously if you just take a single query and turn it into a stored procedure then yes, the round-trip cost is the same. This seems to be where your knowledge ends. Perhaps we can expand that a bit.
Let's look at a more involved example with procedural logic. This would be many, many round trips.
I'm not exactly endorsing that example. Personally, I would almost never choose to put so much of my application logic into a stored procedure, at least not as a first choice. This is just an example of what's possible and not something I am endorsing as a general purpose best practice.
With that caveat in mind, what's shown there is going to be pretty performant compared to a bunch of round trips. Especially if you consider something like that might need to be wrapped in a transaction that is going to block other operations.
Also, while you may be balking at that primitive T-SQL, remember that you can write stored procedures in modern languages like Python.
good lord man, the condescension is so thick and rich, it's like i'm reading an eclair, and the eclair is insulting me based on its own misunderstanding of the topic of conversation
That'll happen when you're publicly and confidently wrong. If you scroll up, you'll find any number of posts where folks pointed out your misconceptions in a more kindly fashion. When we factor in the fact that you're wrong (as opposed to my post which is correct, informative, and cites examples) I think many would say your incorrect assertions are a lot ruder and less HN-worthy.
Sometimes folks know more about a given thing than you do. That is okay. The goal is to learn. I am sure you know more than I do about zillions of things. In fact, that is why I come here. People here know things.
haha, man, the absolute _chutzpah_ you need to make (incorrect) accusations like this, even as an anonymous person on the internet, is really breathtaking
i hope you reflect on this interaction at some point
You seem to be firm in your objectively wrong belief that:
a stored proc is just a query saved on the db server, nothing more
Absolutely not. They can contain procedural logic as well. You can do a wide range of things in a stored proc that are far beyond what can be done with a query. Again.... I provided some links with examples. You don't need to believe me.
i hope you reflect on this interaction at some point
If you're just writing a single access app for a DB I agree it doesn't really matter much. Where SPs really help is when you're maintaining a DB that multiple projects from multiple teams access and you can present them a single high-level API to use, so that Accounting and Fulfillment aren't reinventing the same wheels over and over. So it's more about stack management than anything else.
My first thought is to scream in horror as Bobby Tables says hi to "string interpolation". complaining about evil ORMs and then just asking to get taken to the cleaners...
I really feel like a good chunk of the discussions around ORM/not ORM and such are because the core method of interacting with our databases sucks.
Passing strings around is an awful interface, and maybe if we explored the space to fix that, a good chunk of these subsequent problems would be obviated.
Man, you know what you can really just mostly ignore when you’re using an ORM?
Injection
If i’m building an app or an API, I will ORM till I die (as long as it’s Django). If I need anything so much as a groupby, though, i will drop right into SQL or create an aggregate table that my ORM can just do a basic SELECT from
If you’re just worried about injection, you just need to use bound parameters instead of string interpolation. Boom, avoided at the database driver level even with plain sql. I admit though that some cases of string interpolation can be harder to catch in your code than when using ORMs.
Humans make mistakes and we can't rely on them not to make mistakes in software development, yes. But preventing SQL injection is not really an "oopsie.". It's a fairly easy thing to spot and prevent, and none of the code I've ever reviewed had a SQL injection bug in it because string interpolation looks completely different from parametrized queries.
I have no problems with ORMs, but you make it sound like it's as easy for a SQL injection bug to go into production code as a memory bug in C. You'd have to actively be fucking up at that point, and if a Junior dev does it, it's a 5 minute talk and they'll never do it again.
I didn’t say you had to be magic about it in the way of git gut, no.
One can for example hook up a static analyzer to your GitHub and configure it to complain if there are any sql calls that can’t be automatically verified as free from interpolation of potentially unsafe data unless those dynamic calls are marked with a comment annotation as having been individually verified by a human as definitely not interpolating potentially unsafe data. I suspect such an analyzer may already exist as a SaaS offering.
If nobody lies about those annotations, a static analyzer can handle the common cases. If you have a human faking such annotations, whichever motivation causes them to do that would create other security issues even with the ORM.
And the analyzer could even handle some kinds of dynamic sql depending on how aware it is of your programming language and type system and what markings you’re willing to apply to functions and/or data with potentially unsafe and/or safe origins. Being vague only in the service of generality here since the specifics vary so much by company.
If there is no SaaS service for this, maybe one ought to exist.
Surely you mean a parameterized query string and not string interpolation, which is susceptible to sql injection.