I'm a Go developer and I notice a lot of other Go developers instantly suggest things like GORM to noobs writing applications. Whereas I always suggest against it.
I'm a big advocate of understanding your data model at the database level. Joining on too many tables is far too easy to do with an ORM.
My go-to strategy for SQL is simple. Abstract your SQL as far away from your application as possible. Not just by using interfaces in your application, but by using stored procedures in your database.
In my mind SQL is just another service your application depends on. And you want that interface to be as simple as possible for your application. If you need to tweak SQL performance it should not need any input from your application. I could completely migrate to a new table relationship structure without the application realising if that was what was needed. You could even go as far as to unit test just your SQL code if you wanted, something you can't do too easily when it's in your application.
Yes, if you need to return new data then you need to update your stored proc and code. But that's so worth it in my opinion for that extra layer of abstraction.
My opinion is slightly skewed from a decently sized business perspective, but I do still follow this pattern in personal projects. When migrating applications to different tech stacks (like Java to Go, or C# to Go) this abstraction has meant the world to us.
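To make the shape of this concrete, here's a minimal sketch of what I mean, assuming Postgres and database/sql; the procedure and schema are invented for illustration. The application only knows the procedure's name and signature, never the tables behind it:

    package store

    import (
        "context"
        "database/sql"
    )

    // Lives in a migration, not in the application (hypothetical schema):
    //
    //   CREATE FUNCTION get_customer_orders(_customer_id bigint)
    //   RETURNS TABLE (order_id bigint, total numeric) AS $$
    //     SELECT o.id, o.total FROM orders o WHERE o.customer_id = _customer_id;
    //   $$ LANGUAGE sql STABLE;

    type Order struct {
        ID    int64
        Total float64
    }

    // CustomerOrders is the whole database "interface" from the app's point
    // of view: one call, no knowledge of the underlying table layout. The
    // tables behind get_customer_orders can be restructured freely.
    func CustomerOrders(ctx context.Context, db *sql.DB, customerID int64) ([]Order, error) {
        rows, err := db.QueryContext(ctx, `SELECT * FROM get_customer_orders($1)`, customerID)
        if err != nil {
            return nil, err
        }
        defer rows.Close()

        var orders []Order
        for rows.Next() {
            var o Order
            if err := rows.Scan(&o.ID, &o.Total); err != nil {
                return nil, err
            }
            orders = append(orders, o)
        }
        return orders, rows.Err()
    }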
I am against using stored procedures. In the projects I saw where SQL code was encapsulated in stored procedures, the business logic was inconsistently split between the application and the database, making maintenance much more complicated than it should have been. I am in favour of stored procedures to avoid several round trips to the database, but only when performance really matters. The argument about tweaking SQL performance does not convince me. Where I work, we execute performance and load tests before every release of the application, and we are expected to meet performance targets that are documented in the release notes.
I’m interested in this because I have the opposite experience. Not saying either choice is wrong, but in my experience, moving that logic into the database has made all our codebases easier to reason about. The APIs don’t have to botch error handling due to “open transaction needs to be ROLLBACKed”, I don’t have to worry about bad devs doing stuff non-atomically and putting the db in a bad state, and the code is easier to read because it’s just a single query that does “what it’s supposed to do”. The API usually ends up just being a small adapter for DB functions. It also makes sense to me because the “database API” (e.g. the database schema with some methods in it - the tables themselves are usually not in the same schema) describes the valid operations on the data, and can make sure the data is consistent. It also makes adding new APIs easier.
>I don’t have to worry about bad devs doing stuff non-atomically and putting the db in a bad state
Premise: I assume we are speaking of relational databases.
The fact that a developer can "corrupt" the database with non-atomic stuff is a hint to me that the database probably doesn't have the right referential integrity constraints in place and probably is not normalized either. The relational model is built out of the box to keep a consistent state of the data and this is one of its main value propositions.
Another danger that I didn't mention is that junior developers will be tempted to prefer "Turing-like code" inside stored procedures (for loops, cursors, conditionals) over relational constructs (joins, subqueries, grouping, table variables). I have seen that several times, and it is a real killer for database performance.
> The fact that a developer can "corrupt" the database with non-atomic stuff is a hint to me that the database probably doesn't have the right referential integrity constraints in place and probably is not normalized either. The relational model is built out of the box to keep a consistent state of the data and this is one of its main value propositions.
This is true when interpreted the right way, but I don't think real world problems are always so tidy.
Let's consider Standard Toy Example #17: The Bank Account Transfer. In a simplistic model, we might have a table of bank accounts in our database, and a second table of deposits. Constraints will guarantee that we can't do things like creating a deposit for an account that doesn't exist, but they won't ensure that if we create a deposit of X in one account, we must also create a corresponding deposit of -X for some other account so we aren't inventing or destroying money.
Of course, in a more realistic implementation, you'd never represent transfers between accounts as two separate stages like that, and if there is any single place to look up deposits into specific accounts it's probably some sort of view or cache rather than the authoritative data. But rather like going from single-entry to double-entry bookkeeping, to avoid relying on something higher up our system to ensure consistency, we've had to restructure our whole data model and the schema implementing it. In a sense, this is still just normalising the data, but that's a bit like saying implementing a word processor is, in a sense, just refactoring and extending a text editor.
A related principle is to make illegal states unrepresentable. If you can possibly insert a debit and fail to insert the corresponding credit, thereby making money magically appear, the schema is wrong. This isn't just a matter of insufficient normalization; the schema simply doesn't represent what you claim it represents.
The database shouldn't be viewed as just some kind of scratch pad where you write stuff down so the application server doesn't have to remember it. If the database is the source of truth, then it is the model, and the application is simply a view.
This is a good principle to aim for but it is impossible to fulfill in many cases. There are always going to be invariants that the application may wish to enforce which cannot be enforced in the database (or, if there was some ideal schema to enforce them, the migration from the existing schema would be enormously costly). This is the whole reason we have ACID transactions in relational databases - so the application can enforce its invariants!
(Exercise for the readers: construct a set of invariants that cannot be enforced within a database schema.)
> This is a good principle to aim for but it is impossible to fulfill in many cases. There are always going to be invariants that the application may wish to enforce which cannot be enforced in the database
As HN doesn't show moderations, I will just say that this is exactly the point I would have made if I'd been replying first. Relational databases are good at enforcing the kinds of relations provided by basic set manipulations. If you try really hard, you can encode some more complicated constraints, but as the complexity increases it becomes unwieldy and eventually impractical to enforce everything at schema level.
Edit: Changed "database level" to "schema level" to clarify that I'm not talking about stored procedures here.
I'm a little confused. A 'toy' bank app would have individual accounts with some amount of money in each. The problem arises when you subtract the debit from one account and error out before crediting the amount in other account, or vice versa. Now you're in an inconsistent state.
How would you typically design a relational schema that could avoid this scenario?
The usual for this would be to have a table of transactions instead of balances. The balances would be computed dynamically by summing the full transaction history for a particular account.
Of course this gets a bit unwieldy with large transaction histories, and has trouble with things like enforcing that you can't enter a transaction that draws a balance below zero.
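In query form, that derived balance is just an aggregate over the history; a sketch, schema invented:

    package ledger

    // The balance is never stored; it's derived by summing the account's full
    // transaction history, so a half-applied "update" can't exist.
    const balanceQuery = `
        SELECT COALESCE(SUM(amount), 0) AS balance
        FROM transactions
        WHERE account_id = $1;`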
Well yeah, but that just makes the business rules even more complex. That makes it more like - reject the transaction if it would move the account balance below zero, unless the account has a flag to allow overdrafts (which is over on the accounts table), in which case we allow the transaction and also enter another transaction for the overdraft fee.
Probably need some more rules like max overdraft amount, max number of transactions etc. I've heard some places may also have the "let's be jerks" rule to reorder transactions to hit overdraft as soon as possible to maximize fees.
That sort of problem is actually relatively easy to solve in itself. In essence, you don't record the transactions on each account separately, you record the transfers between them (just as double-entry bookkeeping does). Since any transfer inherently balances, you can't be inconsistent in crediting money somewhere without debiting it somewhere else or vice versa. If you want to know what happened to an individual account, you can still construct that information by looking at the relevant sides of all the transfers into or out of that account, but your single source of truth is always the transfer records.
Of course, it's still not that simple in any realistic system, because maybe you have other important constraints such as ensuring that accounts don't go into negative balances. In that case, before you can add any new transfer, you need to calculate the existing balance on the account you're taking money from, which means replaying the entire transaction log selecting for that account if you just have the naive schema described above, and then you have to create the new transfer record if and only if the constraint is satisfied. All of this has to be done atomically to make sure you can't have multiple transfers being recorded concurrently that would individually be valid but collectively leave an account overdrawn.
That takes you into the technicalities of how to process DB transactions with acceptable performance. Hopefully your database will help you if it's ACID compliant, but it still can't work miracles. For the simple two-sided transfers in this example, you can probably do a lot just using a database's built-in transaction support, but for more complicated constraints applied to more complicated schemas, at some point you have to take the practicalities of your database engine into account when designing that schema and deciding what constraints to enforce at this level.
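A sketch of that atomic check-then-insert from the application side, using database/sql; the schema, the row lock, and all names are illustrative assumptions:

    package bank

    import (
        "context"
        "database/sql"
        "errors"
    )

    var ErrInsufficientFunds = errors.New("insufficient funds")

    // Transfer inserts one balanced transfer row, but only after checking the
    // source account's balance inside the same transaction. The FOR UPDATE row
    // lock serialises concurrent transfers out of the same account, so two
    // transfers can't both pass the check and jointly overdraw it.
    func Transfer(ctx context.Context, db *sql.DB, from, to, amount int64) error {
        tx, err := db.BeginTx(ctx, nil)
        if err != nil {
            return err
        }
        defer tx.Rollback() // no-op once Commit succeeds

        if _, err := tx.ExecContext(ctx,
            `SELECT 1 FROM accounts WHERE id = $1 FOR UPDATE`, from); err != nil {
            return err
        }

        // Replay the log for this account: incoming transfers add, outgoing subtract.
        var balance int64
        err = tx.QueryRowContext(ctx, `
            SELECT COALESCE(SUM(CASE WHEN to_account = $1 THEN amount ELSE -amount END), 0)
            FROM transfers
            WHERE from_account = $1 OR to_account = $1`, from).Scan(&balance)
        if err != nil {
            return err
        }
        if balance < amount {
            return ErrInsufficientFunds
        }

        if _, err := tx.ExecContext(ctx,
            `INSERT INTO transfers (from_account, to_account, amount)
             VALUES ($1, $2, $3)`, from, to, amount); err != nil {
            return err
        }
        return tx.Commit()
    }

(A SERIALIZABLE isolation level plus retry-on-conflict is a reasonable alternative to the explicit row lock; which one performs better is exactly the kind of engine-specific question the comment above is pointing at.)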
I think as long as you use stored procedures as atomic operations/transactions, they both simplify code and improve performance. In your example, a transfer-money procedure should handle both the balance increment and decrement for all accounts involved in a transaction before returning. Things start to get hard to maintain if you have stored procedures that only complete a portion of the business logic and rely on application code to “understand” what still needs to be done.
I think what he has experienced is something I have also seen many times (and am still seeing at my current company):
Some developers put their application into SQL. That means the whole business logic of the app is in SQL, and the app just executes SQL statements in order and does some front-ending.
But I don't suspect that is what the original author meant.
There are security matters too though, which no-one ever seems to get right.
Stored procedures, correctly managed, are a more secure option than an ORM executing arbitrary SQL code, but the reason why is consistently misunderstood (I've had experienced DBAs get this wrong and insist to me that stored procedures prevent SQL injection attacks, which is completely false).
To pick an arbitrary example out of thin air:
If my application requires 10 different calls to the database with variables, then I can create 10 stored procedures. I can set the permissions for the account that the calling application is using to only have EXEC privileges and only on those 10 procedures. This means that if the credentials were to leak the damage is limited to tasks the application could conceivably have performed anyway, albeit without any application-enforced validations on the variables passed to the calls.
If I use an ORM, I have to give the SQL account the client application is using more-or-less carte-blanche access to the database, as it could conceivably read from or write to anywhere in the database through arbitrary SQL. Sure, maybe I scope the ORM to a schema and restrict on that, but it's not nearly as granular, and it fundamentally misses the point that I've opened a massive attack surface unnecessarily by creating a SQL account, with generally free rein over the database, that exists in the application layer.
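Concretely, that lockdown looks something like this in Postgres terms (SQL Server's EXEC grants are analogous; the schema and function names are invented):

    package migrations

    // Hypothetical lockdown for the app's database account. app_user can call
    // exactly these functions and nothing else; a leaked credential can't
    // SELECT or mutate tables directly.
    var lockdown = []string{
        // Remove the blanket EXECUTE that PUBLIC gets by default on new functions.
        `ALTER DEFAULT PRIVILEGES REVOKE EXECUTE ON FUNCTIONS FROM PUBLIC;`,
        `REVOKE ALL ON ALL TABLES IN SCHEMA app FROM app_user;`,

        // Grant back only the known entry points, one per application call.
        `GRANT EXECUTE ON FUNCTION app.create_order(bigint, int) TO app_user;`,
        `GRANT EXECUTE ON FUNCTION app.cancel_order(bigint) TO app_user;`,
        // ...and so on for the remaining procedures.
    }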
On a recent project, mostly to see how practical it was, I built an application that used stored procedures with all of the validation being done directly inside the procedures in SQL. This had quite a few benefits:
- I only had to maintain validation in one place.
- I could rely on the formatting constraints I already needed in the database anyway.
- The validation rules were beside the data and were much faster to edit and maintain over time.
- The validation messages had to be pulled as keys, making globalisation of the application much simpler later within the client application.
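A rough sketch of the message-key idea, assuming Postgres plpgsql and invented names; the client maps the raised keys to localised strings:

    package migrations

    // Hypothetical example of validation living next to the data: the
    // procedure raises message *keys*, not human-readable text.
    const createUserProc = `
    CREATE OR REPLACE FUNCTION app.create_user(_email text) RETURNS bigint AS $$
    DECLARE
        _id bigint;
    BEGIN
        IF _email IS NULL OR _email = '' THEN
            RAISE EXCEPTION 'validation.email.required';
        END IF;
        IF position('@' in _email) = 0 THEN
            RAISE EXCEPTION 'validation.email.format';
        END IF;
        INSERT INTO app.users (email) VALUES (_email) RETURNING id INTO _id;
        RETURN _id;
    END;
    $$ LANGUAGE plpgsql;`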
Different strokes for different folks, but there are literally decades of reasoning behind why RDBMS systems are the way they are, and ORMs throw all of it out with the bathwater.
"If I use an ORM, I have to give the SQL account the client application is using more-or-less carte-blanche access to the database as it could conceivable read from or write to anywhere in the database through arbitrary SQL"
You can give table level permissions (and even specific columns if you want) to the ORM db account. From a security point of view there should be no difference.
I agree with both of you, although have never set up a DB in either style :)
One small point: the proposed stored procedure approach seems, qualitatively at least, less error-prone to me. The consumer in the sproc approach either does or doesn’t have access to specific sprocs, while managing fine-grained per-column permissions seems easy to screw up in either a too-liberal or too-restrictive way.
The project is end-of-life now so most of this is moot but to answer these queries:
1) SQL calls in a separate command-line application.
2) Git. It’s in a .NET Database project.
3) See 2.
4) Not overly to be honest - the organisation uses SQL procs pretty extensively as is, so this isn’t anything particularly new or out of the ordinary other than the validations are held in a different part of the stack.
Not having consistent rules about what code belongs where is a people problem, not a technology problem. In order to write maintainable software, you always need proper code classification, regardless if you use stored procedures or not.
Again, this has nothing to do with technology. Why do we assume that you only have to follow software engineering principles when writing Java, but not when writing SQL?
I’m not sure what you’re arguing. You should have rules for what code goes where, and stored procedures are also considered an anti-pattern by most developers because they tend to split business logic between the application code and the database.
Let’s say you decide to use stored procedures, with rules about what belongs in app code and what can go in a stored procedure. The “discipline doesn’t scale” perspective would say that as your company and app code get larger, your rules get harder to enforce. So devs tend to fall back to the simplest, easiest-to-follow version of the rule: stored procedures are an anti-pattern.
So what would be a solution then? I imagine, creating an RDBMS where SQL can be executed only by DBA root accounts, and normal (app) accounts can only execute stored procedures? That would enforce a pattern without requiring discipline.
Impossible to generalize, would depend entirely on what specifically you want the stored procedures to do. Though I can’t think of a good reason to abstract your queries at the database level. Even ORMs typically allow execution of raw SQL, so you could still store a straight SQL query in your app code (perhaps in a constant) instead of using a stored procedure.
I've been thinking about this a bit before seeing this discussion, and I'm starting to feel that abstracting at database level makes sense, because it turns your RDBMS from being just a dumb SQL execution engine into an application-specific API.
If you consider restricting operations on your data close to the storage layer, you can imagine wrapping DB access with an app exposing business-specific API, and route all your actual code through that API. Doesn't seem like an unreasonable design to me - particularly, if the same database is used by multiple applications. But if you do that, it may make sense to just put the API inside the RDBMS - giving you one networked software component less to manage.
(I've done work on a project using lots of stored procedures only once, and my tasks were unrelated - so I have no practical experience here. But through this discussion, I think I'm starting to understand why enterprise projects are anecdotally so in love with stored procedures.)
If you abstract at the db level then your db reading code is either unlikely to be in version control, or you need a kludge to get version control for stored procedures.
I suppose it is in theory but how many of us work in theory rather than with a set of tools that are commonly available?
Or maybe it’s just better to roll your own version control system to support your really amazing new app architecture that stores lots of logic in database functions, what do I know? ¯\_(ツ)_/¯
Same here.
Earlier this year I consulted for a company evaluating whether to modernize their current ERP or buy a new one. What I found was a large clusterfun of an application inside an MSSQL database, with Microsoft Access frontends and some glue code moulded into a .NET application on Windows 10 in between, accessing third-party APIs.
Barely anything documented, thousands of stored procedures. When I asked how they roll out changes and perform tests, I got this answer: “As few deployments as possible. Every time we deployed new features, there were always issues for a couple of days.”
That's an application-centric view of development, as indicated by the phrase "the application". If an application is successful, its data will likely end up being used by other new applications, and the data will survive the lifetime of the original application that generated the data. The applications themselves will be rewritten many times as technologies evolve. Think mainframe to client/server to EJB to lightweight J2EE to Spring Boot microservices.
If there is logic or metadata that is common across all those applications, there are a couple choices. You could duplicate that logic across all the applications, and rewrite that logic every time you do a rewrite of the application, or you could keep it one place.
If you keep it one place, one way to do that is to have a service in front of the database that every application that uses the data calls instead of hitting the database directly. That has some disadvantages in that it requires upfront design and planning, the service in front of the database will likely be rewritten in new technologies over time, and it has performance implications. And for the developer, instead of having to deal with logic in an application and a database, now they have to deal with logic in two applications.
Another way to do it is to have common logic for the data implemented in key constraints, check constraints, triggers, stored procedures and other similar tools in the database so that any application that uses the data doesn't need to rewrite that logic, and can't intentionally or unintentionally violate the rules of the common domain. That does have its own disadvantages, and it makes things more complicated for developers who will need to be familiar with an additional set of technologies, but it is a valid use case for stored procedures.
I was once dead set against stored procs (having been previously in favour of stored procs) for the same don’t-split-business-logic reasons. However I now have a more nuanced approach - stored procs shouldn’t have business logic, but can be used to insulate sensible code data models from whatever random bozo (like me last year, or some legacy decision from 15 years ago) decided how the database schema should be.
The other way I’ve seen data-transport stored procs used is to allow multiple code versions to work on a single database - where you need to support version N and N+1 on the same data (I once worked for a SaaS company that provided a Preview version on live data before going to prod). Before version N+1 code goes live, deploy the N+1 database schema, updated procs, new data, etc., and put in place versioned stored procs which let a version N application run on a version N+1 database, setting default values for new fields, etc. It did require some extra thought to make it work smoothly, but it very rarely caused problems for customers.
Having worked as a DBA in numerous large financial institutions, and also as a SWE in a lot of node/python/MEAN-style startups, I can assure you that there are some very significant trade-offs made with stored procs (or any business logic that is compiled into your database). It takes your one codebase and turns it into at least two separate codebases. If you have a monolith, now you’ve got a distributed system. Any stored proc codebase is guaranteed over time to become a maze of complexity and footguns. The more db features you use in this area, the worse it gets. If you’re using stored procs, then somebody is eventually going to create a table trigger. If a table trigger exists, then somebody will eventually write to that table not knowing it’s there.
The complexity it adds to release management alone makes it not worthwhile in a lot of situations.
It’s like doing full blown OOP design. Your codebase has to pass a certain (rather large) size before the complexity trade offs can even begin to make sense, and even then they often don’t.
>> Any stored proc codebase is guaranteed over time to become a maze of complexity and footguns.
I think this is a very valuable and important point and wanted to highlight it.
>> Any stored proc codebase is guaranteed over time to become a maze of complexity and footguns.
There's no absolute law that states that SP using environments will always end up as a mess, but there should be such a law, because they do!
I've worked across dozens to hundreds of enterprise environments, and the pattern is clear.
I suppose it comes from personality and culture, with the people working on db coding more often at the 'get it done' end of the spectrum, versus programmers, where you find a significant number interested in 'get it right'.
An even stronger reason is probably that db coders are more likely to be autonomous instead of working in large, centrally driven teams. That central hierarchical organisation is a necessary but not sufficient condition for taming the entropy of large complex systems, without which they evolve into "a maze of complexity and footguns".
It's certainly a valid observation that anything that large enterprise does, will end up being more complex than it necessarily needs to be.
But if I look at a lot of the SP anti-patterns I've seen over the years, some common themes start to arise. I'd say the biggest issue is that it is more complex to maintain interfaces between two discrete systems than it is to write monolithic code paths. Seeing SP_ProcedureName accompanied by SP_ProcedureName_New is very common. It's easy to get into that situation, and tedious to get out of it.
Storing business logic in the DB also presents a lot of very tempting, but ultimately costly, shortcuts to developers. Using a non-temporary table when you should have used a temporary table, because you wanted debugging to be easier, but now you've got a concurrency issue, but it'll probably be OK, so you leave it there for 15 years. I have also never seen what I would consider to be a sensible use of a trigger. You also end up with SPs invoking SPs invoking SPs... and figuring out the side effects of calling something becomes a lot more complicated than it would be in a monolith.
I don't think they're always bad, but I do think that reasonable use cases for them are rather uncommon. When you had a lot of services directly consuming one database, SP interfaces used to make a lot more sense. But that architecture is not very common any more. Even in large enterprises, you'll typically see new projects being developed as web applications, even if they're only served over the corporate network.
yeah, I don't think this is a problem with sprocs, per se.
I think this is a problem because the database code is treated differently from the application code, because so few coders understand SQL well enough to be comfortable in it.
So, as you say, the DBAs tend to be autonomous, and held to different standards. And with less ability to say "we have tech debt, we need to refactor our sprocs" because of the knock-on effect on the application code.
It's a management problem rather than a technical certainty, I think.
If it happens consistently across "dozens to hundreds" of enterprise environments as stated by the GP and with apparently zero exceptions in which they remained simple then, while maybe not a technical certainty, it should be treated as a practical certainty.
Pointing out that it's probably caused by management is not really very useful since there are so few means available for line workers to fix their managers.
This is true, and I agree, but mis-attributing the problem is still not good.
After all, if (for instance) a new IDE plugin appeared that managed sprocs consistently with code and solved this problem, then we'd still face "sprocs are bad" criticism because the problem was mis-attributed.
> If you have a monolith, now you’ve got a distributed system.
You already did, you just chose to ignore that the DB is actually a separate service. Ignoring this has meant that I've spent a not-insignificant amount of time in my last few jobs cleaning up bad ORM code because the queries performed terribly. I've cleaned such messes as pulling in all rows and then looping/filtering on them client-side, just generally querying for too many things/too many columns, querying for data and then looping over it to do more queries (instead of figuring out how to do it in a join), etc. When people treat the database as if its part of your monolith, because all the query logic is locally part of your codebase in your ORM, there's a temptation to be lazy and treat it that way. And it works fine in development because the database is small with only test data, but can be painfully slow in production.
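The cleanup usually amounts to replacing the client-side loop with one join; a sketch with database/sql, schema invented:

    package report

    import (
        "context"
        "database/sql"
    )

    // Instead of fetching users and then issuing one orders query per user
    // (N+1 round trips that only hurt at production scale), push the loop
    // into the database as a join and make one round trip.
    func OrderCounts(ctx context.Context, db *sql.DB) (map[int64]int, error) {
        rows, err := db.QueryContext(ctx, `
            SELECT u.id, COUNT(o.id)
            FROM users u
            LEFT JOIN orders o ON o.user_id = u.id
            GROUP BY u.id`)
        if err != nil {
            return nil, err
        }
        defer rows.Close()

        counts := make(map[int64]int)
        for rows.Next() {
            var id int64
            var n int
            if err := rows.Scan(&id, &n); err != nil {
                return nil, err
            }
            counts[id] = n
        }
        return counts, rows.Err()
    }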
I'm sure plenty of people do it properly, even with an ORM, but I've had to clean up the mess too many times now, across multiple companies and dev teams.
I'm not advocating for putting all of this logic into stored procedures; I think that's going too far in the other direction. Only that you already have a distributed system, if you have a database, where the boundaries do matter.
Not to nearly the same degree. Start transaction > mutate data > end transaction is not nearly as distributed as invoke SP > let the database do something (?...) > wait until it's done. The first is a distributed system in the same way that an application writing to a filesystem is a distributed system.
I'd also point out that ORM is not the only alternative to stored procedures. In addition to that, there's nothing unique about ORMs that makes them susceptible to bad code, or poor SQL understanding. Somebody who doesn't understand data structures and how the cost-based optimizer works is going to write equally poor stored procedures. Removing the ORM is not the solution to your engineer not knowing how RDBMSes work. ORMs also don't suit every use case, but again, having a bias against ORMs is not the solution to failing to understand how your use case fits in with your RDBMS.
> I'd also point out that ORM is not the only alternative to stored procedures.
Absolutely. I personally use HugSQL[1], so no ORM, but I also don't use stored procedures. I guess my main point is that the boundary between application and database matters, and not putting logic in the database doesn't make that boundary go away (putting logic in the database doesn't make it go away either). But I take your point, it's not to the same degree.
> there's nothing unique about ORMs that makes them susceptible to bad code, or poor SQL understanding.
I think that there is: it makes database code look like client-side application code, so there's some temptation to just blur the boundaries and mix them together. This is how you end up with code that queries for data, then does looping and filtering outside of the query, then queries some more. Sure, you can write equally bad code in stored procedures, but the boundaries are more obvious. At the very least, it's likely a different programming language from the rest of your application.
> Removing the ORM is not the solution to your engineer not knowing how RDBMSes work.
Sure; however, in practice, at least in my experience across multiple teams and companies, education hasn't been a reliable solution. People still abuse the ORM and treat the database as if it's magic. Removing the ORM doesn't automatically fix that, of course, but it does force people to consider the database as a separate service whose access crosses a boundary. In an ideal world, everyone already keeps this in mind and ORMs can be used as they were intended, but, at least in my experience, this just hasn't been reality. Have I just been unlucky?
If different people and teams keep making the same mistakes with the tools, maybe it's time to re-evaluate the tools and what they encourage people to do. It's not that the tools are bad in and of themselves, but that they encourage bad usage. (EDIT: I just saw another article on the front page, and the title seems apt: "Discipline Doesn’t Scale")
Your typical micro-service architecture will run into a lot of the same types of problems. The cost of maintaining those inter-service dependencies easily gets quite high, so if you're going to commit to that, you really should first understand whether you're going to be deriving any benefit from it.
I have been working with Java for the past 20 years.
I have noticed that over the years, knowledge of SQL has eroded a lot among Java developers. Not only that, but application design seems to be lacking too. Where problems could easily be solved with a little bit of efficient SQL, people mindlessly accept huge performance losses due to the ORM, as if they did not see any other possibility.
I've noticed this about a few things. Regular expressions are another: they have this bad reputation as "write-only, magic one-liner hacks" that should absolutely be avoided. The proposed alternative is usually dozens of lines of manual string operations in a loop. It feels like it doesn't enter the minds of people in some companies I've worked in that regular expressions are a separate language, one that you should learn.
And layers of abstraction make it worse. Devs don't know SQL because of Hibernate; now new devs rely on the "auto-magic" of Spring Data JPA and don't know Hibernate...
We do this in work and it's not great in my opinion.
There can be issues with package versions compiled into the db. It makes continuous integration harder. We do automated builds and testing in the application layer, but we don't have anything in the db layer. It's harder to create a new environment (for a developer who does not have full db permissions). Business logic ends up in the stored procedure layer AND the application layer.
Including a real DB in continuous integration is a must for any development, IMO. (And these days it's usually very easy to throw up a database using Docker -- each of our full CI runs finishes in 2 minutes and does hundreds of "create database" calls, populating real SQL databases from scratch.)
For instance:
1) You want to implement idempotency in your backend by using uniqueness constraints in your DB
2) That's important to test
3) That's not really tested unless you include a real DB
After including a real DB in our CI setup I will never move back.
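For example, the uniqueness-based idempotency from point 1 above can be tested directly; a sketch where the payments table, the env var, and the choice of driver are all assumptions:

    package store_test

    import (
        "database/sql"
        "os"
        "testing"

        _ "github.com/lib/pq" // Postgres driver; any real driver works
    )

    // Verifies the UNIQUE constraint on payments.request_id actually enforces
    // idempotency -- exactly the behaviour a mock or a different engine might
    // not reproduce. Assumes CI exports TEST_DATABASE_URL for a throwaway DB.
    func TestIdempotentInsert(t *testing.T) {
        db, err := sql.Open("postgres", os.Getenv("TEST_DATABASE_URL"))
        if err != nil {
            t.Fatal(err)
        }
        defer db.Close()

        const q = `INSERT INTO payments (request_id, amount) VALUES ($1, $2)`
        if _, err := db.Exec(q, "req-123", 100); err != nil {
            t.Fatal(err)
        }
        // The retry must fail; the constraint itself is what's under test.
        if _, err := db.Exec(q, "req-123", 100); err == nil {
            t.Fatal("expected unique violation on duplicate request_id")
        }
    }

(If the backend instead swallows retries with ON CONFLICT DO NOTHING, the assertion flips, but the point stands: only a real DB exercises the constraint.)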
> for a developer who does not have full db permissions
Are you using a commercial database (SQL Server, Oracle, or similar)? With open source databases this is a non-issue, as developers can run a complete instance locally with full permissions, completely isolated from prod/staging servers.
> but by using stored procedures in your database.
This used to be the standard when clients used to connect directly to the database. Now that the world depends on web services, things are a little different and there's less incentive to maintain stored procs as an interface to your database.
It's still a good idea though, with unparalleled performance. I suppose developers don't do it for a variety of reasons: They may lack the SQL knowledge, they may not want to maintain the extra code, they may fantasize about database portability, etc.
Well, as someone with 25 years of SQL experience, I will generally still use LINQ for 90% of db access code in business-style applications, for very important reasons beyond those mentioned, like:
- They may realise value in having a compiler provide guarantees, as a bonus immediately in a tight feedback loop while editing code, as well as on compilation
- They may favour expressive languages with robust error handling, integrated IDE and VC, etc.
- They may want to reduce the number of moving parts
These are all along the themes of a) delivering results faster, and b) improving the maintenance lifecycle of systems - unlike those parts of the industry that don't really plan to maintain their systems and instead just wanna do rewrites in the latest hotness which is usually a hot mess from a long term maintainability perspective.
Generally, use of LINQ in the context of DB access is done hand in hand with an ORM, such as Entity Framework, NHibernate etc, and this is how I meant it.
I'm not a fan of stored procedures because: 1) it's more maintainable to keep all the logic unified in one language and system with which developers are more familiar; 2) the programming tools are normally much better for app code; 3) you can write unit tests for app code (does anybody write unit tests for SQL?); 4) you can usually avoid transferring too much unnecessary data or doing too many unnecessary network requests. Remember to avoid premature performance optimization.
On the other hand, one of the highly specialized stored procedures I wrote in PostgreSQL has been successfully and flawlessly re-used/tested by multiple off-the-shelf client apps (all BI-related, developed in different languages), simply because they all support SQL.
I agree this only applies to data-model-related functions. But it has proven really low-maintenance, and improvements to this "API" are automatically available in all clients.
I'd bet that accessing this stored procedure through an ORM's raw SQL query would provide more benefit than rewriting it from scratch in the ORM's language.
PS: Also, yeah, people do write unit tests for SQL (never enough, but that's another story).
https://pgtap.org/
1) One language and one system: SQL on a database that will last much more than the applications
2) SQL has a limited scope, and so do SQL tools. You need advanced tools if you have "app code": it's a necessary evil.
3) At worst, you can write unit tests for app code that calls SQL queries and stored procedures.
4) "Usually" isn't enough, moving logic to the client requires moving data to the client.
The argument against it I usually see is "you can't version control SQL!"
Where I work we have a custom system which manages releasing immutable SQL snapshots between environments and they get merged to master once in prod. The only thing needed is a process, and then version control is easy.
Editing stored procedures and deploying them to a database is the same as editing code and deploying it to a server. There is no difficulty in versioning it.
Yeah, you can just use a migration system and store your functions in text files checked into version control. Nothing but migrations is allowed in staging and production, so everyone is forced to use it.
As a front-end dev, I've been interested in learning about such a process but don't know where to look. What kind of tooling would you use to manage your migrations in a CI environment?
I'm not sure about other DBs, but SQL Server has (really good) tools for this (DACPACs). As an added bonus you can then use tSQLt to unit test the SQL.
On high volume applications I would avoid it since it's usually easier to horizontally scale web servers than sql servers, and it makes cache strategies more difficult depending on how and what you're querying.
It is a common misconception that more data processing in SQL puts a higher load on the database. A typical database spends 96% of its CPU time on logging, locking, latching and marshalling [1][2] rather than processing SQL. By sending less data to the middle tier and performing fewer round trips, the use of stored procedures means that the database can actually do more real work.
It depends. Regardless there is limited CPU, and so any scenario in which a stored procedure uses more CPU than a simple query will cause you to hit that saturation point sooner.
I generally have layers of caching on top of the sql server so the majority of queries will be integer equivalency or range checks, if not get by primary key queries. So I am not generally operating in a scenario where a stored procedure would reduce record scans, etc.
I also generally don't use transactions outside of limited scenarios, since throughput is usually more important to me than data consistency.
Imagine, for example, that you have per-record caches. You can run a query to get IDs without joining to the table, and then simply fill in any gaps in your cache with a follow-up query.
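That gap-filling pattern, sketched in Go (the cache shape and schema are invented):

    package catalog

    import (
        "context"
        "database/sql"
        "fmt"
        "strings"
    )

    type Product struct {
        ID    int64
        Name  string
        Price int64
    }

    // FetchProducts serves hits from the per-record cache and fills only the
    // gaps with a single follow-up query by primary key.
    func FetchProducts(ctx context.Context, db *sql.DB, cache map[int64]Product, ids []int64) ([]Product, error) {
        var missing []int64
        for _, id := range ids {
            if _, ok := cache[id]; !ok {
                missing = append(missing, id)
            }
        }

        if len(missing) > 0 {
            // Build $1,$2,... placeholders for the gap IDs only.
            ph := make([]string, len(missing))
            args := make([]any, len(missing))
            for i, id := range missing {
                ph[i] = fmt.Sprintf("$%d", i+1)
                args[i] = id
            }
            rows, err := db.QueryContext(ctx,
                `SELECT id, name, price FROM products WHERE id IN (`+strings.Join(ph, ",")+`)`,
                args...)
            if err != nil {
                return nil, err
            }
            defer rows.Close()
            for rows.Next() {
                var p Product
                if err := rows.Scan(&p.ID, &p.Name, &p.Price); err != nil {
                    return nil, err
                }
                cache[p.ID] = p
            }
            if err := rows.Err(); err != nil {
                return nil, err
            }
        }

        out := make([]Product, 0, len(ids))
        for _, id := range ids {
            if p, ok := cache[id]; ok {
                out = append(out, p)
            }
        }
        return out, nil
    }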
In most cases, it is sufficient if your application runs against the real DB in production and SQLite for local development. If the data is too big to run on the dev computer, then it should use one DB across the board.
Portability across databases rarely buys you anything but hard to reproduce bugs.
Note that there is some work for v2 to allow users to create JOIN queries, but without JOIN support it's effectively unusable for anything other than very small databases.
wait, what? doesn't support joins? Things like this make me doubt my fundamentals. 'Are joins really necessary?' is the question now running through my mind ;-)
It does not support joins. The expectation is that you "Preload" your data, which is a fancy way of saying load it all into memory. The performance of this is trash, so don't use that ORM; there are a few better ORMs for Go mentioned in the issue.
Yes, joins really are necessary in SQL; they are a foundational part of using any relational database.
Yes, they're literally the second thing one learns about SQL, really. I'm not being too serious about rethinking whether they're necessary, to be clear!
What a strange design decision though. I don't really understand why it does this. Why would anyone think it's a good idea?
Thankfully my team is making the switch from GORM to vanilla SQL. I joined the company a little over a year ago on another team that also uses it, and GORM turned out to be a real pain. It was the first time I had to use an ORM, and I really disliked the experience; it was a constant source of issues.
You're sadly fighting against the "but... but... it's how people DO it" mentality here.
Not to mention, as computing expands, the average capability of people is dropping. Seats need to be filled, and those seats don't all need to be high quality devs.
So ORMs are there to reduce the workload, in terms of what needs to be learned to be productive. Database tuning is becoming a lost art.
(For devs not getting this: on a largish DB, I can literally write a single query which can take 10 seconds in MySQL's console, or hours. Yes, hours.
This isn't a flaw, any more than doing a tree search wrong is a flaw in <insert language here> when it is slow.
And most ORMs don't come close to properly handling all edge cases here.)
Beyond all of that, there is another bonus to using stored procedures or functions: security.
There are many tables I want to give read only access to, yet want to allow writing in very limited scenarios. A stored procedure can have different user permissions (at least under most DBMS), which means you can:
- make a table read only for a web application user
- then, allow a stored procedure to perform INSERT/UPDATES, via a different user
- however, let the web user execute the stored procedure
The stored procedure thereby limits the INSERT/UPDATE to a very specific action.
Mostly, I've used this when dealing with tables/databases which have a financial component. Said stored procedures can even perform additional checks on other tables, whatever is required to ensure that the update == OK.
Beyond that, it also allows for tiered security, in case you have someone in house, maybe a new dev, who decides they'd like to modify that financial table a bit.
And beyond that, it also allows logging to be done at the DB level.
While you are more limited here, in that you really do not want to be burdening the DB with more writes, you can definitely log all modifications to said financial table.
(You could use triggers for this last bit too, of course, but still...)
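In Postgres terms, that tiered setup looks roughly like this (names invented; SQL Server has equivalents such as EXECUTE AS):

    package migrations

    // Hypothetical sketch of the read-only-table-with-one-write-path pattern.
    // SECURITY DEFINER makes the function run with its owner's privileges, so
    // web_user can modify ledger only through this single vetted path.
    const ledgerSecurity = `
    REVOKE INSERT, UPDATE, DELETE ON ledger FROM web_user;
    GRANT SELECT ON ledger TO web_user;

    CREATE OR REPLACE FUNCTION post_ledger_entry(_account bigint, _amount numeric)
    RETURNS void AS $$
    BEGIN
        IF _amount = 0 THEN
            RAISE EXCEPTION 'ledger.amount.zero';
        END IF;
        -- Additional checks against other tables would go here.
        INSERT INTO ledger (account_id, amount, posted_at)
        VALUES (_account, _amount, now());
    END;
    $$ LANGUAGE plpgsql SECURITY DEFINER;

    GRANT EXECUTE ON FUNCTION post_ledger_entry(bigint, numeric) TO web_user;`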
> Not to mention, as computing expands, the average capability of people is dropping. Seats need to be filled, and those seats don't all need to be high quality devs.
Also, a lot of businesses have analysts/data scientists who know SQL and have access to the DBs: stored procedures make business domain-level functionality more available to them, because they don’t have to guess what the application code is actually doing.
> ORMs are there to reduce the workload, in terms of what needs to be learned to be productive
The problem (which you seem to recognize but most seem to miss) is that the opposite happens: the ORM lets you shoot yourself in the foot quickly, but that's about it. In the end, you lose more time using an ORM than you actually gain.
> I'm a Go developer and I notice a lot of other Go developers instantly suggest things like GORM to noobs writing applications.
And there is nothing wrong with that. It's like recommending against the use of HTTP before people learn TCP/IP, or recommending against PHP/Apache just because people aren't really writing servers.
Learning how to write a web app and learning database management are 2 separate topics. Furthermore, GORM is rather barebones compared to Entity Framework or Hibernate, due to Go type limitations...
The thing is, RDBMS are set oriented. Manipulating sets of tuples is a very high level kind of thinking compared to manipulating individual objects and their relationships in a procedural manner.
Of course sometimes the task is to write load/edit/store logic for individual records and their relationships. There's nothing wrong with using an ORM for that.
But extending this sort of thinking to all data processing is extremely detrimental to both productivity and performance. If you're using an RDBMS you're well advised to apply a sets first approach to data handling.
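As a tiny illustration of sets-first versus row-at-a-time (schema invented): instead of loading every pending order and updating each row in an application loop, state the change once over the whole set and let the database apply it:

    package pricing

    import (
        "context"
        "database/sql"
    )

    // Row-at-a-time thinking would fetch all pending orders, loop, and issue
    // one UPDATE per row: N+1 statements with every row crossing the wire.
    // Sets-first: one statement over the whole set.
    func ApplyDiscount(ctx context.Context, db *sql.DB, pct int64) error {
        _, err := db.ExecContext(ctx, `
            UPDATE orders
            SET total = total * (100 - $1) / 100.0
            WHERE status = 'pending'`, pct)
        return err
    }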
That's the danger of ORMs. They encourage people to do the wrong thing.
They also add considerable complexity to a project. So after having written my own ORMs many years ago and using some of the popular ones I would stay away. The cases where they are the right tool for the job are just not worth the friction they create.
Not only teaching people to do the wrong thing, but also teaching that RDBMSs are scary, deep-level things that developers don't need to understand in order to do their job properly. Which isn't true.
There's a big difference between HTTP, Apache, and ORMs. People not understanding why they're different is what sinks companies all the time.
Using any abstraction is a gamble that it doesn't leak before you have the resources to fix it.
HTTP is a terrible comparison because in almost every domain you will either make millions or (far more likely) fail for other reasons before you need to understand how TCP/IP works.
I bet there's loads of companies that have died a slow death, releasing everything a month later than they want to, because the back-end developers aren't comfortable with whatever the stack of the day is and they were too busy learning yesterday's stack of the day instead of the intricacies of the language they're working with, or the standard library.
> It's like recommending against the use of HTTP before people learn TCI/IP...
This is a false dichotomy. SQL is not TCP/IP, because it's nearly on the same level as ORMs. Unless you normalize and understand your entities properly, it'll be difficult to write performant systems. ORMs are rightly called the Vietnam of computer science¹ for this reason.
This is the approach I like as well, with the caveat that the stored procedures themselves should be kept as dumb, and as late in the process, as possible.
I have made many mistakes in my career, in terms of systems that were either overly complicated or inefficient or both, and it has almost always boiled down to either trusting the SQL that an ORM produces, or putting too much of the business logic in hand-crafted stored procedures. I have done both, regretted both, and nowadays advocate for sticking to simple, predictable, and dumb stored procedures, so that you get the benefit of knowing/controlling exactly what will execute in the database while having things like validation occur code-side whenever possible.
In the real world, that simplification of stored procedures isn't always possible. But if it's your starting assumption, then you can fight for it until you are certain that there isn't a practical alternative to pushing the logic into the stored procedure. And if a compromise of these principles needs to happen, I would push complexity into a stored procedure before trusting the SQL produced by an ORM every time.
But where I fall on that spectrum may just be due to the severity and types of problems I have seen happen with each approach, so YMMV. And I wouldn't go so far as to call myself an expert, so YMMV on that front too. I am just a guy who has made a lot of mistakes. But if it'll help others, this is my takeaway.
Came here to say exactly this. Also a gopher, who is also vaguely horrified at all the "use GORM, and a framework" advice to newbie gophers.
I also use stored procedures, and treat them as function calls from my application code. So I effectively build a "data storage API" in the database, that I then call (as you say, like any other service) from my code.
I also have views for all the data sets that will be returned to the application, so I can use them easily as a type for function returns, and enforce consistency.
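A sketch of that arrangement, with an invented view and function:

    package store

    import (
        "context"
        "database/sql"
    )

    // Hypothetical: order_summary is a view maintained alongside the procs,
    // so its column list is the single definition of this result shape:
    //
    //   CREATE VIEW order_summary AS
    //     SELECT o.id, c.name AS customer, o.total
    //     FROM orders o JOIN customers c ON c.id = o.customer_id;
    //
    // and recent_orders() is declared as RETURNS SETOF order_summary.

    type OrderSummary struct {
        ID       int64
        Customer string
        Total    float64
    }

    func RecentOrders(ctx context.Context, db *sql.DB) ([]OrderSummary, error) {
        rows, err := db.QueryContext(ctx, `SELECT * FROM recent_orders()`)
        if err != nil {
            return nil, err
        }
        defer rows.Close()

        var out []OrderSummary
        for rows.Next() {
            var s OrderSummary
            if err := rows.Scan(&s.ID, &s.Customer, &s.Total); err != nil {
                return nil, err
            }
            out = append(out, s)
        }
        return out, rows.Err()
    }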
I'm trying Gorm right now, and I'm running into some serious issues where it's not as straightforward as an ORM should be (mostly having to do with transparently saving / updating relationships).
I'm reluctant to go to straight SQL because then I have to:
- Write a ton of 'mapping' code between resultsets and data model
- Do a lot of diffing to determine which relations to add, update or remove
- Manage my own database migrations - challenging because I'm stuck with using sqlite
- etc
I want my tools to make things easier for me, but gorm seems to be the "now you have two problems" direction - for now. Maybe I just don't get it yet. Maybe I just don't get databases anymore either.
Depending on which DB you're using, I'd suggest trying out https://sqlc.dev instead of GORM if you're on PG. MySQL support was a bit flaky last time I checked.
> You could even go as far as to unit test just your SQL code if you wanted, something you can't do too easily when it's in your application.
How do you unit test SQL code? You could integration test against something like sqlite but unit testing stored procs sounds very difficult.
I worked with a big C# .NET Core ecommerce platform that had a big stored proc used for ingest. It was an absolute pain in the ass to debug and fix issues, because testing it was so difficult. Eventually we replaced it with C# code. Access to full software dev tools (debugger, automated tests) is really something you don't want to be without on large projects...
Things like search interfaces might offer filters over multiple columns... your choices here:
1. Use an ORM
2. Write SQL for each permutation of filter
3. Write a helper that will compose the SQL based on the filters needed
And that 3rd one is the only time I stray from plain SQL towards something that looks like a lightweight abstraction over the SQL.
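That third option can stay tiny; a sketch, with the filter fields invented:

    package search

    import (
        "fmt"
        "strings"
    )

    type Filters struct {
        Status   string
        Assignee string
        Since    string
    }

    // BuildQuery composes WHERE clauses for whichever filters are set, keeping
    // placeholders numbered so the result is still plain, parameterised SQL.
    func BuildQuery(f Filters) (string, []any) {
        q := `SELECT id, title, status FROM tickets`
        var where []string
        var args []any

        add := func(clause, value string) {
            if value == "" {
                return
            }
            args = append(args, value)
            where = append(where, fmt.Sprintf(clause, len(args)))
        }
        add("status = $%d", f.Status)
        add("assignee = $%d", f.Assignee)
        add("created_at >= $%d", f.Since)

        if len(where) > 0 {
            q += " WHERE " + strings.Join(where, " AND ")
        }
        return q, args
    }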
Similarly... stored procs: I avoid these, but there are times when they are necessary. If you need the boundary of a transaction to span multiple SQL calls, then it is safer to do so within a stored proc. Yes, you can use a transaction and rollback, but from a maintenance perspective I've seen refactors break transactions that were not fully encapsulated in a single Go (insert other language) function.
Yes, keep things simple with SQL and do everything in the application... but also break those rules rather than doing something daft or with side effects.
Your view is the original design idea behind stored procedures.
Stored procedures as far as I know were first implemented in Sybase SQL Server. We tended to think of them as enabling RPC (remote procedure call) access to the DBMS. Sybase was also the first major relational database to offer client access through the network. Our connectivity was influenced by distributed system concepts like RPCs that were top-of-mind in the 1980s.
I've recently come up with the following approach: make it easy to generate CRUD applications without ORMs, and then modify them any way you like without extra dependencies.
I salute your level of abstraction, but I personally draw my line in the sand a bit higher. A library project for the data access layer, with a stable app-facing code API, gives me the best bang for my buck in terms of flexibility and abstraction.
That said, it's easy for me to push out new code versions. I realize it doesn't work for everyone.
>> I'm a Go developer and I notice a lot of other Go developers instantly suggest things like GORM to noobs writing applications
I've made the opposite observation. Compared to Python, C#, Java and NodeJS, ORMs are actually not a big thing in Go. Which is good - I agree with you.
Pageable also adds paging (limit, offset). If you are going to compare, please be courteous and do it right.
When the app starts up, Java compiles these declarations and will abort if a declared function does not match the entities. This is a huge safety net which you would not have when you write SQL. Also, when you write SQL you have to take care to unpack the "select *" into your fields, etc. I suspect this is a much more important downside than you might realize.
How do you get results of your query to SomeObject then?
I think it depends on the use case, as always. I use Hibernate for our platform, which is used by at most 10-20 people at once. There the ORM is a huge help, because I can just easily view and edit my model and all its relations; I accept the performance hit in this case, and if something is too slow, I rewrite it in SQL.
Then we have an API that uses the platform's data, and there I drop down to plain handcrafted SQL with a little help from JDBI, and so far it has been working great and responses are fast.
So I think it is just about stepping back a bit to think and not to blindly accept either side of the argument. Each use case is different.
That's not Hibernate, that's Spring Data or some such, which is indeed cursed. This bit of Spring was absolutely designed by and for people with no critical thinking ability.
Meanwhile, JPA, the standard which Hibernate implements, has a textual query language [1] which is basically SQL but a little more objecty. For example, you can express joins through properties concisely:
    SELECT c.capital.name FROM Country AS c WHERE c.name = :name
And use constructor expressions [2] to map results into Java objects:
    SELECT NEW example.CountryAndCapital(c.name, c.capital.name)
    FROM Country AS c
And nothing is going to stop you from doing this: you can pack your query (native SQL, JPQL, or HQL) into a @Query annotation and be done. The good thing is that it will do the mapping for you.
The advantage of using findEntitiesByIdAndCreatedAtAfterOrderByCreatedAtDesc is that you get paging support for free (which is not trivial and has to be done properly for each database engine; the simple approach using limit is typically very inefficient).
Also, most of the queries will be something simple, fetching some entity by a business identifier or primary key; not being forced to write down all those simple queries is a productivity boost.
For more complicated queries you can fallback to writing query manually.
There is also maturing Spring Data JDBC project that kind of has "best of both worlds".
Debatable. Method syntax vs query syntax is available. I mostly see method syntax in my codebase. I also see terribly performing linq queries all over the place.
I think it's neither good nor bad, it's just a tool. Tools are useful for making things, but if wielded improperly they will hurt you.
I'm sure the same arguments can be made against raw SQL. It's possible to have good codebases in any paradigm, and bad codebases in any paradigm. It's really all down to the quality of the implementation.
As a side note, if you want a creative way to back up a database, just try logging an EntityFramework-tracked object in Serilog with the destructuring operator. Watch as the logging framework inadvertently traverses the entire object graph of navigation properties, endlessly making infinite SQL queries, while your web server slowly runs out of RAM. Fun times!
You can provide the query; Hibernate/JPA does not stop you from doing that. Autogenerated queries for simple CRUD ops, with queries supplied by the developer for more complex logic, is the ideal.
Everyone here is arguing for their favorite side. As often is the case, the best solution is often somewhere in the middle, using the best tools for the job.
I authored a complete rewrite of an ancient and rotting PHP+MySQL web ticket reservation app in Python+SQLAlchemy+PostgreSQL. I use an ORM - except where it doesn't make sense because a SQL query expresses what I need to do more concisely and effectively. I don't use stored procedures - except where I do because I need one specific atomic DB operation to be performant and not bottlenecked on the app. I use relational storage - except almost every table in the database has a big JSON column for everything I don't need to ever join, filter on, or index in production codepaths (though I can still do that with PG's native json support, which is great for the rare case I have to move something to a real column). And I use triggers, stored procedures, and notifies to implement live change notifications for a table, that eventually get fed via WebSockets server to users.
This hybrid approach has served me extremely well, resulting in very readable and maintainable code, minimal DB schema migration pain (most upgrades only touch JSON fields and thus require no migration), and much better performance than the old app, especially in that hot path using the SP, while keeping table column bloat down, and avoiding the join spam that results from keeping everything religiously normalized even in cases where that doesn't buy you anything.
Of course, that does mean you need to know all the relevant technologies involved, SQL, ORMs, etc.
YMMV, but consider that if you think a single solution is the right solution in all cases, you're most likely wrong.
> YMMV, but consider that if you think a single solution is the right solution in all cases, you're most likely wrong.
This is sort of the meta-principle that I apply to all of software development: no principle or methodology (DRY, YAGNI, SOLID, TDD, etc...) applies 100% of the time (except for this principle. ;))
I suggest adding `details_schema JSONB` and validating `details` against it. Now you can still add fields to `details` without deploying code changes, but every dev in the company won't throw random garbage in there – or expose that field as raw JSON in an admin panel somewhere.
The whole point of having a JSON type is to do away with needing a fixed schema, no? If you need to constrain the format of the JSON object, why not just represent it with more DB-friendly types in the first place?
In an enterprise production environment, it's much easier to update a JSON schema, which is stored as data, than to alter the DB schema. The point of having a JSON field isn't to let everyone with write access to the API stick whatever they want in there; it's to let the administrators of the service change what it can store without engineers deploying code changes and DB migrations.
The thing I struggle with is knowing whether the solution I designed was biased towards SQL (because that's what I learned years ago) and therefore would be better suited for SQL. Or maybe if I learned more about ORMs I could craft a solution that isn't SQL-biased. Or maybe SQL "just works" for most of what we need to do with applications right now, and that's where the SQL bias comes in?
I do not agree, and this is coming from someone who loves SQL. I have only experienced the problems this article describes with people who do not know how the ORM works.
You can use an ORM and it works flawlessly for most cases. There might be some cases where I need to write SQL for a custom function or some weird edge case, but that is why the ORM gives you the possibility to write your own SQL if you want.
I forgot to mention that having to write raw SQL queries and maintain them, doing migrations and keeping up with the changes, is kind of a pain when most of the time the ORM takes care of everything.
ORMs are good for quickly building something. But if you're building something that is expected to last and that will be passed on to other developers... ORMs will likely derail or put severe limitations on your application.
Disagree completely. It depends, very much, on context.
There are contexts where good knowledge of the ORM, and of SQL, is part of the bar for working in the environment. This often happens in stacks where there aren't a zillion different choices but rather curated default choices for things like ORMs.
Combined with a large and sophisticated system (or many similar small ones), being able to use a good ORM (and LINQ) provides significant savings in time and complexity.
This helps the systems to last, and be passed on to other developers.
Counterpoint: a lot of Ruby devs are very familiar and capable with ActiveRecord. A lot of Python devs are familiar and capable with the Django ORM. Why are these ORMs more likely to limit your application? You provide no proof or explanation for your statement.
I second this. For example, Django's ORM is actually very advanced these days, allowing you to squeeze out the performance in many, many cases. While I agree that ORMs should not always be used (sometimes you just need raw SQL, especially on very large databases), I think the ORM hate is a bit overstated on HN.
The Django ORM is pretty good, and I like it. But, to squeeze out performance, wouldn't you say you have to actually understand what is going on in the database?
Agree! I like Django's ORM and I love SQL databases. It is nice that your in-code model changes are reflected as schema changes in the database. And migrations are a blessing. (Of course it has some foot guns.)
For optimizations:
my_queryset.explain(verbose=True, analyze=True) is very nice for understanding what is going on. If you cannot get it done with Django's database abstractions, there is always:
my_queryset.extra() or even my_model.objects.raw(), where the last one maps the result to the model for free.
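To make those escape hatches concrete, a quick sketch (Book is a hypothetical model here; explain() with these options needs a backend that supports them, like Postgres):

    # Inspect the plan the database will use for the generated SQL.
    qs = Book.objects.filter(published=True)
    print(qs.explain(verbose=True, analyze=True))

    # Raw SQL that still maps each row onto a Book instance for free.
    books = Book.objects.raw("SELECT * FROM myapp_book WHERE published = true")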
To others responding that ORMs are bad:
IMHO saying that raw SQL is better than ORMs is short-sighted. Many simple things are nice in ORMs. Queries are often simple. Today, in Django, complex queries are also nice. For edge cases raw SQL can be better.
Extending queries is also super easy with a good ORM, whereas with raw SQL you would have to maintain two almost-identical queries.
Just remember to run and inspect queryset.explain().
Fully agree with this. These discussions are always this or the other, but you can easily have both in the same project. Django's ORM and raw SQL functions prove that.
And more importantly, you'd want to design your database to be optimal for complex SQL queries, not whatever generic structure matches your programming objects.
The only limiting thing I've experienced with ActiveRecord is when the database becomes larger and more complex: the performance takes a bit of a hit. Which isn't ActiveRecord's fault; it's the developer's lack of understanding of how the database works.
Most of the issues I see come from a lack of understanding of things like N+1 queries. I see these mistakes from junior to senior level. ActiveRecord has built-in solutions to fix them too; some developers just don't know about them.
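For readers from other stacks, here is the Django spelling of those built-in fixes (ActiveRecord's #includes plays the same role; Book/Author are hypothetical models with related_name="books"):

    # N+1: one query for the books, then one more per book.author access.
    for book in Book.objects.all():
        print(book.author.name)

    # Fix for to-one relations: a single JOINed query.
    for book in Book.objects.select_related("author"):
        print(book.author.name)

    # Fix for to-many relations: one extra batched query, cached on the objects.
    for author in Author.objects.prefetch_related("books"):
        print(author.name, len(author.books.all()))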
There are built-in solutions but also quite severe limitations when you get to advanced usage of activerecord. Especially with regards to composability. Of course, raw SQL is also really bad in that scenario.
Working on a Ruby application that uses a database that predates the application (it was originally written in another language), there are simply some things that can't be written in AR, or when they are, the performance just isn't there.
Additionally, we often have to create ad-hoc queries to answer questions from stakeholders. It's easy enough to just pop up DataGrip, write the query, and copy the tabular results into a doc in Google Sheets for sharing.
I think the best approach is ORM-first, not ORM-only.
An alternative viewpoint on that situation might be that the database is in need of some serious refactoring/migrating if the app can't deal with its structure. Trying to bend Rails into non-AR database usage is going to lead to tears sooner rather than later IMO. It's just so very integrated.
Having separate applications with (write) access to the same database is similarly going to end in trouble. Suddenly there are all sorts of problems when the database needs a migration and the applications are expecting different things from the same table. This is not really ORM-related btw.
That said, ActiveRecord is not perfect and sometimes a single handcoded SQL query can deliver a ton of value. Profile carefully.
> Trying to bend Rails into non-AR database usage is going to lead to tears sooner rather than later IMO. It's just so very integrated.
Is that really true? While I've barely used Rails in anger, and not at all recently, I was under the impression that all the major Ruby ORMs and high-level database libraries (e.g., DataMapper, Sequel) are heavily focussed on Rails usage, usually growing out of AR pain points like working with DBs in Rails that don't fit AR's opinionated patterns.
> Having separate applications with (write) access to the same database is similarly going to end in trouble.
It shouldn't, if the DB is designed for that, which usually means each application sees the DB exclusively through its own set of views (probably in a distinct schema) and keeps its filthy little hands off base tables. That's a best practice for multiple-application access to an RDBMS that is older than the Web.
I won't argue that an ideal situation is to stop all feature development and refactor into an ideal data model. That's a technical debt ROI discussion, the kind that we always have, and sometimes the business value just doesn't support the rewrite.
However, even in an ideal data model, I don't always agree that multiple applications accessing the application's database is bad. That's a matter of being thoughtful and communicative, instead of just wantonly running "rails g migration" every time you need to implement a feature (that's true of Monorails even). FWIW, we've never had any real issues with multiple applications accessing our single database.
The ORM actually might reduce the risk of derailing over time, as it will probably handle any database layer upgrade and will perform checks to prevent common issues, while your own SQL might not (think of utf8/utf8mb4 on MySQL 8, escaping newly reserved words, escaping special characters, ...).
Our views on this tend to be influenced by the sorts of data applications we've written in the past. I can imagine that people who've done a lot of corporate/enterprise development with large teams or straightforward requirements would see immense value in an ORM. I'm sure that doesn't even begin to describe the breadth of projects where an ORM is a good idea. I've never built applications in this space so I can't speak to it.
With most of the stuff I do, an ORM is a non-starter for my databases. The data would make no sense as objects and I tend to avoid using data objects in programming anyway. It's probably impractical or impossible to build what I build using one and even if you could it would be more difficult to write and performance would certainly be atrocious.
There are many reasons why ORMs work better than they used to. For one, people prefer simpler database schemas these days - and what used to be larger monolithic apps are often broken into microservices. Earlier, you'd have one big database - today you'll have many. And ORMs are great with simple queries and joins.
The language plays a big role in whether an ORM is actually useful or not. In the .Net world, ORMs work quite well because querying capabilities are integrated into the language. Queries and schema are verified at compile-time, and I wouldn't trade it for slightly better performance or control. On the other hand, with Python (or say Node), or with Java and Hibernate the wins are smaller.
Of course, there will be some queries which are just better written in plain SQL. If you're willing to accept that, ORMs are a good tool in your toolbox.
You can get some benefits you mention by using query builders (like jOOQ). They can get unwieldy at times (not part of the language), but you can drop directly to SQL if you need to, and they tend to be simpler than ORMs in various use cases.
jOOQ needs to be mentioned in this context. With jOOQ you basically write Java-SQL-DSL, and with some code generation from your schema you have really neat type safety (and code completion).
String SQL queries are not the answer to ORMs, imho, we need a level in between. jOOQ provides a really cool set of features in that space.
Counterpoint: By migrating to microservices, you're moving the problem, not solving it. Sure, your individual services and tables are simpler, but when you need to combine the data, update it, etc you run into big issues that aren't trivial to solve.
EF (Entity Framework) exposes an IQueryable which you can query with LINQ, meaning you can use your ORM's entity model and query your database in a type-safe, compile-time-checked manner.
The last bit of the article reminds me of an experience I had recently working w/ another group of developers. They write in C# w/ Entity Framework, but they just use it to map stored procedures to C# data-types. All queries to their database go through a stored procedure w/o exception.
They push as much of the business logic as possible to the database. Their reasoning being that if the client insists on a separation between the DB & application servers: you should do as much computation as close to the data as possible. Then just send the end result over the wire.
Due to my own ORM-induced brain damage I found it hard to wrap my head around this at first: a data type in the application no longer represented a table, but the result of a query. Once you realize the database is just another API though it clicks really nicely into your architecture.
I think I still prefer things like Linq, jOOq, Arel, Ecto, etc. where you can write the query in your programming language and have it translated to SQL. It's just nice to see your query right next to the code when debugging. The author is absolutely right though you still have to know SQL to use tools like that effectively, so you might as well just learn it early instead of wasting effort learning the quirks of a specific ORM.
Most programmers have problems with SQL because it is a declarative language and doesn't fit into the C or Java mindset very well. However, if you just spend some time getting used to declarative languages, you will find SQL is quite nice: you just ask for the stuff you are looking for and the computer gives it to you. And SQL servers like Postgres have gotten so optimised at returning search results that they really outperform MapReduce.
Yup that has been my experience as well - if you read someone's stored procedure it's pretty easy to tell if they're a "relational thinker" or an "imperative thinker." People who embrace an RDBMS will tend to use constructs like CTEs, windows/partitions, set operations, they'll do their conditionals in the projection (i.e w/ CASE statements), they might use a CROSS APPLY to do a transform over a "collection", etc.
Whereas people who are "imperative thinkers," who just treat the database like a giant Excel spreadsheet, will use flow control for branches, cursors for iteration, tons of temporary variables, etc. Without fail the imperative thinkers wonder why they need monster CPUs for their DB server and why it's always pegging one core. Well of course the query planner can't optimize your code: you told it how to get the data, rather than asking for what data you wanted.
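The same divide is visible even from application code. A toy sketch of the two mindsets, with sqlite3 standing in for any RDBMS and an invented orders table:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, discount REAL)")
    conn.executemany("INSERT INTO orders (total, discount) VALUES (?, ?)",
                     [(30.0, 0.0), (80.0, 0.0), (120.0, 0.0)])

    # Imperative thinking: pull the rows out, then issue one UPDATE per row.
    ids = [row[0] for row in conn.execute("SELECT id FROM orders WHERE total > 50")]
    for order_id in ids:
        conn.execute("UPDATE orders SET discount = 0.1 WHERE id = ?", (order_id,))

    # Relational thinking: one set-based statement the planner can optimise.
    conn.execute("UPDATE orders SET discount = 0.1 WHERE total > 50")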
This was a fairly common approach about 10-15 years back.
It has its charms, but it mostly fell out of favor because of versioning. It's very hard to manage multiple versions and migrations for the database, and it had a negative effect on development velocity.
IMO people discussing this topic really need to clarify what type of applications they are working on because the cost/benefit analysis changes quite significantly based on use case. If you're building a web-based reporting tool where you simply query records out of a database to dump to HTML and your objects are short lived, you might not get as much value out of an ORM compared to a client-side app where objects stick around for the entire lifetime of the application and you have to worry about things like object identity and staleness.
I'd also add that if it's a community-based open source project, there can be great benefit in more safely supporting a bigger set of database engines (e.g. sqlite for the simple entry-level, postgres et al for the more serious) at the cost of some performance overhead.
If it's a performance-critical in-house service where you have full control of the deployment and can tailor it to your use-case, it's a completely different story.
We have to support both MySQL and MSSQL so Hibernate has been mostly great for that. For really simple queries it's also nice to be able to write "findAllBySomethings_IdAndPropertyIsNull" etc instead of writing a SQL query. Been there, done that.
Though I have to admit I do check what queries Hibernate produces to make sure it's not doing some funky stuff that's not really needed (as a result of a mistake of my own, mostly)
Edit: It has come to my attention that it is of course Spring Data that offers those named queries. I still see nothing wrong with Hibernate, since I only have to write the query once and I'll be relatively safe supporting both DB systems.
I personally have found that how to approach data access has been a real bone of contention on projects, and quite damaging. It seems half the team want to use an ORM and have a long list of reasons not to use SQL/stored procs (slower to develop, outdated, not testable, business logic in the wrong place). The other half of the team wants to avoid ORMs and would rather use SQL/stored procs (performance, ORMs start off okay but soon aren't up to the job, more control and power with direct SQL). In fact, you can see these two standpoints in this very discussion.
I find both have valid points and there isn't really a compromise. Whichever approach is taken you end up with half the team feeling not listened to and disenfranchised.
I have found few things to be more divisive than the ORM vs No ORM debate and I am not sure what the answer is.
I have had that kind of divisive discussion, but we did compromise. What we ended up doing was using an ORM for CRUD operations, which were mostly user-facing ('row' operations), and SQL for reports / data monitoring ('column', bulky operations). And we did our utmost to avoid stored procs (limiting them to executing REFRESH MATERIALIZED VIEW and COPY operations to the best of our ability) and trigger processing.
We manage the SQL code (view definitions, mviews, and the remaining stored procs) right next to the application code with Flyway [1]. We also have SQL-ORM integration tests, spinning up a database container with testcontainers [2].
We've had some issues with business rule duplication (updating the ORM and forgetting to do the same in SQL), but so far I'd say it's successful.
That being said, I'm on the SQL boat, and I remain convinced that the CRUD could be done comfortably with a SQL query builder such as jOOQ [3], and that it would help solve the business rule duplication issue. But hey, it's working right now and everybody is happy about it, so why change it?
Agreed. As ever, it depends a lot on the type of work and how big the system is. We use a lot of SQL (not sprocs), but are moving back to LINQ for reasons of composition and maintainability. I love writing SQL, but in our system, which is north of 300 tables on a project over 10 years old now, the lack of composability and general lack of static dependencies in SQL is hurting us more and more.
But if your answer to all those questions is yes, why are you using an RDBMS to begin with? Wouldn't some other kind of data store map better to those use cases, and then you wouldn't need a big abstraction to make it look like something it isn't?
ORMs completely fail for many complex domains. If your data represents an arbitrary tree hierarchy, a graph, or the like, relational databases offer nice ways to model it, but those completely misalign with how ORMs handle data.
You don't even need to go that complex. Even moderately complex JOINs start to look better in SQL than in ORMs.
If you have an employees database, an inventory database, and a clients database, an ORM is perfect. If you have a fixed hierarchy, ORM works great too. It maps objects to the database. That's 90% of web apps.
If you're building e.g. an online CAD system with a complex data model for storing hierarchical layers of objects with complex relations, and where you need to perform complex operations on that data for e.g. simulations and optimizations, SQL will actually handle that just fine. You'll just be doing SQL beyond what fits into an ORM. At that point, if you use an ORM, you'll be doing complex contortions. Think back to your data structures class. Then to the grad-level data structures class. Most of those structures, SQL will handle fine, but ORMs won't.
SQL has also turned out to be surprisingly resilient to different programming paradigms. SQL came out just around a half-century ago, before OO was common, and did fine with OO, functional, structured, and a whole range of other paradigms which have moved into and out of vogue over that time. ORMs, as the name implies, are specific to OO.
A few posts up, the poster is correct that there are design patterns around ORMs which work well, and design patterns around SQL which work well. You want to pick one and stick to it. The mess comes in when you mix layers of abstractions. One or two SQL procedures might not kill you, but when you have big chunks of code using ORM and big chunks NOT using ORM, you'll crash-and-burn.
Footnote: "Complex domains" is also about the database layer, and the type of complexity. Right now, I'm working on a very algorithmically complex system, but that complexity isn't in the data layer. Most of the data engineering is about moving GB of data around in realtime. All the database needs to handle are simple things like auth/auth. That's an ideal use-case for an ORM. I've also built systems with ORMs where the database layer was complex, but complex in ways which aligned well with ORMs. So this shouldn't be read as "dumb programmers who can't handle functional use ORMS."
Footnote to posterity: Should someone stumble upon this in a web search: If you grew up in OO and Java, and never ventured beyond, none of the above will make sense to you. You only learned the programming paradigm ORMs were designed for.
The intent of my third point is that you should expect a decent proportion of the operations to be handled well by the ORM such that you can reap efficiency benefits from the abstraction. If you're spending a significant amount of time jumping out of the ORM's sandpit and into custom queries, the benefit of the ORM may be limited. And to the extent that it dictates/influences your table structures, it might be a hindrance.
What does "business logic" mean to you? According to the Wikipedia definition [1], I find it very difficult to imagine SQL that does not contain business logic.
[1] "business logic or domain logic is the part of the program that encodes the real-world business rules that determine how data can be created, displayed, stored, and changed.", https://en.wikipedia.org/wiki/Business_logic
Business logic is calculating tax, working out a price, adding something to a bag, populating an order object with information according to some rules.
Business logic isn't pulling data from the database, how to map an object to a table, or the specific SQL query to grab some data. Those are technical details. SQL should just be pulling, saving and querying from the store without much manipulation, not changing the data.
It's the rules that a business stakeholder might be interested in creating.
When you start putting the tax calculation rules into the stored proc, that's where the problems begin.
If you create a CTE to pull the information required to perform the calculation, then return that info without doing the calculation, that's fine.
Let's assume there is a business requirement that an order belongs to a customer. By definition, this is business logic. If you implement this requirement by creating a foreign key constraint, then you have added business logic to the database. If there is a business requirement to calculate the sum of all wages in a department, and you do this via an aggregation in SQL, then you have processed business logic in the database.
If you are not allowed to do this kind of things, then what is left? Do you interact with your database as if it were just a bunch of VSAM files?
Tbh a lot of NoSQL databases don't use foreign key constraints. If you have a SQL system with messaging and eventual consistency you have to turn them off anyway.
So in a lot of ways, yes people use databases like they're just persistent memory with indexes.
Stored procedures are a really bad hack—in any practical sense they function as a mediocre API server which comes bundled "for free" with your SQL database server. If you need an API layer at all, it's much better to write this in a real programming language.
Just about anything you can do with stored procedures, you can do within an SQL query block in your favourite language. And when all your queries are fully parameterised, your database server can even use the same execution plan cache as "real" stored procedures.
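As a sketch of what that looks like in practice (psycopg2-flavoured; the money-transfer logic and accounts table are invented for illustration):

    def transfer(conn, src_id, dst_id, amount):
        # `with conn:` wraps the block in a single transaction, so the pair
        # of statements is just as atomic as a stored procedure body would be.
        with conn:
            with conn.cursor() as cur:
                cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                            (amount, src_id))
                cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                            (amount, dst_id))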
This is an extremely niche view; ORMs save you a huge amount of time, and the days before ORMs were a real pain in the arse.
As a noob you are much more likely to introduce a massive security hole by rolling your own solution. Don't do it!
So this article is sorta true, and you should learn SQL, but for trivial queries ORMs are actually a massive time saver and really useful in maintaining a system, especially in strongly typed languages (Java/C#/etc.).
I'm currently in the process of porting API code from .NET 3.2 SOAP, where the code is a mixture of string-based SQL queries and stored procedures, to a .NET Core 3.1 Web API (MVC + EF).
Let me tell you... readability and type safety are boons, not a curse.
Things still get convoluted, but boy, the errors that can sneak into a complex SQL statement are painful to debug.
As always the pendulum in this "article/rant" is swinging in the other direction ("everything was better in the past"). Also it's fricking (almost) 7 years old. That's a lifetime in software development.
> [7 years] is a lifetime in software development.
If we were talking about something like Kubernetes best practices, I'd definitely agree with this point. In this case, though, I don't see what substantially changed about ORMs or SQL in the last 7 years.
The two are not mutually exclusive. Dapper provides type safety cleanly and IMO is more readable than EF. EF is heavy, requires training to understand, and is a giant pain if you need to implement performant SQL.
Not all ORMs are like hibernate and gorm.
I'm author of a lightweight ORM for Go and Scylla / Cassandra [1] so I may be biased.
In my view use of a good ORM makes your code more resilient to change, refactorings are simple and stupid mistakes largely eliminated.
A benchmark of an ORM's usefulness can be how many lines need to be changed to add a new field to a struct / class.
In a good ORM this should be about 1, and it should still be ~1 if you use hand-crafted SQL / CQL.
The worst thing ORMs try to do is handle relations (eager / lazy loading) and sessions / object lifetime. This is mainly because it's impossible to have a one-size-fits-all solution for all use cases, even within one application.
I use a lot of raw SQL and I also use ORMs. I don't see ORMs as being the issue the author thinks they are, but rather that developers suck at designing the data layer. A half-decent design will do as you say: it will allow the use of an ORM or hand-written SQL. ORMs make magic easier, and pulling that magic apart is a real pain, but so is working with triggers and stored procs. The problem exists on both sides. The solution isn't to get rid of ORMs, but to train developers in better system design.
When working directly with SQL instead of an ORM, how do you elegantly handle things like parameterizing table names or having flags that turns on or off filters?
In non-ORM codebases I am seeing patterns where SQL queries are "copy-pasted" together, and I am not sure I like it.
Just imagine the following example had hundreds of lines of SQL and several optional filters, some of which could themselves be several lines long:
    def get_data(table1, table2, use_subset=True):
        if use_subset:
            filter_query = "AND column2 = 'subset_value'"
        else:
            filter_query = ""
        query = f"""
            SELECT
                *
            FROM
                {table1}
            JOIN
                {table2} USING (id)
            WHERE
                column1 = 'value'
                {filter_query}
        """
        return query
You could use a query builder for this, rather than a full ORM. Programmatically generating SQL is a subset of what an ORM does, and it's one of the less contentious parts.
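For example, the dynamic-filter problem above could look like this with SQLAlchemy Core, which is a query builder rather than a full ORM (the table definition is invented for illustration):

    from sqlalchemy import Column, Integer, MetaData, String, Table, select

    metadata = MetaData()
    events = Table("events", metadata,
                   Column("id", Integer, primary_key=True),
                   Column("kind", String),
                   Column("value", Integer))

    def get_data(kind=None, min_value=None):
        query = select(events)
        # Optional filters compose as expressions instead of pasted strings.
        if kind is not None:
            query = query.where(events.c.kind == kind)
        if min_value is not None:
            query = query.where(events.c.value >= min_value)
        return query

Table names still can't be bound parameters, but the builder at least keeps the composition out of string land.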
EDIT: Removed the reference to SimpleJdbcTemplate.
A good alternative (if you are using Java) is to use something like Spring's JDBCTemplate. This provides the following benefits over using Hibernate/JPA:
1. You craft the queries yourself and so there is no intermediate ORM language which you need to use.
2. It parameterises the queries so that you don't have to append strings to get to the final query.
@Override
public void insertForum(Forum forum) {
    String query = "INSERT INTO FORUMS (FORUM_ID, FORUM_NAME, FORUM_DESC) VALUES (?,?,?)";
    jdbcTemplate.update(query, forum.getForumId(), forum.getForumName(), forum.getForumDesc());
}

@Override
public Forum selectForum(int forumId) {
    String query = "SELECT * FROM FORUMS WHERE FORUM_ID=?";
    return jdbcTemplate.queryForObject(query, new ParameterizedRowMapper<Forum>() {
        @Override
        public Forum mapRow(ResultSet resultSet, int rowNum) throws SQLException {
            return new Forum(resultSet.getInt("FORUM_ID"), resultSet.getString("FORUM_NAME"),
                    resultSet.getString("FORUM_DESC"));
        }
    }, forumId);
}
3. It fetches only the columns that you need, across multiple tables if need be, without fetching the "relationships" between tables unnecessarily.
The example given below shows how to use RowMappers and to fetch only the columns you need:
String query = "SELECT FORUM_ID, FORUM_NAME, FORUM_DESC FROM FORUMS WHERE FORUM_ID=?";
return (Forum) jdbcTemplate.queryForObject(query, new Object[] { Integer.valueOf(forumId) },
new RowMapper() {
public Object mapRow(ResultSet resultSet, int rowNum) throws SQLException {
return new Forum(resultSet.getInt("FORUM_ID"), resultSet.getString("FORUM_NAME"),
resultSet.getString("FORUM_DESC"));
}
});
This is basically how I use ORMs. Only as a query builder. Writing plain SQLs in code can be very messy and hard to debug, and this works similar to an API the author mentioned in his article. IMO ORMs can spare some duplication, but relying too much on that is quite painful.
The point of ORMs is not avoiding SQL, it is avoiding concatenating strings in order to build complex queries. In our app we have very complex query logics depending on the request params, and I can hardly imagine how dirty the code would be without the relational algebra abstraction.
> The point of ORMs is not avoiding SQL, it is avoiding concatenating strings in order to build complex queries. In our app we have very complex query logics depending on the request params, and I can hardly imagine how dirty the code would be without the relational algebra abstraction.
You're talking about query builders. ORMs often come with a query builder, but the goal is really about mapping object relations (1:n, 1:1, n:n) to SQL queries in an automatic fashion, nothing more. Today, most of the time it can easily be done with JSON queries on most RDBMSes. But advanced ORMs also come with a bunch of useful stuff like unit of work, caching and co...
I like ORMs for routine queries, there's a lot of stuff that is just much nicer and less tedious to write that way. It's also generally much less annoying to have some dynamic aspects in your queries with an ORM, e.g. adding different WHERE clauses depending on some parameters.
But even with relatively small and simple things you can run into issues very quickly if you don't know what kind of queries the ORM will create. ORMs are very leaky abstractions, they're useful but unless you understand SQL and understand their quirks you're likely to create some weird and monstrous queries at times.
You should know how your ORM handles related entities if you query them, there are some big footguns there with some strategies. And of course you should know how to use the ORM so that it doesn't do a "SELECT *" everywhere (which can be trickier than I'd like in some cases).
I would also not hesitate to drop to plain SQL for some cases, if your query doesn't fit neatly into the capabilities of your ORM.
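In Django, for instance, trimming the default select-every-mapped-column behaviour looks like this (the Book model is hypothetical):

    titles = Book.objects.values_list("title", flat=True)  # SELECT title FROM ...
    slim = Book.objects.only("id", "title")  # other columns deferred until touched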
I think this is where a lot of the anti-ORM crowd trip up. They start with the idea that an ORM completely abstracts away a database, show that isn't true, and then condemn ORMs. But ORMs don't do that, aren't advertised as doing that, and aren't useless because they don't do that.
Yes and no. I find that using leaky abstractions is often harmful as a general principle.
So it doesn't matter that ORMs admit that they are a leaky abstraction. I still pretty much hate them for it.
It just feels pointless when I know I'm going to have to understand and use the underlying tech anyway. For ORMs specifically, I much rather use a query builder that's basically just a type-checked SQL DSL. Some of those aren't complete either, though, which also enrages me.
I would like developers to learn SQL first, before picking up an ORM, to actually understand what the ORM is doing for them. Sure, they'll do their share of sub-optimal subqueries and head-bumping trying to write GROUP BY queries. But in the end I think this would make them better developers who know the underlying software and application logic, not just the API of the ORM.
Yet we don't really live in that kind of ideal world and there are many factors that people have to consider. If everybody else in the company is using ORM, why shouldn't you. Or, if you just need to ship products and don't necessarily enjoy learning SQL. Or, if you don't know how to properly abstract the structured SQL queries without making it a huge mess. Saying that ORM is better/worse is like saying everybody likes music.
If I'm building a Nodejs app I would choose not to use ORM. For other languages and frameworks I'd probably reconsider and evaluate my options.
To me however, the biggest benefit of using no ORM is the learning of SQL. Time and time again that proves itself so invaluable, and I'm immensely happy I've learnt the quirks of SQL instead of some language specific API. If I can code faster with ORM that's great, but in ideal world I would much rather learn SQL and become a master of it rather than of some ORM.
Everything else that does more than mapping SQL results 1-1 to a query result and maybeeeee helps you with atomic actions like INSERT, UPDATE and DELETE ends up being a chore. They all have their own quirks like around threading and usually end up forcing you to design your application around it unless you take some measures to isolate it from your main application, which means you're now writing code while you wanted to type less code.
To add insult to injury, a ton of them seem really resistant to mapping the results of hand-written queries to an object. Also, SQL dependency trees and object dependency trees never really seem to map that well. And don't try to do more complex queries with the built-in query language in Django or .NET. I have sometimes spent hours trying to get an SQL query working in an optimal way that would not have cost me more than 5 minutes to write and map with simpler ORMs.
The other day, in a discussion about why I always use SQLite directly with a simple wrapper, a dev who was otherwise a very strong iOS developer said: "I really should learn SQL sometime".
Apart from HTTP, there probably hasn't been a skill more universally useful in my career as a developer than SQL. But it also made me think about why I never really learned to write C (I can read and futz with it).
For anybody new in the trade reading this, the following things have been or could have been useful at any step in my career:
* HTTP
* HTML
* SQL
* JavaScript (blegh but true)
* C
* Unix
The rest were just passing by for a while, like Windows, Flash, PHP or any god damn JS framework out there.
My project has no ORM, and now that other developers are coming on it is hard to work with - the readability of an ORM far surpasses writing your own queries and I can see that the benefits far outweigh the cons.
And who's to say you can't write your own lightweight ORM to abstract the database if you can't find a tool that suits your goals? As always, it all depends on your use case.
Even with documentation, it is a lot to consume. Someone on the project for a longer time can definitely get more done, but bringing on more people will lead to wildly different quality control. I find it to be a scaling problem if anything.
Perfect example of the trough of disillusionment. The ORM slope of enlightenment comes when you realise the power of the unit of work, not just a single query. But that was then, and microservices mean we no longer use large units of work. So ORMs without complex UoW make less sense.
This is I think primarily related to transaction boundaries and tracking changes. You can read https://www.martinfowler.com/eaaCatalog/unitOfWork.html as a starting reference. I haven't heard this phrase much lately to be honest. ;)
In a 'typical' Java application (since many comments here, as well as the original article, mention Hibernate...) you will likely use annotations like @Transactional to mark your transaction boundaries (likely with default propagation and isolation levels...), and then Hibernate will track any changes ('dirty checking') to objects you asked it to fetch; at the end of the transaction, Hibernate will issue whatever DML commands (INSERT, UPDATE, DELETE) need to be issued, in an appropriate order.
In a galaxy far, far away, i.e. before Java 1.5, instead of using @Transactional you would maybe use (write) some object like TransactionManager which provides an execute() method that receives a block of code in the form of an interface implementation (no closures for you!). This part is relatively straightforward. Tracking changes in any semi-automatic way was always messy...
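For readers who'd rather see it in Python, SQLAlchemy's Session implements the same unit-of-work pattern; a minimal sketch, assuming an existing engine and a mapped User model:

    from sqlalchemy.orm import Session

    with Session(engine) as session, session.begin():
        user = session.get(User, 42)  # the fetched object is now tracked
        user.name = "New Name"        # no explicit UPDATE statement anywhere
    # on commit, dirty checking flushes exactly one UPDATE for the changed column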
Instead of polarizing between massive native queries in ORM strings vs DB-specific imperative stored procedures I would suggest a middle ground which are table VIEWs [0]. They are trivial to design (just SELECTs), stay declarative like tables, can be materialized for efficiency [1]. Finally they can be easily handled in ORMs as both static and dynamic entities, with all read-only benefits, with no need of any non-standard mapping or native queries.
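In Django terms, mapping an existing view is a few lines; a sketch, with the view name and columns invented:

    from django.db import models

    class OrderSummary(models.Model):
        customer_id = models.IntegerField(primary_key=True)  # views need an explicit pk
        total = models.DecimalField(max_digits=10, decimal_places=2)

        class Meta:
            managed = False                   # Django will not create or migrate it
            db_table = "order_summary_view"   # points at an existing CREATE VIEW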
This is an excellent point. I don't get the polarised nature of the debate to be honest - there is room for all approaches in a "best tool for the job" way. I use EF Core a lot in my work, and it's the first tool I reach for, and if I need a view, then I use a view (EF Core has fantastic first-class support for these nowadays). For everything else there is Dapper.
I think a large part of the problem here is that developers who learn to leverage a database only through an ORM are missing out, and really they should also learn SQL (literally the only part of the article that is still objectively correct is the author's advice to learn SQL) to gain a better understanding of when (and when not) to use an ORM. Every other complaint in the article is either classic misuse of the ORM, or else a shortcoming of the ORM in question.
So, I'm typically not a lover of ORMs as they're typically implemented, and tend to also favor doing something close to the sproc-based method that Philip-J-Fry proposes in another comment.
It's always struck me as odd that, as a profession, we generally agree with, and can be quite fanatical about, the idea that different modules and services should try to hide their implementation details as much as possible, and instead speak over well-defined, constrained protocols such as APIs or interfaces; but as soon as an SQL database comes into the mix, we happily throw all that hard-earned discipline out the window and go back to directly swizzling the internal state of external collaborators.
That said, there are some use cases where I'm not sure how you get around something like an ORM. One is when you need to allow users to execute arbitrary searches against the data. If that's your situation, then, any way you cut it, you're going to end up with some system that takes an abstract representation of a query and compiles it to SQL. The only question is if you want to use something off-the-shelf to do it, or if you'd rather hack it together yourself. In my experience, there are few things on this earth that present a greater maintenance burden than a homegrown ORM-type library does once the original author has moved on to other things.
And there are others where avoiding an ORM is over-engineering. If your database is just an honest entity store, and you're just doing fairly straightforward CRUD operations against it, and it belongs to a single application, go ahead and punch the easy button and have a happy life.
> different modules and services should try to hide their implementation details as much as possible
> but as soon as an SQL database comes into the mix, we happily throw all that hard-earned discipline out the window
The data belongs to the database, not to whatever program you happen to work on at the moment.
IMO, the impedance mismatch situation is a result of poor programming language design. Saying that the database abstraction "leaks" into your code is backwards; the shortcomings of your programming language "leak" into how you're handling your data.
I tend to blame the impedance mismatch more on the popularity of "entities" as a way of doing domain modeling in object-oriented programming.
It's far from the only way to do domain modeling in an object-oriented language. It's not necessarily even a particularly good way, and it creates problems even if your app comes nowhere near a database.
For me the usefulness of an ORM depends a lot on the language I'm working with. In the .NET world the choice is easy: LINQ and Entity Framework is a killer combination. It's so much faster and better than writing your own queries. I don't feel any of the normal arguments I hear against ORMs apply to the LINQ + EF combo. For Python and JS, I can kinda relate to some of the common arguments, but I still feel that ORMs have a use.
I'd really like some sort of macro that takes my random SQL query (with arguments), looks at the names and types of the returned columns, constructs some sort of struct/pojo/whatever to match, and gives me a function(my, args) -> array<that>.
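Tools like pgtyped (linked elsewhere in the thread) do this statically; a runtime approximation in Python is just a few lines, deriving a record type from the cursor metadata (typed_query is an invented helper):

    from collections import namedtuple

    def typed_query(conn, sql, args=()):
        cur = conn.cursor()
        cur.execute(sql, args)
        # DB-API exposes the column names via cursor.description;
        # rename=True tolerates names that aren't valid identifiers.
        Row = namedtuple("Row", [col[0] for col in cur.description], rename=True)
        return [Row(*values) for values in cur.fetchall()]

The compile-time version the parent asks for needs language support: macros, or code generation against the live schema.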
Application code has one way of describing data and its relationships. RDBMSes have a very different way of describing data and their relationships. At some point, you are going to have to reconcile the differences between the two worlds - this is the so-called "object-relational impedance mismatch".
Unfortunately, even if you choose to reject ORMs and go SQL, you're still going to have to solve this problem at some point, and it will not be pleasant.
If you half-ass using an ORM then you get the worst of both worlds. To get value out of it you have to embrace it completely: write all the queries in the ORM (so that the ORM's caching functionality etc. can actually work for you), define the schema in the ORM and generate the tables and migrations from that. Working with an ORM and avoiding SQL is not just doable, it's easy, but you have to actually try.
I think the most important part is to create your database schema through your ORM. I used to do that ages ago with Ruby on Rails, and that worked very well. We still had our complex reporting queries in SQL, and that was no problem. The point is that if you use an ORM, your database needs to be optimised for that ORM. Bolting an ORM onto an existing database is a recipe for disaster.
I learned to use the Django ORM in depth, but there were some things it couldn't do at the time (it may have improved): extra conditions on joins, subqueries, conditional aggregates (it does those now). Once you get to a certain level of complexity you can shoehorn the query into the ORM, but I am not sure it makes sense. The next developer is more likely to know SQL well than the Django ORM.
It depends. ORMs are for dealing with the objects and schema, not with aggregate reports. I still use raw SQL for that. But more often you can just project your data into some BI system and let the users click together their reports themselves.
Either figure out how to express the queries you need within your ORM, or create a place where you can run ad-hoc SQL reporting queries that's completely segregated from your live data - e.g. a read-only replica, or regular database dumps.
The biggest “aha” moment for me was when I started working with graphql.
The tooling would statically analyze your queries, and produce types for exactly what you were requesting.
Then you would just have a bunch of raw queries lying around, but you'd be confident the data returned is the right data at compile time.
That made just writing raw queries a lot more simple and feasible than building abstractions on top of them.
Now, TS and SQL don't really have a robust lib for that I think; last time I checked it was only https://github.com/adelsz/pgtyped
But haven’t looked at how far it has progressed.
it's probably a thousand pages. likely more. i understand it isn't the most thrilling reading of your life. it might just be once you have an issue in prod, though.
if you really don't have time for that, i also understand. i was you. but really please read the table of contents in that case.
yeah, no... ORMs are A LOT more than just mashing strings together to perform a query.
they are there to take care of all of the crap that you don't want or need to worry about: for instance, parametrizing strings against SQL injection, handling transactions properly, concatenating joins, proper pagination and SQL syntax discrepancies, to name a few. and let's not forget to mention things that go BEYOND the database interactions that are built on top of the ORM, like data validations, custom properties, callbacks and so on.
there are way more benefits to using an ORM than not using one. a good ORM (like ActiveRecord) will let you break out of it and write raw sql when you need the performance boost, while preventing you from shooting yourself in the foot.
* i've written and contributed to a couple of ORMs in my lifetime
Like everything else, use the tool but understand the inner workings too. E.g. Hibernate allows logging of the SQLs being executed so when I make changes, I review that the SQL being executed is as expected + performance is acceptable. From that point of view, understanding SQL performance is important. However, I don't really want to write boilerplate SQLs when the ORMs can basically form the query for me using query method name - that is super convenient and I don't want to give that up.
Selecting * from a table with 200 columns isn't performant, but if the table has 200 columns, I won't be blindly relying on the ORM-generated query anyway, since most provide the escape hatch of executing direct SQL.
I'll use an ORM until I start running into performance bottlenecks. Then I'll write raw SQL for those specific instances, and continue to use the ORM elsewhere.
The "Vietnam" article referenced in the post makes some very valid points about ORMs and their marriage to Relational data.
When using object-oriented languages like C# / Java / C++ etc., the programmer is forced to represent data as objects. True, representing tables and relations between tables as objects is difficult, but do OO programmers have any choice at all?
After all, the issue is not a single table (a row can very well be represented in OO constructs); it's the relations between tables that cause headaches.
Even when using languages like Python, I inevitably have to gather related pieces of data together, in a dict say, and it still feels like mapping related tables to objects.
Another point of contention with the "Vietnam" article is that the author assumes OO programmers will want to use inheritance to model relations between tables. This seems outdated in my view. Composition is more suitable and powerful and is almost always used in ORMs, for example in Entity Framework.
The original post also claims that ORMs tend to gravitate towards "select *" queries. This is true, but has many solutions. Many ORMs optimize queries based on required columns. But there is also another issue with this. Say you have a table with 100 columns, but are only querying 5 at a time, isn't it prudent to separate out those 5 columns into another table? Or perhaps create a view? If the use case demands just 5 columns, but the table has 100, isn't there a problem with the data modeling rather than the ORM?
ORMs taught me to build my own ORM; I even ported the thing to MySQL, Oracle and Postgres! It's available to try if you use my own "wordpress": https://github.com/tinspin/sprout
All of this is very complex for something I later solved with JSON files over HTTP instead: http://root.rupy.se. But parts of it can be reused for other useful things; the logic was used to build game dialogue trees, for example.
ORMs have their place, but they are IMHO overused and frequently fail for reasons that have to do with the good old object impedance mismatch, a pitfall that lots of junior developers fall into where they over-engineer their database schema to make it resemble some platonic ideal of some class hierarchy. When that hierarchy inevitably starts changing, the schema erodes along with it.
Some symptoms that I've seen in multiple projects:
- requests are slow because every request triggers dozens to hundreds of joins.
- overuse of the @Transactional annotation (Spring/Hibernate) because developers slap it on anything that looks like it might be doing anything with a database, while understanding neither transactional semantics nor aspect-oriented programming (which causes some funny behavior)
- Attempts to implement class inheritance via database tables and corresponding hacks and complexity to query because of that.
- Lack of a coherent database design. IMHO, a good use of ORM should start with a good old database design. A 1 to 1 mapping of your domain to tables is typically not it.
- Over and under use of database constraints, indices, etc. because of a lack of knowledge of how databases actually work resulting in unenforced referential integrity constraints, poor performance, and weak transactional semantics.
I've used lots of different styles of databases over the years. I generally break things down into:
- simple key value stores like redis, memcached, etc.
- document databases like couchdb, elasticsearch. These tend to have schemas and fields
- NoSQL object databases (mongodb, firestore, etc.)
I tend to mix these styles and would happily use postgres as a document store. IMHO if I'm not going to query on it, putting it in a separate column adds little or no value. If I am going to query on it, it should have an index.
Years of using document stores have taught me the value of de-normalizing stuff like user names and other things that rarely change but are a part of pretty much everything. I've ripped out hibernate in favor of JDBCTemplate + TransactionTemplate on several projects where hibernate was causing more problems than it solved. Hand crafted joins are pretty easy to do.
I've seen this post and similar ones many times. ORMs don't fit all scenarios, but most of the time it's a question of when you fall back to something more flexible/powerful. In Rails, writing the crud part of your app with SQL is pointless. Trying to get AR to do dashboard gymnastics is equally pointless. Pick the best tool for the job.
I'm not an expert in databases, but I am starting to think that building an interface primarily designed for humans (that's what SQL is) as the main medium for interacting with the database was, in retrospect, maybe not a winning idea.
I'm not criticising anybody here (I am literally not in a position to critique anything); SQL was created probably 40 years ago and it probably made sense back then.
My point is that you can always build a human interface from a machine one; the reverse is not that easy. ORMs are a (failed?) attempt to do so.
I would very much push for computer-language integrated query support — (such as Datalog or Linq), and that is what the database server should be accepting as input, instead of a raw string.
Although a lot of people hate MongoDB, I've personally felt way more productive expressing queries with their query document system rather than using SQL.
Fixing design flaws with other software ain't gonna bring us that far maybe.
Re: identities. Use of DB-generated identity has its downsides, but N/Hibernate has a bunch of other id generators to mitigate the problems. You can use sequential GUID, hilo, GUID, or whatever. I use sequential GUID because it helps with a bunch of other things. So it's not really a leaky abstraction. It's not really an ORM problem; you need to deal with that regardless.
Re: queries. I think LINQ showed the true power of ORMs in some sense: query your database as if you are querying your objects. It has problems like N+1 or exploding cartesian products, but Postgres and the like fix that nicely with json_agg. I have to agree, though, that I like the GraphQL way too. For Node.js, compile-time LINQ is not an option, so GraphQL it is.
Transactions: I don't know how this is related to orms?
Disclaimer: former nhibernate developer that was around when ayende was building linq for nhibernate and my view is probably dated and biased.
I think you misunderstand the identity issue. I think it's more a complaint about mutating objects not cleanly mapping to running UPDATE commands. Live objects and partial updates are messy, and dirty writes become an issue. I often see a pattern of a Save() method on an ORM type, and it's ambiguous as to what exactly that will write to the DB at any time. Maybe there are better ORM patterns, but the problem is it's a common anti-pattern nonetheless.
If you use a more RPC-like approach in your data layer, partial updates based on an entity id are easier to grok.
"When you have foreign keys, you refer to related identities with an identifier."
"What this results in is having to manipulate the ORM to get a database identifier by manually flushing the cache or doing a partial commit to get the actual database identifier."
This (and the sentences before it) is basically saying that if you have a foreign key, you have to first save the main object to get its id. The id doesn't have to be the responsibility of the database. In fact, I'd argue an identifier is an application layer concern. The ORM could still solve this problem by simply cascading, or you can use one of the identity strategies that don't rely on the database assigning it. You can generate identifiers in either the application layer or the ORM layer through identity generators.
Let me be more clear. Partial objects, such as a book with an author created from a user request may or may not have DB ids that may or may not exist. If you want to add a Book object with an Author you must first upsert the Author to possibly retrieve the DB id which already exists and cannot simply be generated locally. This can often be a bit confusing or cumbersome compared to the SQL.
Why are you caring about DB ids if the ORM is supposed to handle it, asks OP.
My main reason to use an ORM, and why I recommend others use one (especially if they are juniors), is that it usually (depending on the language/framework) comes with great support for protecting against SQL injection.
Of course not all ORMs protect against SQL injection completely, and some of them only implement partial protection. But it is a good start.
What I recommend strongly is not to mix ORM with plain SQL in the same line of code.
The code should either be full ORM or plain SQL so that it is clear where the responsibility of protecting against SQL injection should be.
Edit:
Also, it is easier to add extra layers of protection after a while when using an ORM than when using plain SQL, because with an ORM I can redefine, let's say, the Repo.all or Query.all method to add extra layers of protection. But with plain SQL I need to edit every single line where the SQL statement is present.
I understand the usefulness of an ORM and how much its use can clean up a codebase, but... truly performant queries (ones that take advantage of every aspect of your DB, not just some vanilla lowest common denominator provided by an ORM abstraction) are done in SQL. In the end one has to make the trade-off between development speed (something an ORM has going for it) and performance, plus the eventual issue of handling the object impedance mismatch. For me it's just so much clearer to read a handful of queries and see how things work. You can even create user-defined functions and call them like an API from your codebase (a sort of ORM) and get the best of both worlds. It's not an easy choice whether to ORM or not, but SQL should be a skill everyone learns.
well, most ORMs give you the ability to drill down into native queries if you want, and often even will help you with the conversion into native objects if what you are pulling back is still a native object. So it's not a binary thing, you can use ORMs to reduce the amount of boilerplate for simple queries and then where it makes sense you can drill down into native SQL.
(a lot of what ORMs give you is just a reduction in boilerplate code, manually populating 30 different fields on an object and so on.)
Most simple queries won't benefit much from hand-coded SQL and it's always there for the minority of the ones that do.
And again, there is the middle ground of writing something like custom HQL that returns an object(s) of interest.
Again, there is a huge amount of error-prone boilerplate avoided simply by doing that; the next time you add a column, you don't have to chase down 27 different hand-coded functions manually populating one field at a time.
> I work on the principle that the database’s data definitions aren’t things you should manipulate in the application. Instead, manipulate the results of queries.
Following this principle will definitely make your life easier. I have been using Slick[1] and the Play Framework[2] for my backends for a good while and having the freedom to define my migration scripts in pure SQL is a big plus.
My application models are very lean and don't get mixed up with DDL. I can also fully understand what goes in the db migrations, zero magic there.
My preference is currently ORM + drop back to SQL when you need something that is complicated or inefficient using the ORM. I have been writing SQL for 20 years.
I currently use .NET Core and Entity Framework + Dapper.
It has these benefits:
- Strong types!
- My schema+migrations are part of the source code and I never have to worry about if my schema is in sync - with migrations it just is
- Simple, straightforward tasks are super easy (insert, update, delete, simple select)
- I can fall back to Dapper (straight SQL) when I need to do something that doesn't fit well with Entity Framework.
I recently worked on a project where EF was not allowed - and we spent so much time doing all the simple stuff, especially when the data model was still not completely locked down.
To be fair, I have never really done a full blown project with entity framework.. but I seem to always need a bit more control than it's offering. So I just created a simple repository framework (inspired by https://www.youtube.com/watch?v=rtXpYpZdOzM ) overlay in which the functions end up calling dapper.
The beauty of this is that it's pretty simple to set up the overlay, and it allows complete control of what is going on. I have a need for "dynamic" table names. I don't think there is even a way to do this in EF (at least not a simple way), but with my own repo overlay it's really a piece of cake: I just tack on whatever specific function input I need to help define the resulting table name, and it works great.
This is probably not the most common use, but a simple repo overlay to a micro orm is not much work and gives full control.
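A rough Go equivalent of that overlay idea (the original is C# over Dapper; everything here is invented for illustration): because table names can't be bound as query parameters, the overlay validates the caller's choice against a fixed allow-list before splicing it into the SQL.

    import (
        "database/sql"
        "fmt"
    )

    // Table names are not bindable parameters, so they are checked
    // against an allow-list instead of being interpolated blindly.
    var allowedTables = map[string]bool{
        "events_2023": true,
        "events_2024": true,
    }

    func countRows(db *sql.DB, table string) (int64, error) {
        if !allowedTables[table] {
            return 0, fmt.Errorf("unknown table %q", table)
        }
        var n int64
        q := fmt.Sprintf(`SELECT count(*) FROM %s`, table)
        err := db.QueryRow(q).Scan(&n)
        return n, err
    }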
I'll put it in the mix of ideas that the problem isn't ORMs, it is trying to build ORMs on top of SQL.
Languages could be placed on a spectrum of how easy it is to build a DSL over the top of the language. Lisp is on one extreme, where it is easy to write a new language over the top of it.
SQL is pretty close to the other end of the spectrum. I've never seen a DSL that can compile to the full breadth of SQL dialects out there. DSLs generally handle the trivial well then fall apart when they hit complicated SQL statements.
If an ORM just had to fit over the top of a relational model it would probably work fine. The problem is, in the middle of all this, something has to be constructed in performant & parseable SQL.
Whether or not you use an ORM, you ideally have your own abstraction on top of it, defining a set of known queries. This avoids the classic Rails problem where the User model has a million methods, and you don't know which of them might make a SQL query.
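In Go terms, that set of known queries can be as small as an interface (all names here are made up): the rest of the app can only reach the database through these methods, so there is never a question of which call might fire a query.

    import "context"

    type User struct {
        ID    int64
        Email string
    }

    // The only database operations the rest of the codebase may use.
    // Anything not listed here simply cannot generate SQL.
    type UserStore interface {
        ByID(ctx context.Context, id int64) (*User, error)
        ByEmail(ctx context.Context, email string) (*User, error)
        Create(ctx context.Context, u *User) error
    }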
Also, ORM caching always finds a way to become a huge pain. It's easy to wind up with multiple objects representing the same underlying DB row, but updating one in memory won't update the others, so you update them all from DB just to be safe... You might say "find a better ORM" but I'm pretty sure they all run into this problem at some point. IMO you're better off doing it yourself and making it explicit.
I have been using Ecto for some time now, and its approach is very good. It is just some syntax sugar to help write safe SQL from Elixir. There is no magic like caching or anything else from complex ORMs, and it has been the best experience in many years.
What I found to work really well (esp. with GraphQL / dataloaders) is using something like postloader by gajus [1]. It generates a slim interface from the database schema. For the simple run-of-the-mill things you get an easy interface to fetch certain rows of a table or load related data, which is a huge part of what you will need when writing GraphQL resolvers. We extended the idea to generate simple wrappers for creating and updating tables as well; if anyone is interested in that I may dump the code for it in some gist.
You start using an ORM when you realise you're basically rewriting a crappier version of one.
You stop using an ORM when you're wasting time going through its documentation trying to figure out how to do something tricky for some obscure ad-hoc execution.
The title suggests that you would be better off with no ORM, just SQL. I have not been coding for years, but I still remember why ORMs came into being: before them there was lots and lots of repetitive code for query preparation and execution, with long, ugly constant strings and string manipulation. With ORMs (and query builders) it all became much more compact (and better looking, without the uppercase SQL). Like every programming abstraction it is not perfect, and you still probably need to learn SQL if you use an ORM - so you need to learn both.
SQLAlchemy is pretty sweet in the sense that you don't have to use the ORM; it has a non-ORM part that lets you write almost arbitrary SQL statements programmatically. This often results in cleaner code than writing those queries by hand (which you can also do in SQLAlchemy), so I prefer it to raw SQL. For things like database migrations I prefer raw SQL files though, as it's a bit painful to get edge cases right with tools like Alembic (I haven't used it for two years though, so maybe things got better).
My experience is that most developers are really bad with SQL. Because of that, they would rather use an ORM because they think they can avoid internalising the relational paradigm. But, guess what - if you don't speak relational then your table design will suck, which in turn will impact DB performance.
Given a long enough time, you'll either switch jobs before this becomes a problem or you'll be forced to learn SQL properly. Most developers just switch jobs.
I'm an advocate of lightweight ORMs, in C# world, I've used Dapper a lot, and now using RepoDB. Lightweight means I can easily craft SQL and quickly map it to types, and things like CRUD on simple entities is very straightforward. But I generally agree with learning SQL, and avoid mapping it to types if you are just going to pipe into json for the front end. SQL is good for query and projecting data into new forms, complete waste going through a type if not needed
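The "skip the type, pipe straight into JSON" idea looks roughly like this in Go against Postgres (a sketch; the books table is invented, and it assumes the driver hands json back as bytes): the database builds the payload and the application never materialises a struct.

    import (
        "database/sql"
        "net/http"
    )

    // The query itself produces the JSON array; no Book type exists.
    func listBooks(db *sql.DB) http.HandlerFunc {
        return func(w http.ResponseWriter, r *http.Request) {
            var payload []byte
            err := db.QueryRow(
                `SELECT coalesce(json_agg(b), '[]'::json) FROM books b`,
            ).Scan(&payload)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
            w.Header().Set("Content-Type", "application/json")
            w.Write(payload)
        }
    }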
What I don't see a lot of mention here is database selection. If you really need an Object->? mapping, why not just use a document store or the like? I don't know how many remember but in the earlier days of Java everyone was talking about OO-databases. I think they didn't work out because they tended to be language specific but now you could serialise your object tree to JSON and store it in a document store if you wanted to.
I’ve found a lot of programmer friends are apprehensive about SQL. I’m not sure why, but they see it as a wizard language instead of a friendly DSL like I do.
The development speed with ORMs is ridiculously fast and easy on the mind.
I do agree that for large, complex performant queries, ORMs might not work out, but I would say that those queries are better written as SQL procedures / functions because they are bound to be modified or changed anyway.
Having said that, most LOB apps, small apps, apps that have infrequent access to the DB, etc., can benefit from using ORMs.
I can never really decide the answer to this question and I have been programming a long time. I end up just not thinking about it too much and doing what I feel like because getting things done is more important than standing around thinking about the millions of different options.
There are various arguments that seem to have flaws of their own. For example, "use ORMs for all the simple things, and raw SQL for the complex things". The problem with this is that even simple things like "insert this into the database" often require checking that the foreign keys you insert are within the domain of the user trying to insert them, and various other constraints like that. So with SQL, you can do this all in one query, but with an ORM, you often create a query for each check.
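That one-query version can look like this (a sketch; the posts and memberships tables and their columns are invented): the INSERT only happens when the ownership check matches, and zero rows affected signals that the check failed.

    import (
        "database/sql"
        "errors"
    )

    func addPost(db *sql.DB, userID, orgID int64, body string) error {
        // Insert and authorisation check in a single statement.
        res, err := db.Exec(`
            INSERT INTO posts (org_id, author_id, body)
            SELECT $1, $2, $3
            WHERE EXISTS (
                SELECT 1 FROM memberships
                WHERE user_id = $2 AND org_id = $1
            )`, orgID, userID, body)
        if err != nil {
            return err
        }
        n, err := res.RowsAffected()
        if err != nil {
            return err
        }
        if n == 0 {
            return errors.New("user is not a member of that org")
        }
        return nil
    }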
Related to that is the idea of "premature optimisation". The problem with this concept is that all queries add up together to determine how much hardware you need, which determines your costs. You can argue the opposite of premature optimisation, depending on your case. Why not spend 5 minutes extra writing a manual query that will be run hundreds of thousands of times for years?
Are your goals about reducing infrastructure costs as much as possible? Is "developer time" really a thing, or are you doing this in your own time for a startup or something?
Then there is the fact that doing simple things faster isn't much of a selling point because they are already simple. It is very obvious though that ORMs are much, much more readable than raw SQL.
Then another question starts to come up, if you are using this abstraction like an ORM or GraphQL as an ORM, why are you even using a SQL database when none of the features are really available to you?
SQL has interesting new features in that you can do the object mapping inside the queries now. For example (with the joins and group by filled in; comments.post_id and posts.author_id are assumed join keys):

    select
        post.id,
        post.content,
        json_agg(c) as "comments",
        json_build_object('id', a.id, 'name', a.name) as "author"
    from
        posts post
        join comments c on c.post_id = post.id
        join authors a on a.id = post.author_id
    group by
        post.id, post.content, a.id, a.name;
Still, it is nowhere near as easy as using an ORM. The other thing is that when you start using an ORM, you really do end up having a different approach to querying your data in every situation. You don't use all of the various features of SQL. It ends up being a lot slower, but maybe in some cases that is worth it.
Yes ORMs are terrible but there is a middle ground required. Something to nicely pool connections and handle all the boilerplate involved in binding parameters to prepared statements. I normally roll my own but would love to find a good library that does it all.
One problem I've yet to find a good solution to is automatic reconnect on database errors with proper transaction support.
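In Go at least, the standard library's database/sql already covers the pooling and parameter-binding half of that wish list; a sketch (the driver and settings are just examples). Automatic reconnect inside an open transaction is genuinely harder, because a half-done transaction can't be transparently replayed after the connection dies.

    import (
        "database/sql"
        "time"

        _ "github.com/lib/pq" // any database/sql driver works here
    )

    func open(dsn string) (*sql.DB, error) {
        db, err := sql.Open("postgres", dsn)
        if err != nil {
            return nil, err
        }
        // *sql.DB is a connection pool, not a single connection.
        db.SetMaxOpenConns(20)
        db.SetMaxIdleConns(10)
        db.SetConnMaxLifetime(30 * time.Minute)
        return db, nil
    }

    // Prepared statements are reused across pooled connections:
    // stmt, err := db.Prepare(`SELECT id FROM users WHERE email = $1`)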
Whilst I have long railed against ORMs, I do appreciate `pg-orm` (and for my own stuff, `crud`) in Go because it means I don't have to write Nx100 (177 distinct structs at $WORK) versions of "scan the results into this struct type-safely".
But that's less of an ORM and more of a data mapper?
The sqlc lib "generates fully type-safe idiomatic Go code from SQL". It makes sense to generate code from SQL and not the other way around.
https://github.com/kyleconroy/sqlc
Just use a mature ORM where you can easily combine ORM capabilities and raw SQL, like Hibernate / Spring Data. There you can do raw-raw calls to the SQL driver, map your optimized by hand query to an object or let Hibernate fetch a whole graph of stuff by itself.
Rails with Arel is really special, the queries it creates are really impressive coming out of an ORM. I would be curious to hear an updated version of this critique which focuses on Arel.
Now of course learning SQL is still a good thing to do, but for some, a good ORM can be a way of doing that.
ORMs should not be sold as "Now you don't need to learn SQL". Like any abstraction, it pays to know about the layer below even if you're being saved from directly using it. In the case of SQL and ORMs, this is particularly true.
I couldn't agree more. We should all just learn and use SQL. If it really, really, really doesn't fit a significant portion of real-life tasks (and only then, because we should avoid multiplying standards), we should just design a new SQL.
Yeah, the thing is that at the beginning you just don't know what you'll need later, so sticking to anything strict will become a limitation sooner rather than later. It is a pain to do the manual work in every project over and over again, but "this is the way", unfortunately.
I have done two event-sourced projects in a row and I am trying to make a reusable library that I can use from now on, which will let me write projections so I am not tied to any schema and can build any data structures I'll need.
This is such a non-discussion, and articles from 2014 don't have any valid arguments anymore.
I put it alongside the discussions about which OS is best, which does not matter at all. Please stop wasting everyone's time with these arguments. Do what is best for your project and shut the f up.
The reason this is a good discussion to have is because, at a superficial level, it looks like ORMs should be faster than writing SQL, but when you get down into the details you find that the complexities are a lot deeper than a superficial view suggests. "Do what is best for your project" is good advice, and the author states a good case for ORMs not being "what's best for your project".