I have a lot of experience with the Django ORM, but with no other ORM.
So I disagree with most points.
The Django ORM does map pretty neatly to SQL (in my opinion), and especially Postgres. The migration tool is anything but "half-baked". And I have observed that beginners have an easier time creating Django models than dealing with straight SQL.
On top of that, the Django ORM mitigates most SQL injection attacks out of the box, without even requiring the developer to know about SQL injections in the first place.
And the ORM enables elegant code reuse by constructing query objects that can be used in Template logic, Form validation, generic CRUD, REST endpoints and more.
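A minimal sketch of both points (the Post model and the surrounding view context are hypothetical): user input is bound as a query parameter rather than spliced into the SQL string, and the resulting queryset is an ordinary value you can pass around:

    # Sketch; Post and the view context are hypothetical.
    term = request.GET.get("q", "")                    # attacker-controlled
    posts = Post.objects.filter(title__icontains=term)
    # Compiles to roughly:
    #   SELECT ... FROM app_post WHERE title ILIKE %s
    # with `term` passed as a bound parameter, so injection is a non-issue.
    # The same queryset object can then feed a template context, a form,
    # a paginator or a DRF serializer unchanged.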
I agree that most ORMs don't help you with multiple programming languages. Though both the Django ORM and SQLAlchemy allow you to customize a lot of the naming conventions and can be adapted to any legacy schema.
But then again: Is it even a good idea to access the same database from two codebases at all?
Some of the concerns about "ORMs kill the SQL-star" can be boiled down to a lack of general understanding of relational databases among developers who learn to use an ORM before diving deeper into the philosophy of relational databases.
...you're the ONLY person I've seen with a good opinion of Django's ORM. I'm willing to accept you could be right in preferring it, but:
Could you share some resources on using it properly for advanced use cases? Good cookbooks, etc.?
For me, it's always been a huge footgun; coupled with Python's dynamic nature, it's an endless source of bug after bug... Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost. Use Postgres native JSON and array fields - good luck running your tests on SQLite; the ORM that was helpful up to here just gives up and tells you "f u buddy, you're on your own". Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code - good luck with that, nobody cared about this scenario.
They've somehow managed to make it too simple and too complex at the same time! Oh, and don't expect to just click a few jump-to-definitions and make sense of the ORM's inner code easily, god forbid :|...
While I like the idea of ORMs, I found them all totally "wrong headed". And SQLAlchemy - that's just not worth the effort of learning to use properly, despite its good ideas, if you're only going to end up using 10% of its functionality, and in an idiosyncratic way that was not intended.
This thread is littered with people singing praise of Django's ORM. He is not the only one. I'm a huge proponent of it as well.
> Could you share some resources of using it properly for advanced use cases? Good cookbooks etc.
The official documentation covers advanced use cases. Stack Overflow will help you shape up more complicated queries.
> Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost.
That's a drawback of most dynamic languages, not a flaw of Django's ORM.
> Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code - good luck with that, nobody cared about this scenario.
Maybe that's the threshold beyond which the Django ORM devs thought it sensible to stop providing support? At some point, mixing hand-coded SQL and ORM code introduces a whole set of hard questions and decisions. Very few people expect ORM code to play properly with hand-coded SQL. If you need to switch back to SQL, you are generally on your own.
> They've somehow managed to make it too simple and too complex at the same time! Oh, and don't expect to just click a few jump-to-definitions and make sense of the ORM's inner code easily, god forbid :|...
It's simple to use, yet has complex internals. That's about what I would expect from a tool abstracting something complicated.
>> Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost.
> That's a drawback of most dynamic languages, not a flaw of Django's ORM.
In Python, mistyping a name usually results in an exception at runtime. If Django doesn't check models for correctness, it can't be explained by being written in a dynamic language.
In Python, mistyping a property of an object doesn't result in an exception. JS has the same behavior.
If you mistype a nullable or already-filled property of a model instance, Django will happily save the model; the mistyped property is ignored because it's not part of the model, so nothing is actually modified in the database.
Calling that behavior a flaw of Django's ORM is disingenuous.
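For context, the behaviour under discussion looks like this (sketch; the User model and its field are hypothetical):

    # Sketch of the failure mode being debated (User is hypothetical).
    user = User.objects.get(pk=1)
    user.emial = "new@example.com"  # typo for `email`: Python happily
                                    # sets a plain instance attribute
    user.save()                     # saves fine; the typoed attribute is
                                    # not a model field, so nothing changes
                                    # in the database and no error is raised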
> In Python, mistyping a property of an object doesn't result in an exception.
Sure it does. It doesn't on assignment, as you can modify the object however you like, but it does have AttributeError for accessing a property that doesn't exist.
Which you can only use at compile time - not dynamically. That said, I recall the SQLAlchemy ORM rejects fields that it does not recognize. A decent IDE (PyCharm) will highlight these for you with the Django ORM. Having now used both the Django ORM and SQLAlchemy extensively, I prefer SQLAlchemy a lot. The Django ORM is usable, but many of its behaviours are odd, and I agree that writing more complex queries gets ugly fast, unlike with SQLA, where you can write some really hairy SQL in a composable manner that makes it just so much more legible.
There is no such distinction between compile time and run time in Python; everything can be done dynamically. Want to use a class with slots computed at runtime? No problem:
    >>> def with_slots(**kwargs):
    ...     class SlottedClass(object):
    ...         __slots__ = kwargs.keys()
    ...         def __init__(self, **kwargs):
    ...             for k, v in kwargs.items():
    ...                 setattr(self, k, v)
    ...     return SlottedClass(**kwargs)
    ...
    >>> with_slots(foo=1, bar='qux').zork = 4
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'SlottedClass' object has no attribute 'zork'
Of course there are many improvements that could be made to this approach (e.g. not creating a new class for each instance, or setting the __name__ to something meaningful) but I hope it shows that dynamic typing and robust error checking aren't contradictory.
> ...you're the ONLY person I've seen with a good opinion of Django's ORM. I'm willing to accept you could be right in preferring it,
No, he's not. Almost the entire Django community has a good opinion of it.
The main criticisms I see of Django's ORM seem to be from people that prefer SQLAlchemy. They have different design philosophies, and the distinction seems largely a matter of taste and differing use-cases.
> main criticisms I see of Django's ORM seem to be from people that prefer SQLAlchemy
...then I'm in the minority, hating them both, and I'm in the process of looking for an alternative, since I'll be using a lot of Python for the foreseeable future :|
I liked web.py's database abstraction layer a long time ago - know of anything similar in philosophy but more powerful?
DAL was the first Python db abstraction I used, and I liked it well enough, but I've since used both Django and SQLAlchemy, and they both seem less opaque to me, maybe because they're not obscured by some of the weird things about web2py itself.
I love the auto-generated backwards references you get in Django and SA when you use foreign key and many to many fields. They make life so much easier. I don't recall how that was achieved in DAL, I just remember it being... complicated.
Years ago I had the same discussion [0] on Medium and here is my take on that: You will end up reinventing "an ORM" because your SQL won't handle all the edge cases. At least with e.g. the Django ORM we know it helps avoid some security flaws.
thanks for the reference, especially to DoorDash's original article "Tips for Building High-Quality Django Apps at Scale" that it was related to... I really like their thinking and think they are solving the same problems; I really like their idea of abstracting/wrapping over the ORM instead of using signals, and I'm looking for a similar architecture for a "default" Django boilerplate, featuring:
- data models in separate files, one for each model, under a data_models/
- business logic models that may or may not map one-to-one with data models, under a models/, that would contain roughly framework-independent code
- encourage "fat apps"
- figure out some way to let the db be the source of truth and allow "multiple services, one DB" scenarios - maybe having "core models" for which all db migrations are excluded, table names fixed, and possibly extra columns that the Django app could ignore but other services could use, and "app/project models" for which regular Django migrations would run
I might write it up after I get the boilerplate app done, and I'm really interested in hearing opinions from people with opposing viewpoints especially!
...of course they don't: I was referring to having a sane, standardized way of having each app work with two dbs: one that it owns and that includes some models of no interest to other apps, and a different one that it shares with other apps and that is used by other models (and has migrations turned off for these models, ofc).
The main point is that it can result in way less code; think of all the APIs between microservices that would never need to be developed or documented or maintained because they'll never exist, yet the problem they would've solved will be addressed :)
Doesn't work all the time, but if most of what you need is data-sharing, and there's a clear model of "this service produces this data, which has a clear documented fixed format, all others just consume it", you'll prevent thousands of LOCs from being written. And I do see my mission more and more as preventing (useless) code from being written nowadays...
Indeed. I recently worked on a legacy project where I don't have the luxury to modify the old database. I ended up using Peewee [0] to access the old database, in conjunction with the normal Django based database, and even wrote a simple model that uses Peewee [1]. Everything works, even Django Rest Framework associated with the model.
Django's ORM is fine. I would say that it's good but not particularly powerful.
It's easy to use and makes inserting / updating data fast and easy. It's simple and fast for simple queries.
It starts getting a bit crap when you need complex queries. Annotations / GROUP BY stuff isn't intuitive.
It does make it easy for inexperienced people to generate huge numbers of queries by looping through querysets and getting fields from another table.
It lets you run raw SQL through it and prevents SQL injection, so it's really a case of knowing when to switch to that.
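For reference, the looping problem mentioned above and its usual fix (sketch; Book/Author are hypothetical models):

    # N+1 pattern: one query for the books, then one more per book.
    for book in Book.objects.all():
        print(book.author.name)

    # Same loop, one JOINed query (select_related for FK/one-to-one;
    # prefetch_related for many-to-many and reverse FKs):
    for book in Book.objects.select_related("author"):
        print(book.author.name)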
Where did I say I was concatenating data? You might want to take a look at Django's ORM before making comments like that, it handles parametrised queries fine.
The way I see it, an ORM is just a way of formalising the false assumption that a database is stupid and useless.
I've taken a look at a couple of Django's most popular ORM options and I suppose if I was a mid-level corporate cog writing CRUD interfaces for Stack Overflow tutorial-grade operations like customer-sale-item-product relationships it would probably work fine. I've only had passing exposure to that kind of soul-crushing drudge work.
For the way I work an ORM is baby stuff, and I'm not even doing anything particularly advanced or sophisticated. I maintain (among other things) a relatively unremarkable bespoke online discussion forum. It comprises hundreds of queries, few of which could be composed correctly by an ORM, let alone composed and run performantly. The median complexity query in my code base probably has two or three joins and two or three subqueries—at least one of which has some kind of tangential aggregation or window function.
Thanks for the well deserved lesson in internet arguments. You win, sir.
(For the record, complexity and remarkability aren't strong correlates. And it's not particularly complex; probably 90% of the app's queries are written for indirect tasks like moderation tools, automations, statistical reports and various strata of anti-abuse mechanisms. Most common pages only have one security-related query and one content-related query.)
> Postgres native JSON and array fields - good luck running your tests on SQLite
If you're not testing what you actually run in production, then what are you testing? If you just need to test in-mem behaviour, mock the DB completely instead of expecting compatibility for vendor-specific features across vendors.
For database testing, I often use a "ramdisk", or in-memory filesystem: sudo mount -t ramfs -o size=512m ramfs /path/to/mount/point, initdb, pg_ctl start. (There's a bit more bookkeeping to generate passwords, store them to /path/to/mount/point/.pgpass, etc.) You can tear it down afterwards with pg_ctl stop, sudo umount /path/to/mount/point. I'll put the necessary commands in start / stop scripts, then call those from within the beforeAll / afterAll hooks of whatever testing framework I'm using.
It's definitely slower than mocking, and therefore not really suitable for unit tests - but it's great for repeatable, isolated integration tests (continuous or otherwise).
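In pytest terms, the wiring can be as small as this (sketch; the script paths are assumptions standing in for the start/stop scripts described above):

    # Sketch: session-scoped fixture around the ramdisk Postgres scripts.
    import subprocess

    import pytest

    @pytest.fixture(scope="session", autouse=True)
    def ramdisk_postgres():
        subprocess.run(["./scripts/start_ramdisk_pg.sh"], check=True)
        try:
            yield  # tests run against the in-memory Postgres here
        finally:
            subprocess.run(["./scripts/stop_ramdisk_pg.sh"], check=True)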
Hmm... can one use in-mem db mocking with django.contrib.postgres fields?
I'll definitely look into this; not a solution for existing projects bc of legacy baggage, but probably the right way to do it. The problem in practice is that, yes, what you really want is to mock the db completely, but practically speaking existing code is too entangled with the db adapter specifics, so you end up running with a "fake real db" :|
Can you write the tests so that you create object instances and not save them to a database at all? You don't need to test the ORM and database, after all, just your code.
Unit tests, yeah, but if you try to get "the most bang for the buck", like "test a lot with zero/little effort", you'll end up with a layer of "integration tests" that (1) set stuff in the db, (2) do stuff, (3) read stuff from the db and assert its correctness... A sane ORM would not actively hinder this scenario, even cross-db or with memory-mocked dbs, but Django's does!
I don't see the issue - you're asking it to support a feature of one database on another, something it never claims to do. You need to use the same database for testing that you use in production.
I'd say the sqlite backend and using it in testing is mostly useful for reusable django apps that are supposed to be used in a wide variety of projects.
Most of the code I write in Django would never need to be SQLite-compatible, even if a lot of it happens to be. The SQLite compatibility is almost like a relic from a time when setting up an RDBMS was difficult and slow.
I don't know if there is already some plugin to make mypy aware of Django models, but that should be possible and would avoid bugs where you misspelled a property and it slipped through your quality control and unit tests.
I suspect "raw sql" to be a bigger headache and source of bugs in any moderately complex project.
Most of the use-cases of queries are actually very simple, and there the ORM allows you to simplify the code a lot.
My hints for how to enjoy the ORM more: write custom managers and QuerySets (for example for commonly used filters), write generic views that take QuerySets, and take advantage of Models and QuerySets in form validation and REST APIs.
If you use the Django ORM like you would concatenate raw SQL, then it ain't gonna be pretty...
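To make the custom-manager hint concrete, a sketch (Order and its fields are hypothetical):

    # Sketch of the custom managers/QuerySets hint above.
    from django.db import models

    class OrderQuerySet(models.QuerySet):
        def open(self):
            return self.filter(closed_at__isnull=True)

        def for_customer(self, customer):
            return self.filter(customer=customer)

    class Order(models.Model):
        customer = models.ForeignKey("Customer", on_delete=models.CASCADE)
        closed_at = models.DateTimeField(null=True, blank=True)

        objects = OrderQuerySet.as_manager()

    # Commonly used filters now compose, and the result can be handed
    # to generic views, form validation or a REST endpoint as-is:
    # Order.objects.open().for_customer(customer)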
> Most of the use-cases of queries are actually very simple
I write web applications / data tools with raw SQL, and I find that only about a quarter of the queries are simple. Many have subtle specificity in the joins, subqueries and window functions that returns almost exactly what I want to output.
The problem I see in most people's ORM-powered code is that they lean so heavily on application code to get all the data assembled in the right place and in the right order, resulting in scaling problems.
> concatenate raw SQL
Often dangerous and almost never needed. Parameterised SQL is the right approach.
> If you use the Django ORM like you would concatenate raw SQL, then it ain't gonna be pretty...
Depends on what exactly you mean here. The Django ORM fully supports some things you wouldn't initially expect if you hadn't seen it before, like passing one queryset into the filter args of another. It'll actually compose the queries and run it once on the database, instead of running the first and passing the result into the second.
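For instance (hypothetical models), the inner queryset below is never executed on its own; it's compiled into a subquery of the outer one:

    # Sketch: queryset composition, one database round trip.
    recent_authors = Author.objects.filter(created__year=2019)
    posts = Post.objects.filter(author__in=recent_authors)
    # Roughly:
    #   SELECT ... FROM post
    #   WHERE author_id IN (SELECT id FROM author WHERE ...)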
> Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost.
Test your code. Use an IDE. PyCharm highlights these errors immediately.
> Use Postgres native JSON and array fields - good luck running your tests on SQLite,
You're using stuff from the Postgres contrib package on a different db. The docs are pretty clear about that.
> Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code
Make a view and manage it through the ORM? If you're updating data from this, then what you're doing seems really strange.
I'll leave the other parts since sibling comments responded to those, but:
> Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code - good luck with that, nobody cared about this scenario.
I am the other person with an excellent opinion of Django's ORM. I vastly prefer it to writing SQL queries manually, and the only thing I miss is making it ever more functional-like.
Everybody prefers anything that does a barely decent job at preventing you from writing SQL manually :)
If you set the bar that low then sure, you'll have an "excellent opinion" of ANY ORM. Can you give an example of an ORM you didn't prefer to writing SQL manually for 50% of the cases? Most that were that bad are probably not in use anymore...
My experience as well. I use an ORM that I absolutely love for most work (Apache Cayenne) and I use plain SQL where appropriate, like for complex reporting.
The "ORMs are bad" articles usually boil down to "My ORM is bad" or "I tried to use an ORM where it doesn't really fit and now I believe all ORMs are bad".
> Some of the concerns about "ORMs kill the SQL-star"
This doesn't even make sense. It kills the SQL duds; stars are even more important for understanding the convoluted SQL when the ORM isn't working as expected, and for the cases the ORM can't handle at all (there are always some).
That's not my experience. In ~20 projects in this company, with pretty complicated GIS-related web apps, there have been about 4 cases where I need to create a raw SQL query because Django's ORM didn't do what I needed. Almost all the time, it just works.
When you say "pretty complicated GIS-related web apps", does it mean you perform powerful spatial analysis using the PostGIS extension over a few million spatial objects, OR is it more about fetching a handful of spatial objects for display/processing in a web-mapping framework?
Because for pretty obvious reasons, in the first case you'd want computation on the back-end. Sure, you can do a lot of spatial operations with Python after passing through an ORM... but is this really better adapted than raw SQL & PostGIS? Genuinely curious.
Most of the processing happens outside PostGIS; we often compute things with geometries from PostGIS against raster data that we don't keep in PostGIS. So our PostGIS queries are mostly limited to filtering on a BBOX, or closest distance to a point.
What users want to compute can be defined by users, e.g., "take all houses in the Netherlands as polygons, compare with a DEM of the Netherlands to find the lowest point of the house's contour, compare that point to a water depth raster to see if it would be inundated" is a computation a user could define in our GUI and the actual computation would then run in (Python) background tasks.
I work in the GIS field and I tend to put a lot of business logic into PostgreSQL, because I can re-use all functions/views in a variety of tools, from desktop GIS to BI and even (shamefully) Excel for some of our oldest apps.
Whenever we move to a new webmap, or we want to serve an API, it's always one connector away. All business logic put into the DB is heavily reusable, because a lot of tools "speak" SQL.
I guess in the end it really depends on whether your approach is data-centric vs functionality-centric.
Coming from Rails I think that Django's ORM is too complicated, like almost everything in Django and in Python. However I'm using Python, Ruby and Elixir in different projects and Ecto is much more complicated. It makes me want to write plain SQL as I did in the early 90s.
Actually, I'd be OK writing SQL. It's just inconvenient having to decode the results and handle migrations manually. I did plain SQL migrations before ORMs. That's not something I want to go back to.
Django's ORM is pretty complicated under the hood, but I suspect that this is not easily avoidable given the requirements.
I also don't subscribe to the view that Django is too complicated. With other frameworks you end up implementing part of Django, and most of the time that will be worse in every conceivable way.
This. It's not harder, it's different. Takes some getting used to, but then you realize "oh it's pretty much just sql that returns structs" and you instantly hate every other ORM you've used prior.
I'm not surprised about your comment on Ecto. Ecto takes experience and practice, but by the time you finish writing your 5th or 6th application, Ecto will be so natural that you wouldn't want to go back. Especially not to writing vanilla SQL.
The problem with complicated things is that the second application never comes. I've got a customer that's been paying me to write a Phoenix application for the last two years. That's good, but I'm not going to use Ecto in one of my own projects. So no Elixir if it touches a database. There are simpler alternatives to Ecto; maybe I'll use one of them, but Ruby and ActiveRecord are good for almost everything.
I used to really dislike the Django ORM, since I had a background in SQL and thought that ORMs are really constraining and not as flexible as just writing SQL. However, recently I have come to the realisation that not everyone who programs knows SQL (woah, what a revelation, right?), and the Django ORM is a really nice tool to fill that gap with something they are more familiar with (ie Python objects).
If you are using any language other than SQL, then you better have a way to convert SQL data to the appropriate data types of your language and vice versa. And it better be injection-proof and reusable. That's already at least half of an ORM right there.
I think I know SQL fairly well, but I don't see how I could implement things like django admin, form validation, template views etc as elegantly on top of plain SQL.
So I really don't agree with "ORMs are only useful for SQL-noobs".
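The type-conversion and injection points above in miniature, with nothing but the stdlib (table and fields hypothetical):

    # Sketch: even "raw SQL" code grows parameter binding plus
    # row-to-object conversion, i.e. half an ORM.
    import sqlite3
    from dataclasses import dataclass

    @dataclass
    class User:
        id: int
        name: str
        active: bool  # stored as 0/1 in SQLite, coerced back to bool here

    def fetch_active_users(conn: sqlite3.Connection) -> list[User]:
        rows = conn.execute(
            "SELECT id, name, active FROM users WHERE active = ?",
            (1,),  # bound parameter, injection-safe
        ).fetchall()
        return [User(id=r[0], name=r[1], active=bool(r[2])) for r in rows]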
> But then again: Is it even a good idea to access the same database from two codebases at all?
A motivation for relational algebra was different applications accessing data in different ways. This integration remains a selling point. Further, data often outlasts applications... even languages.
Row-level permissions are possible, but it can get more complicated when certain users are only allowed to change a certain row in a certain kind of way but not in another.
"Is it even a good idea to access the same database from two codebases at all?"
Yeah, if you're migrating code between languages and platforms and want to run both simultaneously so you can go live with a subset in the new platform and phase out the legacy system in stages.
> But then again: Is it even a good idea to access the same database from two codebases at all?
I think that depends entirely on who you ask. Some orgs may have dedicated DB guys who are designing all the DB schema and stored procedures that must be vetted by the DB guys before the application guys can merge in changes. Others might decide dev teams should be responsible for their own data and store it as they see fit. I think product data should only be updated by one main API and that API exposed internally for other tools to interact with, unless it's something that is special. And in development, there are many special cases.
If it's treated correctly as a set of SQL-language APIs with clear ownership and explicitly defined public/private parts. There is a reason why it's often considered an anti-pattern, and it is a general lack of discipline in maintaining stable semantics and separation of responsibilities.
It looks like the author is generalizing shortcomings in SQLAlchemy as faults of the concept of ORMs. ActiveRecord, for example, treats SQL as "the source of truth", per his definition.
Also, there is a whole section called "ORMs take over connection management and migration" and not a single point as to why connection management handled by the ORM is bad, only comments on migrations. Moreover, migrations are not an unsolved problem at all.
In Ruby, ActiveRecord solves all scenarios I have encountered gracefully, and trust me, I have dealt with tons of edge case projects.
> ActiveRecord for example, treats SQL as “the source of truth”, per his definition.
SQLAlchemy author here. I can't imagine how someone would not consider SQLAlchemy to treat "SQL" as the "source of truth" as well. All of SQLAlchemy's constructs map directly to SQL syntactically. Our most grumpy users are the ones who don't know SQL. This blog post would certainly benefit from some actual examples. "ORM-light tools that coerce responses into native structs and allow for type-checking are less offensive to me." - that is...an ORM?
What the blog probably means by "SQL as a source of truth" is that you first design your database and then generate the domain classes from it, and not vice versa. This is, for example, what MyBatis and jOOQ do.
I don't think there is enough information from the database schema to feed an ORM reliably. E.g., take a "boolean" in MySQL: MySQL doesn't have proper booleans, it just represents them as a `tinyint`. But you'd really like your ORM to convert those to your language's `bool`!
There is certainly a lot of information in a schema, and you definitely want your ORM's idea of the database and the database's idea of it to be in close alignment, but I feel like any serious attempt would find all sorts of holes like the above when it actually came to doing it.
Wouldn't you just offer user-modifiable translators for such cases (presumably with some sane default)? It seems preferable to writing the domain model in code first, which results in N copies of the schema definition for N applications utilizing the db...
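A sketch of what such a translator can look like with SQLAlchemy's TypeDecorator, for the tinyint-as-boolean case above:

    # Sketch: coerce MySQL's tinyint booleans at the boundary.
    from sqlalchemy.types import SmallInteger, TypeDecorator

    class TinyIntBool(TypeDecorator):
        impl = SmallInteger

        def process_bind_param(self, value, dialect):
            # Python bool -> 0/1 going into the database
            return None if value is None else int(bool(value))

        def process_result_value(self, value, dialect):
            # 0/1 -> Python bool coming back out
            return None if value is None else bool(value)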
> N copies of the schema definition for N applications utilizing the db..
Is multiple applications talking to a shared DB schema a common practice, especially nowadays?
In my mind, each app/service should have a DB schema which only it talks to, and other apps/services needing data in that DB go through services exposed by that app/service (REST, RPC, GraphQL, whatever) rather than talking to its DB directly. That means you can rearchitect the DB schema, change which DB you use completely, etc., and only the app/service which owns that DB needs to be modified.
Why use an ORM when I have to know SQL and it's a direct syntactical mapping?
>"ORM-light tools that coerce responses into native structs and allow for type-checking are less offensive to me." - that is...an ORM?
Yeah... it's a light ORM that focuses on turning a result set into an object, but not on the syntax remapping of queries. Object mapping is almost universally liked, but ORMs usually also include query syntax mappings and the addition of a transaction lifecycle into your data objects.
Because an ORM has nothing to do with writing your SQL for you. You can use textual SQL with an ORM, and an ORM like SQLAlchemy has a query language that mirrors the structure of SQL in any case. Nobody is taking your SQL away. Additionally, you most certainly do want a tool that will write most of your DML for you; there is no deep wisdom or joy in writing the same INSERT / UPDATE statement over and over again.
I've used SQLAlchemy a bunch and found that it has excellent support for complex (for me anyways) query options. For example, avoiding N+1 selects with eager joins has been a breeze.
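For example (sketch; User/Address are hypothetical mapped classes and `session` is an existing Session):

    # Sketch: one JOINed query instead of 1 + N lazy loads.
    from sqlalchemy.orm import joinedload

    users = (
        session.query(User)
        .options(joinedload(User.addresses))
        .all()
    )
    for user in users:
        print(user.addresses)  # already loaded; no extra SELECT per user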
ActiveRecord does not treat SQL as "the source of truth". ActiveRecord objects maintain their own state, and still try to be a "proxy" for rows in the database. When you call "save", it's tracking the identity of the object using the primary key, and generating SQL from the runtime state of the object. This is backwards.
The biggest problem for me is this "proxy" behaviour. When you start treating objects as proxies for database rows, your database logic inevitably leaks into your business logic. Someone returns an ActiveRecord `User` object from a database method and now random other parts of your application are intricately linked to your database implementation.
On top of that you suffer terrible performance problems. Even if you're smart enough to avoid the whole N+1 query issue, you lose control (and more importantly visibility) over where database updates are coming from, and your ORM is either not smart enough to efficiently apply updates, or so smart that building the query takes longer than actually running it! (I have this problem with SQLAlchemy right now, where it's taking several seconds per request CPU bound in the ORM). Undoing this is literally impossible without a near total rewrite.
And it all serves no purpose! The application-specific interface to its database is usually quite simple. You can just have it be a literal interface/trait/whatever and implement each method by executing a bit of SQL, or calling a stored procedure, or however you like. Arguments and return values should be plain-old-data (no magic proxy objects) and each method should correspond roughly to a transaction so that the rest of the application doesn't have to worry about that database-specific stuff.
There are a ton of other benefits to doing stuff this way, too many to list here, but it really helps with complex migrations (like if you need to migrate from one database to another), maintainability, testing, etc.
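A sketch of the shape being described, with hypothetical names and SQL:

    # Sketch: a plain-old-data interface; one method ~= one transaction.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class OrderSummary:
        order_id: int
        total_cents: int

    class OrderStore:
        def __init__(self, conn):
            self._conn = conn  # any DB-API connection

        def summaries_for_customer(self, customer_id: int) -> list[OrderSummary]:
            # Returns plain data, not proxy objects, so callers never
            # become coupled to the database implementation.
            rows = self._conn.execute(
                "SELECT id, total_cents FROM orders WHERE customer_id = ?",
                (customer_id,),
            ).fetchall()
            return [OrderSummary(*row) for row in rows]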
> (I have this problem with SQLAlchemy right now, where it's taking several seconds per request CPU bound in the ORM). Undoing this is literally impossible without a near total rewrite.
For the write? Writes are not usually much of a performance issue, except when people are inserting thousands of rows. You can replace flush with direct INSERT/UPDATE or use the bulk API. Pre-setting primary key values will also improve performance by an order of magnitude. Share some of your code and profile results on the mailing list and we can look into speeding it up.
SQLAlchemy always provides many techniques to optimize areas that have performance problems, not to mention the primary direction for most major releases is that of new performance features and improvements year after year, despite having to deal with the dog-slow cPython interpreter. No "total rewrite" of your application should be needed.
The SQLAlchemy ORM is a very complex beast. It provides plenty of methods to optimise the queries it generates. It provides no way to "magically speed up the python code".
This is not just for writes, but also for loading complex relationships from the database: we've gone to great lengths to ensure we're using relationships optimally so as to cache query results and not fall into the N+1 query problem, but our application spends all its time inside SQLAlchemy.
The only way to get the performance improvements we needed was to stop using it as an ORM altogether. However, the ORM was too intricately tied with our business logic to do that without a near total rewrite.
What we've actually ended up doing to do this rewrite more incrementally is to create a Rust service which implements just the "GET" end-points: this gives us an order of magnitude performance improvement despite it using the exact same database and fetching the exact same data. Most of the business logic did not need to be duplicated as it is not used from "GET" requests, but users of our application will get much snappier loading times, and it will generally feel much better.
> It provides no way to "magically speed up the python code".
It most certainly does provide many techniques to reduce in-Python computation. Have you read the performance section in the FAQ and worked through the examples given? Have you looked into baked queries, or querying for columns and not objects? These are often quick wins that make a huge difference. The baked queries concept is going to be much more cleanly integrated in an upcoming release.
> This is not just for writes, but also for loading complex relationships from the database: we've gone to great lengths to ensure we're using relationships optimally so as to cache query results and not fall into the N+1 query problem, but our application spends all its time inside SQLAlchemy.
that's typical because a CRUD application is all about its object model and SQLAlchemy is doing all of that work. if you wrote your own object layer using Core then you'd see all the time spent in your own objects<->core layer. This is just the reality of Python.
Here's a runsnakerun I made some years ago of the PyMySQL driver loading data over a network-connected database: https://techspot.zzzeek.org/files/2015/pymysql_runsnake_netw... Notice how the Python driver alone spends TWO THIRDS of the time, and the actual waiting for the database only 1/3rd? That's Python. If the database is on localhost, that pink box shrinks to nothing and all of the time is spent in Python, just reading strings and building tuples. It's a very slow language.
Rust, however, is blazingly fast: one of the fastest languages you can use short of straight C code. This is not the fault of an ORM; this is just the reality of Python. You certainly wouldn't think that if you used an ORM that runs under Rust, it would be as slow as your Python application again; it would continue to be extremely fast. The Hibernate ORM is an enormous beast but runs immensely faster than SQLAlchemy because it's running on the JVM. These are platform comparisons. SQLAlchemy is really fast for the level of automation it provides under pure Python. You made the right choice rewriting in a fast compiled language for your performance-critical features, but that has little to do with whether or not you use an ORM. As long as your database code was in your Python application, you'd end up writing an ORM of your own in any case; it would just be vastly more code to maintain.
> It most certainly does provide many techniques to reduce in-Python computation. Have you read the performance section in the FAQ and worked through the examples given? Have you looked into baked queries, or querying for columns and not objects? These are often quick wins that make a huge difference. The baked queries concept is going to be much more cleanly integrated in an upcoming release.
Most of those techniques boil down to "use SQLAlchemy as a query builder rather than an ORM" - something I wholeheartedly agree with and would love to do, but it is next to impossible because these ORM objects have leaked into the business logic of the application.
> that's typical because a CRUD application is all about its object model and SQLAlchemy is doing all of that work. if you wrote your own object layer using Core then you'd see all the time spent in your own objects<->core layer. This is just the reality of Python.
I'm not trying to say SQLAlchemy is a bad implementation - it's a great implementation with tons of powerful features. I just believe that using anything as an ORM is a bad idea and directly leads to these performance and maintainability issues. Whether that's ActiveRecord, SQLAlchemy or even if someone implemented a similar ORM in Rust.
Some of the performance issues are due to python, but that's not the whole story: if I use SQLAlchemy as a query builder, and only return plain-old-data, I do not see nearly the same performance issues. Not because SQLAlchemy ORM is badly written, just because it has to do more work, create more objects, with more "magic properties", maintain more entries in the session, etc. etc.
While you're not technically wrong at the SQL level, I believe there's a higher level of business problem that ORMs like ActiveRecord solve for that's being overlooked. (aka ActiveRecord got Airbnb and Shopify a long way in scaling both developers and product).
CPU time building a query is rarely the business bottleneck until you're at a huge scale - scaling up the engineering team uniformly and having business logic at abstraction level closer to the rapidly evolving Product specs is the bottleneck that ActiveRecord solves for, and does pretty well.
If you'd like something that avoids object/hidden state, check out how Elixir/Phoenix's Ecto [1] was designed - it avoids many of the shortcomings of ActiveRecord and SQLAlchemy.
If "SQL is the source of truth" means you have no in-memory state, I'm not sure that's feasible, with or without an ORM.
Once you fetch something from the db, you have it in variables in memory, with or without an ORM. It could no longer be in sync with the db a ms later. And you will probably re-use it for more than one access (if you need it more than once), because issuing the same SQL again the second time you need to reference any data returned, in a given routine, would be insane.
I think what they’re proposing is something like “if I want to read and then update a value, incrementing it by one, the proper way to do that is to tell the DB to increment whatever value it has stored; not to increment my local representation of the value and then tell the DB to overwrite the value with it.”
The number of times I've had to increment a value by one is minimal. Same for similar things where you want to update one value based on the other values and do it in an atomic way.
But if I had to increment specifically, and were using ActiveRecord... I'd use the ActiveRecord increment! method. Which has the proper db-atomicity semantics you want, it does execute `UPDATE x SET n = n+1 WHERE pk = y ...`
This is not obvious unless you think about it - I've seen issues like this because the code looks correct if you ignore the update semantics, especially when it won't fail on your local dev env, because in-memory state will be the same as DB state since you're the only one hitting it...
Having to think about these things because the abstraction is leaky instead of just using the SQL where the transformation is explicit is not worth the hassle.
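To spell out the difference in ORM terms (a sketch using Django's F expressions, since Django is the ORM discussed upthread; Counter is hypothetical):

    # Read-modify-write: looks correct, loses concurrent updates.
    c = Counter.objects.get(pk=1)
    c.n = c.n + 1   # stale the moment it was read
    c.save()        # UPDATE ... SET n = <whatever we read + 1>

    # Atomic form: push the arithmetic into the database.
    from django.db.models import F
    Counter.objects.filter(pk=1).update(n=F("n") + 1)
    # UPDATE counter SET n = n + 1 WHERE id = 1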
Using Clojure and yesql was a real pleasure for this - Clojure is immutable so you're working with values all over the place - and working with a SQL query is just passing values in and getting values out - no magic mapping - it's just values - if you want the latest value, query again; if you want to store a new value, send the new value. No mutation, no magic, just data.
> Once you fetch something from the db, you have it in variables in memory. With or without an ORM. That could no longer be sync'd with the db a ms later.
Nope. The DBMS is making the same transaction isolation guarantees to an ORM as to any other client.
I think the larger "source of truth" issue is how the schema is represented.
All of these systems provide their own set of schema objects that are then pushed to the database. This works fine for LAMP stacks, but as soon as you grow it becomes apparent that you've put the cart before the horse.
>Someone returns an ActiveRecord `User` object from a database method and now random other parts of your application are intricately linked to your database implementation.
Indeed. It's a rookie mistake, but hey, you work with rookies all the time! ORMs can be useful, but they certainly allow for some dangerous patterns that are hard to notice if you're a young developer.
They mention the case where the ORM shadows the db schema. The argument isn't terribly salient; it's not like you can't build type checking in, it just means more work if that's what you want to do. Essentially, according to the author, the only way to do it right if you have an existing db is to write all queries from scratch.
My preferred solution for existing DBs is Ruby's Sequel or ROM if you want to go nuts. But I'm a diehard Rails lover and will choose ActiveRecord over all other solutions if I can. To me there's only one decent ORM and everything else is either good for special purposes or junk.
So, I've seen this go back and forth so many times, and I've come to the conclusion that the two extremes are irreconcilable. Not because they're incorrigible, but because they represent two very different ways of using a database.
For some, the database is not actually a database, per se, it's just a place to persist state. Usually folks in this camp think in an object-oriented way. They haven't read Codd's paper, they don't care about the relational algebra, they just want some way to take bags of properties and stick them somewhere that offers random access with good performance and is reasonably unlikely to spontaneously erase data. If you're in this camp, ORM works great, because there isn't really much of an object/relational impedance situation in the first place. The data store wasn't structured in a way that creates one, there's no need to burn a bunch of effort on doing so, and dragging in the whole relational way of doing things is like dragging a wooden rowboat on a hike through the forest because you'd like to use it to go fishing on the off chance you find a nice pond.
Others see a lot of value in the relational model. They're willing to put a bunch of time and effort into structuring the data and organizing the indexes in ways that will allow you to answer complex, possibly unanticipated questions quickly and efficiently. They, too, hate the syntax for recursive common table expressions, but are willing to hold their nose and use them anyway, for various reasons, but mostly because people think you're really cool when you can spend 30 minutes and make something go 2-3 orders of magnitude faster than it used to. They don't think of the data in terms of bags of properties, they think of it in terms of tuples and relations and a calculus for rearranging them in interesting and useful ways. For them, there is potentially a huge problem with object/relational impedance; the data's organized in ways that just don't fit cleanly into Beans.
The thing is, neither of these ways of using a database is wrong. Each has its strengths and weaknesses. The trick is figuring out which way fits your business needs. Well, that's the easy trick. The hard trick comes when someone on your team doesn't understand this, believes there is one universal solution that will work for everyone, and is hellbent on jamming the One Righteous and Holy Peg, which happens to be square, into a round hole.
Django-rest-swagger to have a Swagger / OpenAPI spec on top of Django REST Framework, and then OpenAPITools/openapi-generator to generate TypeScript code.
If you're thinking in the first terms, is a relational DB the right backing store, or wouldn't it be better to back your DB with something more like Mongo?
Do they still obtain some benefit from the schema, since from time to time the semantics of a property will change, and a schema can tell them what the current shape of the data is and guide the migration? Or would they prefer to just write a new property which does a lazy conversion from the old terms? (To an extent, I suppose the answer to this question determines the answer to the first. But I guess there are other tradeoffs to Mongo I'm not aware of, since schemalessness fills me with fear and I just don't want to look there.)
Regarding your final paragraph: I suspect that, for many apps, the choice between relational vs transparent persistency is largely determined by the team who is working on it. The "hard trick" is therefore trying to balance a strong personality with a strong view who disagrees with the rest of the team who have weaker personalities and weaker views, but who all agree on the other side of the fence. This is simply a standard management question with very little technical relevance.
I suppose it depends on whether you want schema-on-read or schema-on-write.
Even if you're working under the first model, there's still a lot an RDBMS can do to help you ensure data integrity. Largely by being less flexible. Databases like MongoDB allow for a more flexible schema, at the cost of pushing a lot of the work of ensuring data integrity into the application code.
For my part, I do a fair bit of working with databases that were built on the MongoDB of the '90s, Lotus Notes, and I've seen what they can grow into over the course of 25 years. It's not pretty. That experience has left me thinking that, while there's certainly a lot of value in the document store model, I wouldn't jump to a document store just because I don't need everything an RDBMS does. I'd only do it if I actively needed a document store.
> If you're thinking in the first terms, is a relational DB the right backing store, or wouldn't it be better to back your DB with something more like Mongo?
Certainly relational, for 90% of the cases, if not all.
The relational model was THE answer to NoSQL from the start (ie: it was the solution to the original "NoSQL" stores).
It is totally more flexible, powerful, dynamic, expressive... and that's without talking about ACID!
You can model all "NoSQL" stores with tables. With limited exceptions, it will work just fine for most uses...
> This is simply a standard management question with very little technical relevance.
I don't get what you are implying here...
But NoSQL solutions are the ones to be suspected, the ones that require harder qualification and justification to use. It is the wrong choice in the hands of the naive. "NoSQL" is for experts and for niche/specific workloads.
The advantage of using an ORM is that you can always not use it in the places where you are doing things that are not suited to the strength of the ORM.
I tend to hand-write almost all of my migrations and many of my queries that synthesize data from multiple tables to reach a conclusion. I can think of only a handful of times where it was worth writing custom code to persist state (usually only when there are a large number of records that need a couple of specific fields updated).
Like many tools, it all depends on how well it is used and how well it fits its use-case.
> The trick is figuring out which way fits your business needs.
Rule of thumb - if you don't control the database, use an ORM; if you do control the database, work directly with it.
For example, let's say that your business is a software company that sells an on-prem product. Some of your customers have Postgres expertise, some have MySQL expertise, some MSSQL, some people are stuck on Oracle. Forcing customers to develop DBA expertise in a database they're not familiar with just for the privilege of buying your product is a sales disaster in the making. So you go with an ORM and set up QA testing that tests releases across all of the databases that you support, and the ORM helps you by making it much more likely that your development efforts will automagically succeed in working with each of the supported databases.
In most other situations, though, it makes much more sense to start with the data design. If your business grows, your databases are going to grow. You are almost guaranteed not to switch databases (absent overwhelming financial need, see: Oracle) over the lifetime of your company. Data analysts (data scientists now?) can extract serious value from your databases by getting into the weeds of the database schema, indexes, and queries and working with developers and DBAs to optimize them for business reporting. If you give up control to an automated tool that knows nothing about your business, your business will be less competitive as a result.
Data is far too valuable these days to refuse to develop expertise with the underlying databases.
Why do your customers even need to have expertise in the DB you're using? We just sell them a black box (usually VM images), with a few endpoints to extract data in standard formats. They can use whatever they want to connect to those.
Maybe one customer wishes to run their databases in a cluster distributed across two continents.
Maybe another customer has bought Oracle and the installation still has room. Also, they have a custom backup scheme that takes their load patterns into account.
Customers wish for many things; that doesn't mean they're relevant selling points. It just sounds like bad judgment to me to tie yourself down that way, unable to take full advantage of the RDBMS. And have you even tested running your (hypothetical) application in a distributed Oracle cluster across two continents? If not, how will you support it?
The most basic question I'd ask: is the core of your application the data or is it the business logic? Or to put it another way: does it make more sense to build the application around data, or to build your data around the application?
Another question I'd ask is whether you're expecting to deal with millions of rows/objects or hundreds of millions. Modelling relational data correctly can have performance impacts in orders of magnitude.
When I look at most projects, I instinctively begin by modelling the data structure; then I think about why/when/how data can move from one state to another; then I think about how an application could prod the data between these states; then I build code which runs against the data.
If your application isn't data at its core (e.g. a document-based app) then it probably makes more sense to treat data elements as objects and use an ORM (or similar) to store and retrieve the objects.
An ORM does not usually limit how much you can model your data and create fast queries. The modeling you talk about can be done just as well in e.g. the Django ORM.
ORMs that I've experimented with tend to fall into one of two categories: either they treat the object model as prime, or they treat the relational model as prime.
The former almost invariably spurt out inefficient queries, or too many queries, or both. They usually require you to let the ORM generate tables. If you just want to have your object oriented design persist in a database, that's great.
The latter almost invariably results in trying to reinvent the SQL syntax in a quasi-language-native, quasi-database-agnostic way. They almost never manage to replicate more than a quarter of the power of real SQL, and in order to do anything non-trivial (or performant at scale) they force you to become an expert in SQL and/or in how the ORM translates its own syntax into SQL.
And once you become more expert at SQL than your ORM, it's not long before you find the ORM is a net loss to productivity—in particular by how it encourages you to write too much data manipulation logic in code rather than directly in the database.
For the projects I've worked on, I've almost never wanted to turn data into objects. And on the occasions when I've thought otherwise, it has always turned out to be a mistake; de-objectifying it consistently results in simpler, shorter code with fewer data bugs.
I tend to find that the longer data spends being sieved through layers and tossed around inside your application, the more data bugs you'll end up having. It's much better to throw all data at the database as quickly as possible and do all manipulation within the database (where possible) or keep the turnaround into application code as short as possible. It means treating read-only outputs/reports more like isolated mini-applications; the false nirvana of code reuse be damned.
And that doesn't mean replacing an OOP or ORM fetish into a stored procedure/trigger fetish. It means realising that if your application is data at its core, it's your responsibility as a programmer to become an expert at native SQL.
The problem is that far too few programmers realise how deeply complex SQL can be; it's treated like a little side-hustle like regular expressions, when for so many programmers it's the most valuable skill to level up.
You'll have to pry ActiveRecord from my cold, dead hands. I'll take it any day over the half baked quasi-ORM that your company has inevitably made over the years because an early engineer was anti-ORM and it turns out doing raw SQL all the time is tedious.
What’s nice about ActiveRecord is that while it obviously does hook into the Rails object-oriented style, it’s actually more relational and functional than oo in how it works underneath. It’s a fairly thin layer of abstraction over SQL. I think of it more as a way to build and compose SQL statements than a true ORM—at least when it’s used well.
At my last position we simply did sprocs and Dapper. When you take it as true that the database is the source of truth, it only makes sense to put the queries in the database. The only exception is dynamic queries or modifications.
How do you avoid leaking business logic into database stored procedures? Sounds like you would now have two sources of truth, application logic, and database stored procedures.
I'd like to see someone have a decent go at reexamining the idea that business logic (BL) doesn't belong in the DB and tease apart exactly what that should mean, if it's actually still true, or even if it's ever actually been true. Maybe something like that's been posted here before, but I haven't seen it...
I bought into the idea myself for a while, but when I interrogate my belief there it was just something I picked up at uni as part of the general 3-tier approach, and I'm not sure how much of it is really practically grounded. I see the same ideas held dogmatically by newer employees, and when I ask them about it in detail, I see the same fuzziness.
After all, if you want to really be pedantic, then you could claim anything beyond having a single table with two columns, "key" and "value" is pushing BL into the DB.
I.e., what, practically, does the separation gain you? You may want to swap out the DB in the future? That almost never happens, and not without some significant other refactoring going on (e.g. breaking out into separate DBs because the product's grown to the point of needing distribution across services, etc.). If you want to really design with that in mind, you're almost certainly giving up a lot of functionality that the particular DB is going to give you. It's on the level of holding onto the possibility that you'll want to change the language you're writing in at some stage (and that event would favour pushing more into the DB anyway).
In reality, what I've found is when there's a reliable, transactional, relational datastore at the bottom (i.e we're not talking about a larger distributed system), then for your own sanity you want to be pushing as much as possible down the stack as possible, to reduce the surface area that someone could reach through to change data in the wrong way. Data consistency issues are some of the worst when it comes to eroding customer trust, and if your data access/mutation paths are many and varied then you can do your head in or burn through a lot of dev morale trying to diagnose them.
The strongest advocates otherwise I've found are those in office cultures where there's little trust between devs and DBAs, and getting things through DB review is a rigmarole that ends up driving devs towards doing as much in code as possible. I've always suspected that, beyond Google envy, these sorts of dynamics are what drove a lot of the initial NoSQL movement...
Thanks, the Derek Sivers piece rings a bell now (and maybe was going into my thinking on this), and the other two will go into my notes to avoid having to repeat myself the next time this one inevitably comes up on the office Slack...
I worked on one system that did intentionally put business logic almost entirely in the DB. The VP was a former database guy and the primary justification was auditing. It worked well enough in practice.
There’s a quote from Gavin King somewhere exhorting developers that ORMs and raw SQL are meant to be used concurrently: the ORM for CRUD ops, SQL for most everything else.
The problem I have with ORMs is lack of paging on collections; there’s always that one user that pulls in half the database.
In my experience, keeping BL out of the DB is a practical concern, not a dogmatic one.
Usually there are two things:
1. Performance. Business logic is usually horizontally scalable, databases are usually not. You want to pull as much BL out of the database as possible, and put it into stateless microservices so they can be scaled.
2. Version control and deploying changes to business logic. Business logic changes frequently. Do you want to have to be making that frequent of changes to your database? Do you have practices around safe deployment and code review for stored procedures?
I've worked at a place where there was lots of business logic (millions of LOC) stored "in the database". In this case the database was MUMPS. It can be done, and it can be pleasant. The catch is that vertical scaling is the only option, and you have to spend years building your own tooling from scratch.
I think the line between database and code is destined to become even more blurred, but not by bringing the code down into the database, but by lifting more and more database concepts up into "application space" as distributed computing becomes more normalized.
Yeah the performance point is interesting, and I think it has merit in a well-defined scope. The idea of bringing DB concepts up is something I've seen done, but it wasn't a pleasant experience overall and I think everyone involved is regretful that we went down that path. It's a good story though. Ramble starts:
The system was an enterprise-focused offering with weak product management, so in the early growth-oriented days of the company ended up saying "yes" to a lot of feature requests that we probably shouldn't have. Where this particularly impacted was the permissions system, which evolved over time from an ACL with odd warts to also include a bolted on hierarchy of users and their objects, and then a role based system, and then all sorts of enterprise-level oddities like cross-links in the hierarchy, shared subhierarchies, roles that could appear at multiple hierarchy points, virtual ACLs that behaved slightly differently etc. So potentially a large number of different access paths between a given user and resource.
Years ago, when this mess was starting to expand it was decided it was too hard to model this in the DB (and really that should've been a giant red flag), so the new approach was to load each customer's dataset up into an application server that would handle the BL, crucially including the permission logic. It very much wasn't stateless, as you mention, but I'm not sure how it could've been really, given you needed so much loaded up from the DB in order to make these permission decisions. Would've avoided a lot of headache if it had...
The consequence though of using this write-through in-memory cache was it became the source of truth.
The chief problem this led to was that shift into application land was a bell that couldn't be unrung. Everything others in this thread have complained about seeing BL spread across all sorts of SPs in a DB happened here, just in application code (which again hints I guess that the chief problem is architectural and lack of governance, not a wrong-layer problem). Nobody could properly describe the permissions system in response to support requests without a lot of code scouring, let alone hold it in their heads.
Even worse, as the application grew in size and needs, we found we still needed things we had left behind in the DB. A couple of smart devs working on the core service, because they were smart and trusted, convinced themselves that what we needed was an in-memory transaction system for managing this in-server cache (by now the core was being called the Object Cache, a name so generic that it also should've been a red flag). So a couple of years went into implementing a technically impressive reimplementation of a transaction system.
Meanwhile the system as a whole was well past the point of being needed to split up into multiple services, so a grand goal was set of moving the Object Cache into a standalone service: the Object Service. Slap an OData API on it, and then leave that API open for all internal teams to hit. By this point the core team who owned this was starting to become well aware they had fallen into a bad pit of reimplementing the Postgres DB everything still sat on: transactions, generic query API, configurable in-memory indexes for each type of object, partition loading logic for the timeseries sets, user permissions etc. Worse, the generic query API (and this is what's ultimately turned me off OData/GraphQL etc for cross-team interfaces) ran into all the same problems as an SQL interface - people in other teams would always be coming up with brand new queries your indexes hadn't anticipated, forcing new work on the keepers of the cache to support.
The way forward probably would've been to leave the permissions structure in place and pull as much as possible out of the service/cache into separate standalone services, i.e. leave the objects in the object cache as little more than just IDs you could look up elsewhere for actual attributes. We'll never really know though: it was recently decided to put the whole thing into maintenance mode, partially because nobody has the political will to fix the thing and its complexity.
I've thought a lot about the lessons of this system and where it went wrong, and I trace it ultimately to forging ahead with a permissions system (i.e. core BL) that couldn't be modelled in the DB. I think that doomed us to what I'd ultimately name second-system syndrome (i.e. the whole stack above slowly evolved into a poor Postgres clone). Perhaps if we had been ruthless in going to a CQRS design or similar very early on (which would again demand stateless services above) we could've pulled it off.
I think a big takeaway for me was to not take for granted the good things an RDBMS is giving you, especially transactionality. I think I'd bias these days towards looking for ways to scale up the powers the DB gives you (even if it involves pushing external integrations into the DB, maybe via some funny FDW or transaction log integrations, or maybe using a tiered solution of DBs, one layer as the storage layer and another layer above acting as coherent transactors, a little bit similar to Datomic's internal structure), rather than rushing toward replicating core DB functionality in application code, which can be tempting initially as the easier-looking solution.
I think I might blog about this when the system's properly retired and the involved can look back and laugh rather than cringe...
As a rule, relational database systems have two strong areas: parallel processing and data consistency enforcement. I've always understood "don't do business logic in the database" as meaning not to use the database engine for sequential processing.
If you have a single-user database system (e.g. webapp with database backend), there is relatively little to gain implementing data validity checking in the database, other than the declarative statements versus imperative rules discussion (which can already be a huge benefit, depending on the team).
But once you have a central database with multiple frontends (common in enterprise ERP solutions), enforcing data consistency in the backend becomes pretty much unavoidable -- otherwise a single bug or new feature in one frontend could disable entire production lines.
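To make that concrete, here is a minimal sketch of backend-enforced consistency, using SQLite from Python only because it's easy to run; the schema is invented for illustration:

    # Declarative constraints live in the database, so every frontend is
    # subject to them no matter which codebase wrote the row.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES products (id),
            quantity INTEGER NOT NULL CHECK (quantity > 0)
        );
    """)

    # A buggy frontend that violates an invariant is rejected at the source:
    try:
        conn.execute("INSERT INTO orders (product_id, quantity) VALUES (1, 0)")
    except sqlite3.IntegrityError as e:
        print("rejected:", e)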
when there's a reliable, transactional, relational datastore at the bottom [..] you want to be pushing as much as possible down the stack, to reduce the surface area that someone could [..] change data in the wrong way
I'm working on a project that is almost literally your example of reducing a relational database to a key value store, despite client applications having complex problem domains to model.
The consequence has been catastrophic data consistency problems and the whole programme rapidly grinding to a halt.
I think the problem is the overly broad definition of 'business logic' that encompasses everything from data integrity to presentation logic.
I've seen "business logic" refer to both A) how a discount promotion works and B) how to arrange images of the product for a particular set of customers. Obviously it's fine to have A in the database...
Thanks for bringing that up. Everyone on a team needs to have the same working definitions, and yeah... "business logic" is a term that's used a lot, but I very often see people not really understanding that it's usually meant in opposition to "display logic", not just as an interchangeable term for "any code".
Business logic being in stored procedures is the ideal place for it.
The problem with it is technical: the tooling for migrating and managing schemas and stored procedures is hot garbage, and there aren't good ways to enforce consistency between the invariants of the applications and the database.
Really, it should be possible, on both application deploy and when stored procedures are updated, to import library functions that the stored procedures use from the database and run unit tests against them to ensure that all clients are compatible.
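You can get part of the way there today by pointing an application-owned test suite at the deployed procedures. A minimal sketch, assuming psycopg2 and pytest; the calculate_discount function and the connection string are invented examples:

    # Run unit tests against a deployed stored procedure, e.g. as a deploy
    # gate. Everything named here is hypothetical.
    import psycopg2
    import pytest

    @pytest.fixture
    def conn():
        c = psycopg2.connect("dbname=app")  # assumed connection string
        yield c
        c.rollback()  # tests never commit
        c.close()

    def test_discount_invariant(conn):
        with conn.cursor() as cur:
            # Assert the same invariant the application code relies on.
            cur.execute("SELECT calculate_discount(%s)", (100,))
            assert cur.fetchone()[0] == 10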
Have done stuff like this where the queries were quite complex and essentially BI-style. Wrote a system to store and manage the procs in VCS and manage them through the ORM migration system (ORMs are quite à la carte).
You're either generating and submitting the queries through prepared statements or just storing them in the database. It's not "leaking" IMHO to do the latter, and in fact it can come with benefits; you can call the procs while connected to the database through various other clients, be it the command line or a BI system. So in effect you are encapsulating the logic in the place with the most potential for re-use.
With the MS stack it's really easy. You can design the schema and procs, views, etc. in Visual Studio, then have it compare against each environment to create the deployment script.
I like ActiveRecord. I like it so much I've implemented the Active Record pattern myself and compared my results with ActiveRecord.
That said, I've come around to the Repository pattern and quite like the Ecto way of doing things. It's a lot more flexible at the expense of some simplicity. It makes some really hard things easier than AR can.
This but Hibernate + Spring Repositories. Writing SQL is fine but the 95% CRUD cases are much better served with a decently managed ORM. We have an older project here using iBATIS, MSSQL and lots of stored procedures/SQL-first approach. See this tutorial for a preview of the horror to come: https://www.tutorialspoint.com/ibatis/ibatis_create_operatio...
'Employee.xml'
Now imagine this but with stored procedures and inserts with tens of parameters.
I've worked on a system where almost all of the business logic was handled as stored procedures (apart from things that belonged at the UI level, of course). Worked fine, was easy to debug, and performed great. Sure, you had to know how to actually write good SQL, which threw off kids who think programming equals using the JavaScript framework du jour, but I don't see it as a shortcoming. No XML anywhere either.
At some point they rewrote it as a web app with NHibernate and all that. Took them literally many hundreds of man-years, and it still runs like molasses. And profiling whatever crap NHibernate emits is a rather joyless enterprise.
As a fellow AR lover the only things that consistently bug me are when it does things like:
- .count on a collection calls SQL every time, but there is also a .count method for arrays - which means you need to remember to use .length if you just want to count the objects loaded into memory
- ‘build’ methods for an association usually only initialize a record in memory, but with has_one associations it deletes any record that already exists
So basically when it’s not clear about the fact that it’s doing database things.
There's a .size that works for pretty much everything: if the relation isn't loaded then it will do a select count(*), but for arrays and loaded associations it takes the length of the array.
So you know one guy that quit over having an ORM instead of using SQL... do you have any idea how many quit due to having to deal with thousands of lines of SQL string concatenations, because some genius who also hates ORMs decided to do a giant project that way?
Then you call your alternative anti-ORM and it generates... ORM specs, and then you say "migration commands based on the git history"... what does that even mean? What does git history have to do with the database?
Any SQL wrapper worth its salt provides some mechanism to specify positional (or even named) parameters within that string (without resorting to string interpolation or concatenation) and pass them in when executing the query. Usually it'll be something like
foos = execute_sql("SELECT * FROM foo WHERE bar = ? AND baz = ?", [bar, baz])
If you're resorting to concatenating strings, then you're almost certainly opening yourself up to SQL injections (and rather needlessly, I might add!).
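To make that concrete in Python (the stdlib sqlite3 driver uses "?" placeholders; others differ, e.g. psycopg2 uses %s):

    # The driver binds the parameters; your code never splices values
    # into the SQL string, so injection isn't possible here.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (bar TEXT, baz INTEGER)")
    conn.execute("INSERT INTO foo VALUES (?, ?)", ("hello", 1))

    bar, baz = "hello", 1
    foos = conn.execute(
        "SELECT * FROM foo WHERE bar = ? AND baz = ?", (bar, baz)
    ).fetchall()
    print(foos)  # [('hello', 1)]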
Which doesn't work for table names, or for building up where clauses gradually, etc. SQLAlchemy makes those much more trivial to deal with, via the SQL Expression bits. (Though technically, that's not an ORM.)
Getting the library to properly expose the formatting interface is also a trick. We would do massive loads/dumps that we did not want to use Python for (but we worked in Python), as the DB-API pulls the entire result set into RAM, and we were pulling >RAM (and we didn't want to do something lower-level w/ the driver). We'd spawn psql, but that needs the query as a complete, pre-templated string — it does not support any sort of positional parameters. And the Python library really doesn't (last I checked) have a public interface to the templater portion of the library. (I think this is b/c PG just sends the parameters, and the binding is done server-side. So, really, psql needs an interface to supply those values as arguments to the CLI.)
> We'd spawn psql, but that needs the query, as a complete, pre-templated string — it does not support any sort of positional parameters. [...] So, really, psql needs an interface to supply those values as arguments to the CLI.
I haven't tried this before, so caveat emptor, but per psql's manpage this should be supported:
QUERY="SELECT * FROM sometable WHERE name = :'foo'"
echo "$QUERY" | psql -v foo=asdf # ... other args ...
Basically: you can use colon-prefixed variables in the query text (with the variable name optionally quoted to indicate how the result should be quoted, apparently?), and set them with the -v option (which is equivalent to using the \set command within the query string itself). Note the double quotes around "$QUERY": left unquoted, the shell would glob-expand the * in the query.
This also happens to work for table names, at least per the example in the manpage:
testdb=> \set foo 'my_table'
testdb=> SELECT * FROM :"foo";
Except in this case it'd be more
echo 'SELECT * FROM :"foo";' | psql -v foo=my_table # ... other args ...
Still no option there for positional parameters, but it's a start, right?
> Which doesn't work for table names, or for building up where clauses gradually, etc. SQLAlchemy makes those much more trivial to deal with, via the SQL Expression bits. (Though technically, that's not an ORM.)
Thin SQL wrappers/DSLs like the SQLAlchemy expression library are great. IMO those thin wrappers are what most people should reach for first, over a full-blown ORM. A good minimal SQL wrapper will:
* Save people from pain, problems, and security issues involved in building queries through string manipulation.
* Map very closely to raw SQL
* Make it easy to compose query expressions and queries, using language native features.
* Help devs develop their SQL skills. Learning the wrapper IS learning SQL, for the most part.
I have a hard time seeing many good reasons to add any more layers of abstraction on top. That last point is particularly important to me. My first exposure to many advanced SQL techniques was through such libraries. Since the code maps to SQL quite naturally, that learning can be applied in a much wider variety of contexts. Teach someone how to do aggregations using Django's ORM, and you've taught them to do aggregations in Django's ORM, and basically nowhere else.
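For illustration, a minimal sketch of that thin-wrapper style with SQLAlchemy Core (assuming the 1.4+ select() calling convention; the table and columns are invented):

    # Query fragments compose with ordinary Python instead of string
    # concatenation, and the result renders to recognizable SQL.
    from sqlalchemy import column, func, select, table

    users = table("users", column("id"), column("country"), column("active"))

    by_country = select(users.c.country, func.count().label("n")).group_by(
        users.c.country
    )
    active_by_country = by_country.where(users.c.active)

    print(active_by_country)  # SQL close to what you'd write by hand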
I think the issue the parent is referring to is when you have optional clauses in your statement. Like if you have a table with 5 update-able columns (where someone might want to update any 1 to 5 columns that you don't know in advance), and you want some generic code that updates a row, if you're writing raw SQL then you have to do concatenations to build up the "set clause":
UPDATE sometable SET {set clause} WHERE id = ?
The "set clause" might just be "foo = ?", but it might be "foo = ?, bar = ?", or "foo = ?, baz = ?", etc.
The answer to that is almost certainly "it depends on your DB". Most query planners should be smart enough to not try to write anything, but some are smarter than others. For example, SQL Server (if I'm understanding right) does perform a log write even if it should be obvious that the value won't change: https://www.sql.kiwi/2010/08/the-impact-of-non-updating-upda...
That being said, I'd be skeptical of the performance impact (if any) being all that significant; even in a worst-case scenario of "yeah, it's going to stupidly write every field every time", the columns for each row should be pretty close together both in-memory and on-disk, so unless you're writing a whole bunch of giant strings or something at once (and even then) there shouldn't be a whole lot of seeking happening.
tl;dr: assuming there's a minor performance impact is reasonable, but it should probably be measured before trying to optimize it away.
So you mean I'm writing code like this, in a language somewhat like TypeScript but simplified to make my point.
function update(id: ID_TYPE, value: Partial<sometable>) {
  const set_clause = Object.keys(value).map(k => k + " = ?").join(", ")
  const params = Object.values(value).concat([id])
  db.query("UPDATE sometable SET " + set_clause + " WHERE id = ?", params)
}
How is this code not terrible? It shouldn't pass code review. Once you concatenate strings to generate your SQL, all bets are off. There will be a day when your object doesn't match your assumptions. Either you have a series of different functions, each of which will change under different circumstances; or you're generating a function like this directly from the schema, in which case you might as well just generate the code you want at build time so there are no string concatenations anywhere.
Is there some other circumstance when you want that?
So I want to respond to this because there is a genuine question behind your rhetoric. And that question is "How do I cleanly do complex things in SQL without an ORM?"
And there is an answer to that. The idealistic answer is good DB design from the start. The realistic answer is by using views (sometimes materialized views) to create a clean representation out of odd tables.
The monster query can be decomposed into simple and clean operations against these views (and if necessary, views upon views).
The only caveat is you need a SQL expert to understand the performance if you want to do this at scale.
I wrote SQLAlchemy after a five-year job where all we did was work with enormous views in an Oracle database. They performed nightmarishly badly. The slightest attempt to query from one of these views using something as heretical as a join would kill the database. The source code for the views was locked up tight by DBAs who had no idea how our application worked, didn't care, and could never be bothered to help us with their horrendously inefficient views or to give us new ones that our application sorely needed in order to get data from the database in an efficient manner.
SQLAlchemy exists exactly because of the pain the "just use views" approach causes. The entire point is that you can structure a query of arbitrary complexity in a succinct manner using Python structures, while not losing any of the relational capabilities. Of course you can write queries that are too complex, if you're then lucky enough to be able to have materialized views at your disposal, you can turn the SQL you wrote into one. But that's an optimization you can use if you need it, or not.
On the one hand the article itself is offering an anecdote about being unhappy with ORMs. On the other hand you have an experience in which you were unhappy with views.
That would leave us at an impasse (he said / she said), unless we get more nuanced. So let's.
It sounds like some of your frustration is at some DBAs you knew. Firstly, I've never worked with a DBA; I'm just a senior software engineer / DevOps, and to me that includes an intimate knowledge of SQL performance, so maybe this is why I don't mind an in-database approach.
I'm fairly sure that whatever SQLAlchemy could do with a monstrously complex data relationship, views can do with better performance (the further the abstraction gets from the engine, the less performance optimization it can do).
Now maybe you agree with that, and see ORMs as a tool for the kind of job where the dev is locked out of the database. If so then I can't really disagree as I haven't worked at such places much at all.
> I'm fairly sure that whatever SQLAlchemy could do with a monstrously complex data relationship, views can do with better performance (the further the abstraction gets from the engine, the less performance optimization it can do).
If you have the option to use materialized views, that can be helpful when appropriate. However, materialized views have their own issues, not the least of which is that they need to be refreshed in order to have current data; they can take up a vast amount of storage; and a lot of databases don't support them at all. It should not be necessary to emit CREATE MATERIALIZED VIEW with super-user privileges and to build the infrastructure to store them and keep them up to date for every query one needs to write or modify that happens to have some degree of composed complexity.
You don't need an ORM for these queries either, just a composable query builder (like SQLAlchemy Core) that allows the composition task to be organized more easily in application code.
None of this means you can't use views, but it's unnecessary to dictate that they should be the only tool available for dealing with large composed queries.
> I'm fairly sure that whatever SQLAlchemy could do with a monstrously complex data relationship, views can do with better performance (the further the abstraction gets from the engine, the less performance optimization it can do).
SQLAlchemy will spit out a SQL query. Whatever the DB engine can do with queries hand-written on top of views, it can also do with the query generated by SQLAlchemy.
> Whatever the DB engine can do with queries hand-written on top of views, it can also do with the query generated by SQLAlchemy.
No, on two levels. Firstly, there are many features in SQL that an ORM will not support (e.g. no substitute for a materialized view).
Secondly, once you get really good at SQL, you learn that very tiny seeming things can make the difference between a query running for an hour or a second (e.g. case-insensitive search against an indexed column). Part of being expert in DB technologies is knowing how/why some ways of writing a query are very fast, and others are very slow. The idea of an ORM is to hide that complexity, which is fine for a toy todo-mvc app. But once you start querying tables with 100k rows you need to understand that complexity and master it if you want to write a fast query.
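One concrete instance of that point, sketched with psycopg2 (the table, index, and connection string are invented): a case-insensitive match can only use an index built on the same expression.

    # Without this expression index, WHERE lower(email) = ... degrades to
    # a sequential scan even if a plain index on email exists (Postgres).
    import psycopg2

    conn = psycopg2.connect("dbname=app")  # assumed DSN
    with conn, conn.cursor() as cur:
        cur.execute("CREATE INDEX IF NOT EXISTS ix_email_lower ON users (lower(email))")
        cur.execute("SELECT id FROM users WHERE lower(email) = lower(%s)", ("A@B.COM",))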
> Secondly, once you get really good at SQL, you learn that very tiny seeming things can make the difference between a query running for an hour or a second (e.g. case-insensitive search against an indexed column).
SQLAlchemy supports case-insensitive searches, per-column / per-expression collation settings, and index and other SQL hint formats for all databases that support them. That is, SQLAlchemy's query language supports most optimizing SQL syntaxes, with the possible exception of very esoteric / outdated Oracle things like which table you list first in the FROM clause (use index hints instead). We are adding esoteric performance features all the time to support not just SQL-level tricks but driver-level tricks too, which are usually much more consequential, such as the typing information applied to bound parameters matching up for indexes, as well as special driver-level APIs to improve the speed of bulk operations.
SQLAlchemy has been around for thirteen years, plenty of SQL experts have come along and requested these features and we implement them all. Feel free to come up with examples of SQL-expert level performance optimizations that SQLAlchemy doesn't support and we'll look into them.
Yeah, don't get me wrong, I haven't used SQLAlchemy and I'm not trying to besmirch it. I'm not even sure we disagree.
What I'm trying to challenge is the philosophy that an ORM is an adequate facade in place of learning how databases work. My point about views is that in my experience, the answer to disgusting queries is cleaning up the DB design (and views can be the tool to accomplish this). My point about case-insensitive searching (you can create a case-insensitive index if you want) is that a lot of db-performance stuff just can't be solved on the query side alone anyways.
It sounds like SQLAlchemy is designed with a lot of flexibility around how queries are run, so maybe you agree that understanding what's going on beneath the hood is important to handle these complicated cases.
> What I'm trying to challenge is the philosophy that an ORM is an adequate facade in place of learning how databases work.
Thank you for this response and I agree, we are likely on the same page. I don't know that anyone actually espouses that philosophy. This is the anxiety that ORM skeptics have, and certainly you get beginners who rush into using ORMs not knowing what they are doing, but if someone wants to be a DB expert, I'm pretty sure they go read about databases :) These ORM beginners will fail either with the ORM or without it. I guess the idea is you'd prefer they "fail fast", rather than have the ORM cover for them while they proceed to screw up more deeply? This is arguable; if you'd seen much of the non-ORM raw DB code I've seen, it too fails pretty hard. But even with this argument, if ORMs produce the problem of incompetents who are not found out fast enough, why foist the denial of useful tools on those developers who do know what they're doing?
You have shifted the discussion from the ability of the DB engine to optimize queries to the ability of a developer to write fast queries.
But in any case, we are not discussing some abstract ORM whose "idea is to hide complexity", but SQLAlchemy, whose idea (one of them, at least) is to allow you to write composable queries yet still retain the full power of SQL.
You are correct about needing to understand the underlying DB to be able to optimise it. I consider myself lucky to have come from a raw SQL background and to have a deep understanding of how the query planners work.
You are incorrect in your other assertion. SQLAlchemy allows for optimising your queries at runtime in a way that just isn't available to SQL views. The beauty of SQLAlchemy is that it allows for arbitrary composition of blocks of SQL. I too have had to fight with DBAs in a previous life because they didn't want me to construct a query in code (non-SQLA), but I eventually won because I was orders of magnitude faster, because I had runtime knowledge they couldn't use.
Many of us also know stories about how ORMs can be slow if developers don't pay sufficient attention. In some cases told by database experts who see the query code locked up tight by the app team, etc :)
Having a separate "DBA" from the developers, or having an adversarial relationship with said person, is not really a technical problem I think.
I've had similar issues, but I got the added benefit of being "DBA-splained" on why what we were doing was terrible.
I feel like if a DBA keeps thinking of the database as the solution to all of the problems (by implementing views, triggers, stored procedures, etc) then maybe that person should be willing to support them, otherwise they really shouldn't complain that the developers moved to an ORM.
Okay, obviously I've never worked in an enterprise enough company. I've always assumed a DBA was just a member of a backend team who likes thinking about SQLy stuff. Or a sysadmin who upgrades the computer the db runs on. Basically, a hat someone wears.
Do I seriously make the correct assumption that the people who are writing the backend of many apps are actually separated from the people who decide how the data those apps process is stored, and you may have no direct say about the database schema? That the backend is effectively just a client - perhaps the only client, but nothing more than a client?
Obviously you've described the disadvantage of this approach; what advantage does it have?
It makes some sense when the data is extremely valuable, like in a bank. As a developer you can only access the db through stored procedures created by the database team. Which limits the damage you can do. I haven't worked in a big enterprise in years though so I don't know how common this still is.
> The idealistic answer is good DB design from the start.
The problem with this statement is that DB schemas, like code, can evolve badly over time, even if they're maintained by a DBA (since they're human after all too).
And then the DBA starts to suggest that you use things like triggers in your DB, which may or may not be good, but often surprise developers. "I just wrote 0 into the DB, why did it turn into NULL? What line of code did that? Ohhhh... there's a trigger."
I tried to think about it from a gitflow release cycle perspective to be generous. This would mean constant churn of migration code for each branch of the repo.
If you have complex migrations which go beyond simply renaming columns, it would require you to write special migration code on a release branch which would then need to be merged back. yuck!
In what way is manually tracking the needed DB changes per feature branch even possibly better? Even if you are not running your migrations automatically on deploy, I would still check those migrations into version control.
Having auto-runnable migrations checked in alongside the code that depends on them allows you to review and test those migrations as part of your normal code review and CI processes.
I think there must be some misunderstanding here because I don't understand why you say either of these things:
> This would mean constant churn of migration code for each branch of the repo.
> it would require you to write special migration code on a release branch
There should be at most a few migrations per feature branch, and merging is only an issue when you have two feature branches with inconsistent schema definitions. You should only have to add custom migrations to a release branch to fix bugs, and those should be merged back into your develop branch as soon as possible to minimize effort spent merging later.
No, I agree with the post I replied to in that migrations should be human versioned as opposed to auto-versioned via a VCS. Again, I was being generous to the author trying to imagine a scenario where this would not create a dumpster fire.
This isn't quite the same as being solely git-based, but there are numerous examples in the industry of successful use of auto-versioned declarative schema management. By this I mean, you have a repo of CREATE statements, and a schema change request = making a new commit that adds, modifies, or removes those CREATE statements. No one actually writes a "migration" in this approach.
Facebook, who operate one of the largest relational database fleets in the world, have used declarative schema management company-wide for nearly a decade. If engineered well, you end up with a system that allows developers to change schemas easily without needing any DBA intervention.
This is interesting but I am curious about some of the implications. I am under the impression that migrations encompass more than schema changes. They may require data transformations, index rebuilds, or updates to related tables. I would agree that these things are preferably avoided, but in practice it seems that they happen frequently.
I viewed the author as targeting smaller dev shops who don't have a dedicated DBA/expert. Groups who might use an ORM's migration framework for example. In which case, it would appear that there is significant loss in migration flexibility if you remove application code from the architecture.
When operating at scale, it becomes necessary to have separate processes/systems for schema changes (DDL) vs row data migrations (DML) anyway.
For example, say you need to populate a new column based on data in other columns, across an entire table. If the table is a billion rows, you can't safely run this as a single DML query. It will take too long to run and be disruptive to applications, given the locking impact; it will cause excessive journaling due to the huge transaction; there can be MVCC old-row-version-history pileups as well.
Typically companies build custom systems that handle this by splitting up DML into many smaller queries. It's then no longer atomic, which means such a system needs knowledge about application-level consistency assumptions.
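A hedged sketch of what the chunked version looks like (psycopg2; the table, columns, and connection string are invented):

    # Backfill in small batches so each statement is a short transaction:
    # brief locks, small journal entries, no giant MVCC pileup.
    import time
    import psycopg2

    conn = psycopg2.connect("dbname=app")  # assumed DSN
    conn.autocommit = True  # each UPDATE commits on its own
    BATCH = 10_000

    with conn.cursor() as cur:
        while True:
            cur.execute(
                """
                UPDATE bigtable SET new_col = old_a || old_b
                WHERE id IN (
                    SELECT id FROM bigtable WHERE new_col IS NULL LIMIT %s
                )
                """,
                (BATCH,),
            )
            if cur.rowcount == 0:
                break  # nothing left to backfill
            time.sleep(0.1)  # throttle to limit load and replication lag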
> "migration commands based on the git history"... what does that even mean?
It actually seems like a really cool idea. E.g.
* Commit A: create a schema.sql file "create table t (a INT);"
* Commit B: update the schema.sql to "create table t (a INT, b INT);"
* The tool can generate a migration from A..B "alter table t add column b INT;"
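A toy sketch of the diffing idea (real tools parse the actual CREATE statements; plain dicts stand in for the parsed schemas here):

    # Diff two desired schemas and emit the ALTERs to get from old to new.
    old = {"a": "INT"}
    new = {"a": "INT", "b": "INT"}

    for name, sqltype in new.items():
        if name not in old:
            print(f"ALTER TABLE t ADD COLUMN {name} {sqltype};")
    for name in old:
        if name not in new:
            print(f"ALTER TABLE t DROP COLUMN {name};")
    # -> ALTER TABLE t ADD COLUMN b INT;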
It seems like it probably would work really well for creating new tables and columns in simple projects. In more complex projects I've worked on, we usually had more complicated migrations to update the data to fit the new schema, and we often had to be careful to interleave schema, data and code updates (especially if you wanted to be able to rollback).
Automatic migrations using only source-code snapshots don’t work for things like column renames or splitting/combining columns. Every tool I’ve used requires some migration metadata.
> "migration commands based on the git history"... what does that even mean? What does git history has to do with the database?
I'm not sure what git-specific interaction the author is envisioning, but more generally: declarative schema management tools are based on the notion of using a repo to store only CREATE statements. It's an infrastructure-as-code style approach, where to modify an existing table, you just modify the CREATE definition. The tool figures out how to translate this to an ALTER.
I'm the author of an existing tool in this space, Skeema [1], which supports MySQL and MariaDB. Other recent tools in this space include migra [2] and sqldef [3]. From what I understand, in the SQL Server world there are a number of others.
> So you know one guy that quit over having an ORM instead of using SQL... do you have any idea how many quit due to having to deal with thousands of lines of SQL string concatenations, because some genius who also hates ORMs decided to do a giant project that way?
Those aren't the only alternatives. Query builders exist.
A lot of languages have pleasant multi-line string literals. Also if you have a 1000 line hand-crafted SQL query, it's probably not going to be pleasant in your ORM either.
I know that the author of SQLAlchemy is reading these comments and I don't want to be mean, but anecdotally I hear a lot of pain stories and the OP seems to largely deserve the title "Don't Use SQLAlchemy".
Nobody suggests that ActiveRecord is perfect, but it is pretty amazing at what it does. I think a big part of the reason it's more reliably excellent is that Rails in general embraces the notion of "convention over configuration" - there's typically a correct naming scheme (with opinionated pluralization) that starts in the table scheme and flows straight through the models to the controllers and finally the view layer. I can join your Rails project and make guesses about where your logical components are located and what they are called that would beat any TV psychic cold 100:1.
Plus, much like with SQLAlchemy and basically every other modern ORM... you are always one method away from writing raw SQL if you want to be.
As always: use the tools that work best for you and for the vast majority of web development cases, you'd have to pry ActiveRecord out of my hands.
I use SQLAlchemy to serve tens of thousands of requests per second on one of my services at Reddit. It's painful, but possible.
SQLAlchemy uses the data mapper + unit of work patterns. This means that your connections and your record objects have the same lifecycle. If your request takes 200ms, then you are likely holding open your connections for 190ms, even though the only SQL query takes 20ms. If connections are scarce, and they usually are, you will not utilize them well at all.
There are two ways that I know of to solve this, neither of which is pretty: (1) run pgbouncer as a sidecar on your python processes. This has the effect of making connections a far cheaper resource. (2) After every single query, manually close the related transaction/connection/cursor in your code.
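For the record, option (2) roughly looks like this (a sketch assuming SQLAlchemy 1.4+; the DSN and table are invented):

    # Check the connection out only for the duration of the query,
    # not the whole request.
    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://app@db/app", pool_size=5)

    def fetch_user(user_id):
        # The connection returns to the pool when the with-block exits,
        # so it's held for the ~20ms of the query, not the 200ms request.
        with engine.connect() as conn:
            return conn.execute(
                text("SELECT * FROM users WHERE id = :id"), {"id": user_id}
            ).fetchone()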
ActiveRecord defaults to giving up the connection after every query, which in a high-traffic webapp, is much less of a footgun.
(No significant experience with either SQLAlchemy or ActiveRecord, but work in Java at a shop that really strongly emphasizes the unit-of-work pattern.)
"give up connection after every query" is, IMO, not _less_ of a footgun, but a _more subtle_ footgun. That is to say, sure, for the most part, it will work ok.
_But when it doesn't_... like when your database goes down for a couple hours, or your rack loses network, or your proxy to whatever cloud provider is running your database, or maybe some other dependency had an outage between the queries... you end up not really knowing what executed and what didn't, with some fraction of your 10k req/sec leaving behind some dangling state -- a partially processed $whatsit -- and no clear/easy way to fix it or even necessarily identify how many $whatsits were affected. At least a few times, I've needed to either wait until records make their way into some OLAP system for offline analysis, or deploy some fixup code, or emulate either/both to fix stuff up after something weird happens.
At least that's what I am constantly afraid of. Do you also worry about this kind of stuff? If not, what prevents it from being a problem in your environment?
Signed,
The guy constantly badgering seemingly everyone about minding their transaction boundaries.
From my experience, there’s always one developer in every enterprise dev shop who understands transaction boundaries, isolation levels, and concurrency/reliability scenarios that do happen... and 10 other devs constantly instilling fear in him/her.
But he's not saying "don't use transactions." He's saying "Don't keep your transaction open for your entire user connection for no reason. Close your completed transactions as soon as you can."
Fun story: a coworker and I are in the process of tracking down exactly these violations in a misbehaving app, due to an accumulation of RPCs within transactions across the entire app. (I used to be responsible for it as a greenhorn ~5 years ago, but walked away for the last several years to work in a different part of the org.)
The approach is actually kinda interesting: we add an interceptor in the RPC layer for every outgoing call that asks the DB machinery “am I currently in a transaction?”, and emits a warning (containing at least the RPC being made, perhaps a (partial) stack trace) if so. We had so many occurrences of those warnings that our logging framework proactively started dropping log lines! So the next step is to only emit the warnings once per $duration per message.
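For the curious, a rough Python analogue of that interceptor (assuming SQLAlchemy 1.4+'s Session.in_transaction(); the hook itself is hypothetical):

    # Before any outgoing RPC, warn if the current DB session still has an
    # open transaction; include a partial stack trace to find the caller.
    import logging
    import traceback

    log = logging.getLogger("rpc")

    def before_outgoing_rpc(session, rpc_name):
        if session.in_transaction():
            log.warning(
                "outgoing RPC %r inside an open DB transaction\n%s",
                rpc_name,
                "".join(traceback.format_stack(limit=10)),
            )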
Just to hopefully further contribute, one of the top-deck features of the new Rails 6.0 release is its really thoughtful engineering work around multiple database support. At least, that's how it's named or described; the actual net result is a lot of smart people exhausting themselves making n-instance DB stack reliability and performance better on every chart... including setups where the value of n is 1.
What this translates to is that things should be impressively stable when your DB nodes go down, the backhoe shows up to party, or your BGP is routed to North Korea.
You can divorce your objects from the session if you want. But you lose transactions then, so with any approach that involves giving up the connection it seems to me you’ll create a whole other class of headaches for yourself.
I’m lucky in that I have low requests per second so I never have to worry about any of that too much. I don’t envy you having to deal with that sort of load.
Entity Framework, even with its quirks, has been the best developer experience with an ORM for me, no contest. A lot of that is probably tooling, like the standard C# tooling and also LINQPad.
SQLAlchemy is probably close to the worst experience, and that's likely largely due to Python and Python's tooling state. And the docs. Just something about the docs. Still, other than bulk insert performance I got it to do just about everything I needed, and came to the conclusion it was the best Python ORM for my needs.
For simple queries, sure, SQL is great, but it is not composable. Ideally you want to send your whole query to the database at once, and for a complex statement that may mean many, many lines of reasonably complex joins etc. With SQLAlchemy you can write a bunch of simple components and join them together for submission at once, instead of having a thousand lines of SQL that can be hard to read. You can re-use parts and even come back later and understand clearly what is happening. How do I know this? 25 years of writing SQL and at least 6 using SQLAlchemy. A lot of the time I will still compose and test in SQL using, say, DataGrip or psql, then build an SQLAlchemy query from that. This guy should look at alembic too. It has its faults (like the half-module, half-executable runtime), but it solves most of the problems of just having a folder with a million naively numbered SQL scripts, like managing up and down versions as well as branches.
I haven't met an ORM that wasn't more work to implement, especially when trying to keep schema changes in line with existing deployments. That said, I'll take a decent DB client SDK any day; I don't want to spend too many cycles on connection pooling and such. I just want it to handle variable substitution and concurrency primitives at the language level.
I don't think there's anything wrong with using tools to help you generate good SQL queries, but you should see them clearly and own what the database is doing. MySQL and Postgres have a myriad of useful functions and extensions that ORMs don't utilize or hide from the developer. That leaves the application to do multiple queries or a lot of logic that isn't needed.
It doesn't have to be gnarly; any SQL query that can't be handwritten on an index card is probably too cumbersome to maintain. A 1000 line SQL query is a code smell that means the data model could probably be improved, perhaps with views.
As someone who's used both SQL and the Django ORM, I could ask: "Why is SQL so backward?"
Here's a datapoint I experienced:
The company I worked for didn't trust the Django ORM for DB migrations. They thought they could do it by hand. After lots of deployment failures where the migration script worked in the dev environment but failed in the prod environment, we switched to using django migrations. Haven't had an issue since.
The author even admits that migrations are a hard problem, and yet he has no solution other than just using raw SQL.
Edited to Add:
In Django, these migrations are handled pretty well with a single command (makemigrations generates them from model changes; migrate applies them). With raw SQL, someone would probably have to spend time writing and testing the script by hand, and it would then need to be reviewed in the PR.
In my experience Django's migrations are really solid, assuming you're using the PG or SQLite backends. This is especially true if you're working from a new Django project, and not a "legacy system" that wasn't initially built for Django.
Every project I've seen that refused to use an ORM in favor of its own home-grown solution was worse off for doing so. Almost every time someone claimed that the ORM couldn't be used, I could point out how it could, and the code became simpler and easier to maintain. If they read the documentation and used a bit of creativity instead of fighting against their tools, they wouldn't be left with such a mess.
This post reads to me as the author not having the ability to properly structure and plan their projects, so they blame their tools instead of themselves. So instead of understanding limitations and tradeoffs, they try to build their own magic lamp that solves every problem under the sun.
My experience is actually the opposite - every project I worked on that relied on an ORM actually benefited from replacing it with a clear architecture pattern and a query builder. Having the concrete notion of what is being passed to the DB allows easier debugging, and it is way easier to solve performance bottlenecks. As an example, recently I replaced several hundred lines of complex ORM oriented code with a 10 line CTE that would perform the iteration and calculation being done on the client side - the result was a 10x increase in performance with a fraction of the memory.
By not using an ORM, the developer also stops treating the database as a "black box". It is a "black box" conceptually, but the more you understand how it works, the better your applications will perform. As an example: knowing whether, inside a transaction, a read is served by the server where you started the transaction or by some replica that may or may not be up to date (due to propagation delay) may seem picky, but it's one of those things that is very hard to debug afterwards as a transient error.
Yes and no; it seems it's an external module. And this is part of the problem - why would I need an external DSL extension to translate a pure relational concept into SQL? Debugging the query will require SQL knowledge, regardless of the ORM. In fact, the ORM is another layer to be debugged; it just adds complexity.
I've had to put up several services throughout the years that performed exactly one or two queries, and a few more that performed less than five.
It was a matter of setting up the database, writing a prepared statement, and calling that prepared statement with the relevant arguments. Bringing in an ORM for these cases is absolutely overkill.
As your requirements constantly balloon, perhaps adding an ORM (or just a simple query builder) is appropriate. But none of the services I mentioned above have grown in that manner, and most are still up and have been running (and modified) for several years now.
I've been on projects where an ORM and/or query builder was likely the right answer, especially for automated migration and rollback scripts. But the way you're presenting it seems to be all or nothing, and I couldn't disagree more.
I find these ORM opinions generally coming from someone who is used to, and is proficient in SQL, and only when they have a complex use-case/operation at hand.
If you consider a web project, with many developers of different levels of experience, having an ORM is a life saver. Because not everyone is constantly mindful of SQL injections. Because not everyone is constantly mindful of transaction management in requests. Those take experience.
In my experience ORMs are great 95% of the time. Those 5% can be extremely painful (I've gotten bitten by ORM Rails transaction nesting problems, and sometimes it frustrates me when I need to make something more complex). But let's not throw away the baby with the bath water.
I have had great experiences with Django ORM and SQL Alchemy. Rails ORM goes a long way, but it feels eons behind Django's.
Code reviews are for that: I've taught countless junior developers how to write better SQL through code reviews. I've shown them how to mitigate the common exploits and how powerful SQL can be when needed, and I believe they are better engineers because of it.
I really don't like using a tool as a crutch for a systemic deficiency of a team; in the long run you are only compounding the problem. The more complex the system gets, the more workarounds and gotchas you have with ORMs, so it's better to educate developers on how to not depend on them from the get-go.
Yup, ORMs have their place in some projects, but in the majority of the projects I've worked on they brought more pain and confusion than just properly effing learning SQL.
> I really don't like using a tool as a crutch for a systemic deficiency of a team;
Very valid point, but then this becomes a discussion of how much you want to build mitigations into the system vs. educating developers on topics like security. The larger a team becomes, the harder it is to police that every single query in your system is sanitized.
Still on the security side: software architecture tells us not to repeat ourselves, and if sanitizing query strings can be done centrally through an ORM, then why shift the responsibility to code review processes that are especially prone to failure due to their human nature?
As you said, these kind of become a crutch to people who don't know SQL. People begin to rely on those instead of knowing the underlying mechanisms -- but hey, that's the price of any abstraction. Probably this same argument was used when high level language compilers (analogous to the ORM in this discussion) were introduced.
It will boil down to: how much complexity a leaky abstraction such as an ORM adds to the project, compared to teaching others not to make the same mistakes all over the codebase with raw SQL.
As with anything related to software design and architecture, it depends: on the size of the project, how many teams work on the same codebase, etc. I don't think using an ORM as the sanitisation layer for your SQL queries is a good enough feature compared to its drawbacks in complexity and abstraction leakage. The object model doesn't serve relational data very well; that is already a first sign and smell for me that an ORM should be used in very specific cases instead of as the default go-to solution.
Having to deal later in the application lifecycle with partial entities and the like (anything to do with partial data access compared to the models' defined fields) is really painful when done improperly, which is easy to do, and then we are back at square one, having to teach developers how to use an ORM framework properly. Also, one thing I really dislike (and that I believe will always increase a system's complexity) is making code less explicit: an ORM layer will make your code flow less accessible, you'll need to know where the layer takes over and where it leaves you to implement more complex use-cases, and then you are back to writing SQL queries manually or depending on some set of features from the library implementing the specific use-case you are looking for.
I don't hate ORMs, but mea culpa, I have used them improperly and learned from my past experiences; nowadays it's really hard for me to justify their usage aside from very simple data access patterns. I say that because in the end you'll always need to reason about what the ORM layer is doing in terms of SQL concepts, and translating between those layers in my head adds another complexity: cognitive load.
> I find these ORM opinions generally coming from someone who is used to, and is proficient in SQL, and only when they have a complex use-case/operation at hand.
I'm reasonably proficient in SQL. My practical blocker with ORMs in general has been that you still have to understand your application obviously, and you have to understand SQL to fix most things, and now you're adding effectively a new language and framework between them.
> I have had great experiences with Django ORM and SQL Alchemy.
I think the best ORMs aren't ORMs at all, but larger frameworks like Django and Rails, as you're making the investment to learn that framework anyway.
And in fairness, I started with Hibernate which is particularly complex, having its own query language. Many ORMs have gone the no-object route and try to directly represent relational concepts, especially jOOQ[1]. That's smart engineering to recognize that problem and minimize it, but at that point it's less of an ORM and more like numpy for the relational algebra.
> Because not everyone is constantly mindful of SQL injections.
I'm a senior developer but just started using SQL seriously a few months ago, and only for my own projects, so don't have much experience; but it's difficult for me to understand this. Both in Python and in Go, every single tutorial tells you how to use parameterized queries, and every answer on StackOverflow has warnings of building SQL queries from normal string operations. Fifteen years ago I could see this being an issue; but is it really possible that in 2019 "inexperienced web developers" manage to write code vulnerable to SQL injection? Is it really only my hard-won paranoia, gained from decades of building operating systems in C, that made me immediately take these warnings to heart?
(That's a genuine question, not a rhetorical one.)
You'll have to pry Hibernate and Spring Data JPA from my cold, dead hands. Extending CrudRepository and getting a ready-to-use repo for an entity type, combined with annotations for custom JPQL queries? Yeah, I'll take that over any hand-rolled, low-level SQL interfacing for almost anything.
But, as always, "use the right tool for the job." I have gotten tremendous value from ORM's over the years, but I'm not going to insist on using one for everything just "because".
Ha. Funny, but I’m the exact opposite. Had to use Hibernate and Spring once. Trying to debug with 45 XML config files scarred me for life. That, coupled with MySQL (which even Ellison will tell you is a toy), meant I saw the writing on the wall and quit the job a couple of months before that whole team got fired, while the company had no choice but to throw the code in the bin. My view is that ORMs can never compete with a proper enterprise-scale RDBMS with people who know how to use it.
JPA/Hibernate has moved on a lot from the church-of-XML days. The basics are a lot smoother, with annotations and sensible defaults and so on, and there are some pretty nice additions.
My favourite neglected feature is JPQL with constructor expressions, which lets you write queries in an SQL-like language, but in terms of objects rather than tables, which is slightly easier, and materialise results directly into objects of your choice.
One of the nicer JPQL features is being able to navigate the Java object model and have that translated to the equivalent joins in SQL.
However, there are things that JPQL can’t do, but it’s easy enough to create a native SQL query and have its results map to JPA entities. I use Spring Boot (JPA, Spring Data, Spring Data REST) for the sheer convenience of it and speed of development, not to abstract away the database, which is always PostgreSQL. I am very comfortable writing SQL queries and making use of PostgreSQL-specific features and Spring Boot with JPA certainly lets me do that when I want to.
How so? I think the overall point of my post is clear, and consistent: I am a big fan of ORM's, won't be giving them up because some blogger-guy wrote another "ORM's are evil" post, and find them very useful for almost all of the things I do with databases. But the "almost" there is the key word. So, again, use the right tool for the job. Sometimes, in my experience, it is an ORM like Hibernate, other times something else makes sense. For example, I do some work with Redash for creating dashboards, and in that environment, you're working with SQL directly. shrug
At my company we use C#/EF with 50 developers and a gigantic codebase and database.
A statically typed binding to the database with 'find all references' etc. The ability to write a few lines of simple, understandable code that turns into a much more verbose SQL query aids in the readability, maintainability, and understandability of the code base.
I have used other ORMs and feel that EF/LINQ is way simpler in that the same query syntax is used for in-memory and database operations. They can be mixed and matched for doing really intensive data processing with relatively few lines of very easy to read code.
My only comment would be that, when using an ORM, you do need to understand how it works, how the code you write turns into SQL, and a lot of the details of contexts and tracking.
It's like flying a plane - big, complicated, it will get you where you need to go very fast and effectively, but if you don't know what you're doing you'll crash right into a mountainside.
IMO, SQL is a domain specific language that is far from verbose. If you need to understand or predict the SQL and let that influence your ORM code, then just use SQL.
If you're using a dynamic language, an ORM doesn't provide much. But in C#, LINQ has the benefit of being strongly typed, which means your DB schema can be evolved. If you write raw SQL (or Python) and you rename a table/entity or column/property, you're gonna have a bad time. In C# you can just use a refactoring or, worst case, get compile-time errors.
You can write very simple ORM code that results in very complex and verbose SQL. Especially when you want to return structured data that includes internal aggregations. If an ORM is not used, you often see a lot of data manipulation after the query is performed.
If you stop your examination of a technology at the most trivial use case I'm not sure what there is to discuss but...
Right off the bat, it's obvious how to do a transaction in the first example but not the second. It's also obvious how to use complex SQL functions in the first style and not the second.
If you find yourself having to type insert queries over and over again, then you're probably doing something wrong. I mean it takes like what, 5s to write that line of SQL?
It is the same as with all frameworks: it promises productivity and speed of development, but in exchange the problem that you are trying to solve must fit within the model and the assumptions made by the authors of the framework. In the end, that is never really the case, and depending on the situation, fixing the last 20% of your work takes less/all/more of the amount of time that you saved. So sometimes a framework is a good choice, but it is never as easy as the tutorial makes it seem.
ORMs are a bad case of this, because they try to bridge the impedance mismatch between how data is stored and linked in the database and the programming language you are using. This creates a forced model and a whole lot of assumptions, and in the end it is still not good enough. Fixing the part of your project that does not fit the ORM's assumptions gets orders of magnitude harder because the bridge they built is very complex.
Yes, most ORMs allow you to write direct SQL queries, but that does not really solve the problem. Either you work the way it was intended, or you do not. If the latter is the case, you will encounter difficulties, no matter the way you break the 'rules'.
If the code is written clearly and it uses just SQL, I can almost immediately understand what it does. For ORMs it takes a lot more effort to understand what's going on when you read the code.
I'm also rusty on my assembler because I generally use high-level languages and compilers/interpreters; that's what abstractions do, and not really a reason to avoid them, since people can generally solve the problem anyway. It just takes a bit more time (usually less than the time the abstraction saved elsewhere).
But assembler is the wrong analogy for SQL. SQL isn't a lower level version of procedural code, it's a domain specific language.
When you write a technical paper in English, you switch to math equations when appropriate. Math isn't assembly language; it's the correct language. It's the native language.
Yes, you can always describe your equations with English prose—and if you're doing simple addition and subtraction, it's probably nicer to write that in English. But once you want to do anything remotely complex, writing it as a math equation is more terse and less ambiguous.
> Yes, you can always describe your equations with English prose—and if you're doing simple addition and subtraction, it's probably nicer to write that in English. But once you want to do anything remotely complex, writing it as a math equation is more terse and less ambiguous.
Right! But 99% of the time, I do not want to do something complex - I just want to load a few rows based on simple search parameters, and save changed values (which may involve heavy data processing, but not relational). Hence only using SQL rarely, and therefore getting rusty.
If you're only doing basic CRUD operations, then fair enough.
That said, based on my experience analysing applications, "loading a few rows" to perform "heavy data processing" and then "save a changed value" smells suspiciously like "the entire task can be rewritten as a single, moderate complexity query and the data never even had to leave the database server."
I'm not saying that's the case in your own generic hypothetical, as there are indeed forms of business logic and advanced manipulations that exceed the scope of a database server. But you might be amazed at what you can do within the SQL language. For example if you aren't intimately familiar with window functions like DENSE_RANK() then do yourself a favour and learn about those—and then contemplate how you could use them within a subquery joined to a table that you're updating.
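For anyone who hasn't met window functions yet, here's a tiny illustration runnable through Python's sqlite3 module (window functions need SQLite 3.25+); the schema and data are invented:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (player TEXT, season TEXT, points INT);
    INSERT INTO scores VALUES
        ('alice', '2019', 42), ('bob', '2019', 42), ('carol', '2019', 17);
""")

# DENSE_RANK() gives tied rows the same rank and leaves no gaps after
# a tie: alice and bob both rank 1, carol ranks 2 (not 3).
rows = conn.execute("""
    SELECT player,
           DENSE_RANK() OVER (PARTITION BY season ORDER BY points DESC) AS rnk
    FROM scores
    ORDER BY rnk, player
""").fetchall()
print(rows)  # [('alice', 1), ('bob', 1), ('carol', 2)]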
Well, my current "heavy data processing" is generating a 3D render from some metadata stored in the database. Postgres is amazing, but I don't think it can do that yet :)
My experience over my career has been this: there's CRUD, lots of IO, and a bunch of data processing that just needs specialized software (image, video, weather simulation, etc). I try to offload what I can to the database - and yeah, I know about window functions - but other than for the occasional report, it just doesn't move the needle.
The sweet spot for me is a query builder: a tool to generate SQL and treat queries like code, and not get in the way like a heavy ORM. Shout out to http://knexjs.org/ for Node.
On the other end, you can often build better UIs to reduce the amount of dynamic query building required. Dynamic queries are most heavily needed when you give users a data table and let them sort/filter as they wish; more often than not this is a complete failure of the UI to understand the users' workflow and build around it.
I've used Sequel in Ruby for that, and... it worked pretty well, to be honest. But it was for a service that did some complicated DB shenanigans, and not any user API. I wouldn't skip out on ActiveRecord when running a public facing Ruby app.
I don't use "raw Arel", I use most of the parts of ActiveRecord -- but I definitely appreciate that about Arel, and use that aspect of Arel intentionally. Composable query parts made of code.
I think it's a mistake to think you can't write good efficient SQL for well-normalized schemas with AR. You can. Usually anyway, for a great many use cases.
[Arel is _really nice_, but sadly Rails absorbed it and considers it "private API", especially in its most sophisticated features. I use em anyway, they generally don't break... but if I wasn't using ActiveRecord as a whole, I'd probably use Sequel instead.]
I work with Rails almost every day and I'm still quite fuzzy on where Arel ends and ActiveRecord begins. It's an area of ambiguity; a technical post on its history and contemporary status would be appreciated by many.
For the most part, the stuff you do when constructing a query is Arel. Where you chain methods and such.
When it was first added to Rails, improving the query building possibilities, it was a separate gem that Rails depended on. I'm not sure if it was written specifically for Rails originally and just maintained in a separate gem, or intended by its original writers to be an independent project. But it was later absorbed into the Rails repo, and we were later told by Rails maintainers that its more sophisticated manual API (look up how to make a subquery with Arel, for instance) was not intended as public API and had no backwards-compat commitment.
ORMs are amazingly useful in the limited case that you're performing basic CRUD operations or simple joins.
In any sufficiently complex codebase you'll be doing more than this, however, and the work to support the ORM properly then outweighs the work of simply coding the raw SQL and preparing statements as appropriate.
People hugely exaggerate this IMO. If you're writing a lot of queries that can't be composed, with sufficient performance, in ActiveRecord, but can be in raw SQL, then you've probably done something hare-brained elsewhere.
And for those special (and IMO pretty rare) occasions, you can drop down to arel or raw SQL anyway. Why throw away the consistency and readability of something like AR for edge cases, when you can just treat your edge case as an edge case with raw sql and still keep AR for your other 95% of queries.
I maintain a relatively unremarkable but bespoke online discussion forum, which has hundreds of queries, few of which could be composed by an ORM, let alone composed and run performantly. The median complexity query in my code base probably has two or three joins, two or three subqueries, and some kind of aggregation or window function.
The result is a typical page runs around two or three queries total—one query to authenticate the user and load everything about their profile and permissions, one to load the entirety of the data being output on that page, and occasionally one to update a statistic somewhere.
(The authentication query runs on every page because there's absolutely no persistence in the application layer. The authentication query goes three layers deep in subqueries and includes half a dozen joins. It hits perfect indexes when it runs and takes only a few msec round trip.)
> People hugely exaggerate this IMO.
In my experience, people who think an ORM can do most things are simply under-experienced with SQL and set theory.
> The median complexity query in my code base probably has two or three joins, two or three subqueries, and some kind of aggregation or window function.
Well, just from that description, Django ORM could do it. Can you post an example of a median query? I'm curious to see why it can't be ORM'ed.
This is a heavily redacted, completely renamed and summarised version of a typical page data query. It's less complicated than the top three most common queries that run to build the most common pages.
SELECT
    b.field, b.field, b.field,
    group_concat(concat(y.field, useful_thing)) as useful_things
FROM (
    SELECT t.field, f.field, w.field,
        (case when w.wid is null then 0 else 1 end) as has_watched,
        exists(select id from posts p where p.tid = t.tid and p.uid = :uid) as has_posted
    FROM (
        SELECT tid,
            DENSE_RANK() OVER (PARTITION BY foo ORDER BY bar DESC) AS useful_pseudo_id
        FROM editorial_things
        WHERE thing = :thing
        ORDER BY something
        LIMIT number
    ) as a
    INNER JOIN thread as t ON t.tid = a.tid AND t.last > Now() - INTERVAL 2 DAY
    INNER JOIN forum as f ON f.fid = t.fid AND f.fid in (:security)
    LEFT JOIN watched as w ON w.tid = t.tid AND w.uid = :uid
    ORDER BY something
) as b
GROUP BY something;
The other thing is that being returned objects just adds complexity and handling. The language I use has a perfectly nice native, iterable data type for returned database records (kind of like an array of dictionaries) and I'd rather just use that instead of an object middleman to satisfy some kind of OOP completionist fantasy.
CFML (i.e. ColdFusion) using the Lucee engine on the JVM. Don't believe the popular scorn: it's as good a language as any other for web development. Perhaps not as innovative as newer languages, but it's densely packed with pragmatic conveniences and (for better or worse) doesn't impose any particular architectural style upon you.
CFML has a "query" datatype that represents the rows and columns returned from a database combined with useful metadata and really neat features like n-level iteration where values repeat. To the programmer it's like a dictionary that magically changes its values by passing it to an iterator. Without the iterator it works like a dictionary of the first row. Or if you treat it like an array you can manually read any column in any row directly.
I agree. Don’t know why you’ve been downvoted. Your experience directly mirrors mine.
What I did find interesting was the pushback I got from my team when I told them that I wanted to drop the Java ORM (JPA). One guy, who I respect, was super critical of the decision. He was telling me that SQL was the reason we had a performance problem, but when I investigated I was astounded by how fast SQL was, even over JDBC. It was clearly the ORM that was slowing us down.
In his defence, when I was able to show him literally 100x performance improvement after moving a service to PL/PGSQL and MyBatis (Java), he changed his mind.
In most applications you're going to be doing both things. I remember Rails' ORM was always presented this way: it's the far better option for simple object relationships and CRUD-style systems, but you can still write raw SQL as needed for complex stuff or things requiring performance optimizations.
There's plenty of ways to manage complexity in software. Two or three different databases in a single app doesn't kill them, nor will two different DB interfaces.
I think the opposite is true: SQL is the backwards one compared to a programming language. Creating statements by concatenating strings? An ORM makes real objects and types you can actually instantiate and use, or even type-check.
You don't create statements by concatenating strings. Use prepared queries with placeholders and pass values. Databases will compile the query once, then can run it many times.
Sometimes you have to concatenate, e.g. to specify ORDER BY clauses. This is something database vendors need to fix. There need to be ways to tell the database to modify a prepared statement without changing its SQL. You might be able to do it with a crazy IIF, but then it's likely not to use indexes well.
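Until databases support that, the usual workaround is a whitelist: validate the column name against a fixed set and interpolate it only after validation, while every value still goes through a placeholder. A sketch with hypothetical table and column names:

ALLOWED_SORT_COLUMNS = {"name", "created_at"}  # fixed in code, never user data

def list_users(conn, sort_by):
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by}")
    # Only the validated column name is interpolated; the filter
    # value still travels as a bound parameter.
    sql = f"SELECT id, name FROM users WHERE active = ? ORDER BY {sort_by}"
    return conn.execute(sql, (1,)).fetchall()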
Prepared statements on anything but Oracle are only asking for trouble: if the distribution of your parameters does not match that of the query the statement was compiled with, you can easily end up with bad query plans. Furthermore, these issues are hard to analyze (no parameters in logs) and even harder to solve.
I have hit this problem in just about any system using prepared statements. The performance overhead of query planning is minimal compared to the headaches it prevents.
> I think the opposite is true: SQL is the backwards one compared to a programming language. Creating statements by concatenating strings? An ORM makes real objects and types you can actually instantiate and use, or even type-check.
That you view the situation as ORM or concatenation terrifies me. No wonder SQL injection remains so prevalent.
Prepared statements are what you use here. You provide a query with placeholders for the values you want to fill in, and then you call that prepared statement with the arguments you need. These are typically type-checked and hardened against SQL injection, but of course you should verify with your particular language/library combination.
The problem is prepared statements aren't composable. So saying I want this query but also with this extra where in there and a limit and an offset means you have to write a whole new prepared statement. Or concatenate strings!
With an ORM you can construct the prepared statements programmatically.
> The problem is prepared statements aren't composable. So saying I want this query but also with this extra where in there and a limit and an offset means you have to write a whole new prepared statement. Or concatenate strings!
Absolutely, prepared statements would not be appropriate for that.
> With an ORM you can construct the prepared statements programmatically.
That's not limited to ORMs though. You can also just use a query builder. In fact I would argue that an ORM that provides that functionality is a query builder with additional ORM features on top.
It sounds to me like there's a reasonable middle ground here, where you treat your SQL query more like an AST than a string. So you can do something like myquery = select("col1").where("col2 < {}", my_value), then later myquery.limit(10).offset(20).
You're still working with SQL as a language, but you're manipulating it directly instead of trying to go via string formatting.
// In your handlers, construct queries and ask your models for the data

// Find a simple model by id
page, err := pages.Find(1)

// Start querying the database using chained finders
q := page.Query().Where("id IN (?,?)", 4, 5).Order("id desc").Limit(30)

// Build up chains depending on other app logic, still no db requests
if shouldRestrict {
    q.Where("id > ?", 3).OrWhere("keywords ~* ?", "Page")
}
SQLAlchemy has this. It sits alongside and works in tandem with the ORM, and I've used both in projects (and also raw SQL), each where appropriate to the use case.
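For the curious, composing a statement with SQLAlchemy Core (no ORM session involved) looks roughly like this; the table is invented, and the exact select() signature differs between SQLAlchemy versions:

from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

engine = create_engine("sqlite://")
metadata = MetaData()
pages = Table("pages", metadata,
              Column("id", Integer, primary_key=True),
              Column("title", String))
metadata.create_all(engine)

# Build the statement incrementally; nothing hits the database yet.
# (SQLAlchemy 1.x style; 2.x drops the list: select(pages.c.id, ...))
query = select([pages.c.id, pages.c.title]).where(pages.c.id.in_([4, 5]))
query = query.order_by(pages.c.id.desc()).limit(30)

with engine.connect() as conn:
    rows = conn.execute(query).fetchall()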
You can use functions to create basic "getters" for application data structures. Layering these through further functions for limits/offsets/etc is simple (and you can write those layers in a generic way).
You just have to get used to the mental shift that all your data manipulation code lives in the database, and your application simply consumes data from single function calls.
> That you view the situation as ORM or concatenation terrifies me. No wonder SQL injection remains so prevalent.
I don't think others mean concatenation of values where placeholders should be, but the fact that the query itself is a big string. Unless some sort of query builder is used to literally build this big string for you.
SQL is not compatible with programming languages because in a programming language you are describing how to do something, while in SQL you are describing what the result is supposed to be.
What would be great is first-class support in the language (or perhaps the IDE) that understood the database schema and database types, so you could have error highlighting and autocompletion.
Something like jOOQ for Java seems to do that. It seems like a good idea on the surface, though I don't currently program in Java and don't see anything similar in other languages, so I'm wondering what's wrong with it.
After more than 10 years using ORMs at work, in the DotNet/Java worlds, my current opinion on the topic:
- the only project where an ORM was worth it was a simple backend admin project with many tables and very little load: the ORM avoids having to write tedious simple CRUD queries.
- for recruiters, it seems simpler to find people with language X than people with language X + SQL.
OK but ...
- to understand the intricacies of an ORM takes a lot of time and effort, probably as much if not more than writing proper SQL.
- illustration of the previous point: it is hard to know how much data the ORM is really carrying, hence the common situation where half of your DB ends up in memory.
- developers tend to forget about performance. For instance, in C#, IQueryable objects are used everywhere, meaning that in any layer of your app the query that will be pushed to your server might be modified: it is hard to tune performance in such a context. More generally, this additional layer makes everybody postpone the time when you have to think about DB performance.
- it prevents you from using advanced DB features (e.g. built-in audit trailing and versioning), either because nobody ends up knowing the SQL DB anymore, or because the ORM makes it harder if not impossible.
- queries sent by the ORM to the DB always seem huge and return more than they should.
- for performance or complicated requests, you always end up adding some SQL to your ORM.
- because of the previous point, the maintenance advantage of an ORM, namely being able to automatically migrate your DB with your code, is lost: you end up having to maintain some SQL more or less manually.
- letting the ORM automatically update the SQL DB during migration on real data always makes people sweat: if any problem occurs during that process, welcome to hell. I always feel more comfortable being able to write an SQL fix in the worst case.
- for my side projects, I never use an ORM, and I enjoy it.
Completely agree. I've been using Dapper for a few years now and it's amazing because I have control over everything. I created a simple overlay, which also allows me to do some very specific things that you just cannot do with a traditional ORM (like having dynamic table names).
Anyone wanting to set up something similar should just watch Mosh's Repository Pattern video.
I modified this pattern to use Dapper instead of Entity Framework, but he already decouples Entity Framework to the point where you still have control, so you can do either.
ORMs are solving a problem that really should be solved on the DB layer: providing a non-bullshit programmatic API. As a developer, I DO NOT CARE whether the value of my object is saved in a different table or not I JUST WANT IT SAVED. A DB that will provide a good programmatic API AND analysis interfaces will be golden. And there have been a few good attempts reaching there. ORMs will be solved when you don't need an ORM to access your DB.
Storing data is a hard problem and likely to be a leaky abstraction. "Save my object" is OK until you need to ask questions like how to find such objects, and by which attributes. If I update this and another object, do I care that they are done in a single transaction?
An "I JUST WANT IT SAVED" friendly abstraction would be JSON as a value in a key/value store. As a bonus you can choose your consistency/availability trade offs by going "nosql". For example Azure table storage. Nice!
But now what if your object is an invoice with a link to an account? Is that an ID, or is the account info embedded in the invoice? Etc. There is a lot of design thought that needs to go into data. You can stop using SQL, tables or whatever, but these problems don't disappear.
ORM allows you to almost forget about these issues for simple CRUD-like tables. And for the start up style SaaS doing a bunch of boilerplate crud stuff (to begin with) something like ActiveRecord saves the day. But anyone using such conveniences should also learn database design, query optimization and such things too.
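To make the trade-off concrete, here's the "JUST WANT IT SAVED" abstraction as JSON in a key/value table, with the leaky part called out; a sketch with an invented schema:

import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (key TEXT PRIMARY KEY, doc TEXT)")

# Embed the account or just reference it by ID? That design decision
# doesn't go away just because the store is schemaless.
invoice = {"id": 17, "account_id": 42, "total": 99.5}
conn.execute("INSERT OR REPLACE INTO objects VALUES (?, ?)",
             ("invoice:17", json.dumps(invoice)))

# Saving was trivial -- but "find all invoices for account 42" now
# means scanning and parsing every document. That's the leak.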
The beauty of an ORM is that an ORM backend can be adapted to your proposed system. Or at least some subset of the ORM's spec can be. You can get a nice language specific API for common patterns, migrations, and the power to drop down to system specific escaped SQL queries when necessary. You get a layer of independence, in case you really want to switch to another backend for some reason.
ORMs are necessarily language-specific, and as a single DB interfaces with multiple languages it’s impossible to do as you suggest (that’s what SQL is anyway).
An object in JavaScript has different semantics and conventions around it (everything is a prototype and a dictionary simultaneously) compared to C# or Java (everything is statically defined)
I have no idea what the API you describe would look like. Where you somehow describe to it data to save (and fetch), without any reference to tables, the fundamental unit of an rdbms? I guess it'd probably look a lot like... an ORM?
I believe that ORMs try to solve a problem that fundamentally cannot be solved. As long as you keep the usage simple and use your database as an object store, as a majority of projects do, all is fine and dandy. But once you start to look into your database a bit more, and try to use it for what it's capable of, you'll run into problems. It is because the fundamental building block of the data model of the languages in question here is objects, while the fundamental data block of an RDBMS is relations. As soon as you try to write custom queries which join columns from other tables, the simple table<->object mapping ceases to work, the ORM becomes lost and starts to get in the way, and can make you lose a lot of time.
I've personally settled on using a "light" ORM, which does the mapping of result sets to objects, variable replacement, and connection handling, and I write the SQL myself. It's still not ideal, because for statically typed languages I need to write ad-hoc object types for query results, or deal with untyped data, but at least I don't get any surprises.
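In Python, that "light" layer can be as small as one dataclass per query shape; a sketch with invented tables:

import sqlite3
from dataclasses import dataclass
from typing import List

@dataclass
class CustomerOrderCount:
    name: str
    order_count: int

def fetch_order_counts(conn: sqlite3.Connection) -> List[CustomerOrderCount]:
    # The SQL is hand-written; the "ORM" part is only giving the
    # result set a typed, named shape instead of anonymous tuples.
    rows = conn.execute("""
        SELECT c.name, COUNT(o.id)
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.name
    """)
    return [CustomerOrderCount(name, n) for name, n in rows]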
Yeah it's kind of a false dichotomy isn't it? Just use your ORM for CRUD operations and use proper SQL if you're doing anything much more complicated than basic joins. It's a lot better than the busy-work of hand-cranking every basic SELECT/INSERT/UPDATE/DELETE statement, preparing them and the associated boilerplate that you usually have to write to stick it in/extract it from your object model.
I wholeheartedly agree with the article. Every time I've used an ORM I've regretted it. SQL is already a great DSL for working with relational data, and there is no reason to create a leaky abstraction on top of it. The reality is that there is no practical way to do efficient mapping from relational data to object models, and pretty much every ORM ends up generating horridly inefficient queries when you start getting into non-trivial cases. At that point you're going to have to go in and write the queries by hand anyways. So, you might as well just skip that step and work directly with SQL.
The worst part is that you often find out that the ORM is affecting performance only when you start hitting real-world loads in production.
It doesn't always affect performance. A lot of business apps are mostly doing CRUD under fairly light load. It may be that beefing up the database server is much cheaper than interacting with the database in a way that, while being more performant, is also more laborious.
That said. . . yeah, I've also had my share of cases where the app ended up seeing higher load than anyone anticipated, and it's hard to even see the source of the performance problem, because, with an ORM, it takes a little bit (or, with the worse ones, a lot) more work to find out what the actual queries being executed are so that you can ask the DBMS to `EXPLAIN` its query plans.
Except I don't find using SQL more laborious, and if it is, then I'd suggest that's a problem with whatever language makes using SQL directly laborious.
I'm pretty in love with schema-driven development with https://www.graphile.org/postgraphile/. I generate a graphql schema from postgres, and code models/serializers from that.
Currently using migra and some custom scripts for migration management - will check out automigrate.
The main issue I've had with developing a large backend in pure SQL is that you end up having to piece together custom scripts, migration tools, etc. to add features your database/SQL doesn't provide, like modules and higher order abstractions. So I'm actually considering switching to something like SQLAlchemy simply for that
We went from Hibernate/EclipseLink to MyBatis and have never looked back. Needing to deal with the idiosyncratic behaviour of JPA, HQL and the target SQL is a nightmare. We've literally removed tens of thousands of lines of Java code by dropping JPA.
Same. When writing complex queries, Hibernate just got in the way. MyBatis just lets you map any result set to objects as you see fit. I also make use of non-standard features such as GIS, GIN-based queries, arrays and JSON in Postgres, and this is just easier with MyBatis since I am still writing the queries in SQL. Performance seemed better too.
We've taken the view that audit should be done in the database layer using, for example, triggers. Audit at the application layer is less resilient than at the SQL layer, IMO.
In fact, an important part of our transition away from ORM was to invert the ORM-centric relationship between the application and the database. We wanted our tech people to be able to manipulate the database natively via SQL, which is generally much easier and more efficient than writing and deploying Java code to do the same thing.
For us, this meant moving most data manipulation logic into the database using PL/PGSQL. Doing so has empowered a team of non-Java admins, massively improved performance, and significantly reduced LOC. I'd say that we now have about 30% (a third) of the LOC in PL/PGSQL compared to what we had in Java+JPA. Obviously this means fewer bugs, and performance is literally 100x in some cases.
I think that if you take a Java-first view of your application then all the complexity of JPA is part and parcel of any solution, and that’s fine for you. But we’ve found that taking an SQL-first approach has been great for our use cases, and has had lots of ongoing benefits.
Of course, we needed to build a bunch of tools to help with this. For example, we built a gradle plugin that treats SQL code like Java code so we can write libraries in SQL with transitive dependencies. We also needed to build testing tools and tools to automate schema migrations for CD.
I do think the lack of tooling around managing large SQL-based projects is a blocker to wider adoption of our approach, but the benefits of going against the ORM grain have been very significant for us.
That's a fair criticism about auditing. Doing it through database triggers avoids the problem with audits being missed because of code that doesn't participate in the Hibernate lifecycle (including native SQL queries and Criteria updates, yikes!).
I will also concede that under some circumstances it's too slow to bring the data to the code, and instead you have to bring the code to the data (i.e. PL/PGSQL). That sort of speedup is mostly from eliminating round trips and data marshaling, however, so it's not quite a like-for-like comparison of an ORM vs a mapper.
The reason I stick to ORMs is mostly because of RAD tools that save me so much time. For instance, JHipster generates liquibase migrations and JPA entities. This gives me a relational schema very quickly. To avoid any "surprise" queries that tank performance, I do turn Hibernate's statement logging on when developing new features.
Next time, if I can find some more tools to help with an SQL-only + stored procedure approach, I'll give it deeper consideration. Maybe convince your company to open source some of the tools it has developed!
There is a lot of internal enthusiasm for open sourcing our tools, but we are heads down at the moment so it’s just a matter of time (or lack of). I think we will time it for the next pgAU conference so we can talk about it at the same time.
I definitely agree with you that tooling is a big deal. We did spend a lot of time manually proving it out before we spent the time on the tools. It was a bit of a leap of faith but it quickly became obvious that we were onto something.
And you're right, the point of our approach was to move the code to the data. We deal with reasonably large, complex, real-time data sets, so the round trips and marshalling become the dominating contributor to total time. Hence our ability to improve some operations by 100x.
I can see that if you have a familiar and reliable tool chain and a compatible use case then ORMs could be great. I think our problem was that we had neither, so it was never going to work out too well for us!
No, I did look at JOOQ briefly, but MyBatis came somewhat recommended and perfectly scratched the itch I had at the time. So I don’t have an opinion either way with JOOQ. I am however super happy with MyBatis, we’ve been using it 2+ years in dozens of microservices, but we are KISS oriented and so only use a subset of its functionality (no XML config for example).
"Designing" the database by creating objects first and exporting create statements is ignoring the responsibility to design the database completely, the database which is the foundation of almost all apps. It's irresponsible and leads to a database that isn't optimal at best and doesn't work at worst. The database design should be one of the most important tasks of an app and it shouldn't be an afterthought. The design shouldn't even involve code, even sql at first. Getting the data structures correct is most important. The code will flow from there. "Show me your data structures and I don't even need to see the code to understand how it works."
Putting an ORM layer on top of this refusal to do one's job and design the database is the second nightmare of an ORM. Adding a new language and new paradigms that don't make much sense just to translate data into objects adds a ton of completely unnecessary complexity. Not to mention many ORMs are slow either at fetching or hydrating data. They are just unnecessary solutions for nonexistent problems. It's too bad most people have no experience with database design or even SQL to understand and see this. Most people prefer to make apps more complex unnecessarily rather than reduce complexity, and think that makes them experts. They are usually the ones advocating for the unnecessary ORM layer. On the other hand, this problem mostly goes away when you no longer deal with objects in OOP languages, so there is hope.
Here's the deal: if you design a decent entity manager, you shouldn't have to worry about how the database works.
A database was created to ingest and query user data. Programs pretty much ingest and query user data, yet the number of regular (non-tech) people using SQL is too damn low.
So, IMHO, programming the data ingestion and querying should be easy, and a good ORM framework should handle it. Of course, to make a programmer's life easy, they should start following mindful and clean practices - otherwise any tool will be a gateway to hell.
> the number of regular (non-tech) people using SQL is too damn low.
That is still no reason for me to use an ORM instead. Learn SQL then; that's way more valuable than learning the next ORM library with all its issues.
So wrong, you always want to know how your database works, you should be tuning your database, and queries against your specific database. Every database has unique technology you can take advantage of and there are performance tuning techniques and design techniques you should be aware of. If you're doing simple INSERT/UPDATE/SELECT then sure, but in a large organization with tons of data that's not gonna cut it.
- SQL is not an artificial layer, an ORM however is
- SQL is a language, it's up to you as a developer to tune your queries against your database (e.g. create and use partitioning, indexes, statistics gathering etc.)
- If you're using an ORM just to send data via a native protocol, then why use the ORM in the first place?
- Performance tuning is very closely related to how you write your SQL
- This is my opinion, but I think almost any usage of an ORM will mostly be bad; I have no real arguments to support otherwise. Let the database do the work, that's what it's there for.
SQLAlchemy can get pretty complicated with some of the hairier use cases; raw SQL can too, but SQLA adds another dimension to it:
* SQLA can infer joins, but sometimes it can't and needs hints - then you may worry that you miss some combination of table join it doesn't have enough information for.
* In one case, SQLA inferred a join incorrectly and I ended up with duplicate results from a cross join. This is possibly because of a certain primary key duplicated on a few tables. The way SQLA infers joins is a bit magical, and it's not always obvious how to modify that behavior.
* dealing with joining a table on itself can be complicated, when you need to distinguish between those two versions of itself (using an alias?)
* how about this use case: modify a set of filters to apply at two levels: one in filtering rows before a select query, then again to filter the result of joining those original results.
* If you operate with objects/models for saving/retrieving things, it can often be hard to mix them with raw SQL, which can't easily be modeled (at least without committing everything and then executing the SQL). That means you need to express that SQL in object syntax, which is far more complicated.
* for migrations, I need to redefine some models, because I need two - one representing the old DB structure, and another representing the new structure.
It's possible some of these have better solutions than I've found, but I find it's often hard to figure them out.
One problem I’ve seen is that most non-trivial apps have both OLTP and OLAP elements in the same places. Think of how many “mini-dashboards” you see sprinkled throughout the UI of a modern CRM.
ORMs are generally not good at, or even capable of, efficient set-based operations spanning entities.
Building a dashboard with an ORM results in pain and wildly oversized DB instances in my experience.
Maybe I’m in the minority, but I’ve found that if the right ORM fits your brain, it can be incredibly useful for a lot of CRUD operations, especially if it offers escape hatches to lower levels of the abstractions; ultimately down to raw SQL.
SQLAlchemy for Python is that for me. I find it incredibly expressive, and it allows me to write complex queries that are readable, maintainable, and performant. Yes, it requires learning, but it's the right tool for the job, personally.
It's exactly what you want. It blew my mind, and taught me that python syntax can natively express SQL semantics. This gives me all that ORMs are really good at -- letting you avoid having to write one language in string-form inside of another language.
ORMs are yet another instance of "Big Dark Grey Boxes" (BDGBs). They are powerful tools that do a lot for you IF they work as intended. When they don't, you have to experiment and fiddle, and perhaps end up fudging an ugly workaround to meet deadlines.
Templating engines (like Razor), UI formatting engines (Bootstrap), and routing engines (URL path translation) are also BDGBs with a similar great-until-broken profile.
In most cases it seems we could have simpler versions that are only a few hundred lines of code. If those few hundred lines can generate 90% of our SQL or whatnot, then we get most of the time savings without the complexity: we can study or fix a few hundred lines of code. Chasing that last 10% is what has created these bloated BDGB monstrosities. Leave it at 90% and enjoy the simplicity. It seems components have a handling curve something like this:
80%: 100 LOC (LOC = lines of code)
90%: 250 LOC
95%: 1000 LOC
98%: 3000 LOC
99%: 6000 LOC
The percent value here is the percent of situations the component handles. In an ORM's case, it would be the percent of SQL code generated by the component.
We should strive for components that assist us with grunt-work, not hide the process of doing it behind too much code to grok. If it's simple enough, we can also tune it to better fit shop conventions.
For big development shops, it may make sense to assign specialists to master a BDGB. If they work with it all day, they'll eventually figure out most of its quirks and oddities. It's specialization at work. But smaller shops often waste too much time trying to troubleshoot irksome BDGBs because they don't have time to focus on any one.
So many people in this thread seem to think that your options are ORM or string concatenation. It explains a lot about why both ORMs and SQL injection are still so prevalent.
Does no one teach prepared statements in schools/online classes/textbooks? Does no one go digging underneath their abstractions to learn what they're providing of value and what is easily available without that abstraction?
> ORM protections against SQL injection attacks are worth the clunky syntax. Change my mind.
You can create what's called a prepared statement in most databases. This is a statement that contains placeholders for parts of the query that you want to change at runtime. You then query against this prepared statement, providing arguments for each placeholder. These arguments are typically typechecked and protected against SQL injection.
(I only say "typically" because I'm sure someone out there can provide a database/driver combo that does the wrong thing, but I'm equally sure you can find a misbehaving ORM that does the wrong thing as well.)
Not discounting the monumental work that SQLAlchemy was to create. It’s a complicated set of machinery that allows one to plug in any relational backend and then just define models and go and if your way of thinking fits ORMs and sessions and you want to use lazy loading etc it’s great. Personally I would rather couple my code to my database a bit more and go the route of raw SQL in code or stored procedures on the database that then acts as an abstraction that my code can call into.
I've worked at places that have taken both approaches, and each has advantages and disadvantages, but I would take writing my own query over an ORM any day. I am just too paranoid to leave any performance on the table, and I don't want the cognitive overhead of an ORM getting in the way of my queries. That being said, SQLAlchemy is pretty awesome when used as a query builder — I think that's a good compromise: if you must use an ORM and it is SQLAlchemy, then I'd use it like that.
ORMs are the technical solution to a social problem. From a pure technical point of view, a "Database" (with capital D) that allows for user defined functions (see Postgres) with a canonical "Domain Schema" expressed in SQL (DML & DDL) is technically superior.
The social problems imo are:
- (contrary to the blog's assertion) most programmers are scared to death by SQL, and are far more comfortable with iterative programming than with declarative/functional programming in SQL.
- ORMs in part also addressed organizational issues. Databases used to be the domain of DBAs. ORMs to a large extent wrested control over the "Domain Model" from DBAs and handed it to programmers.
From a purely technical PoV, I believe that the facility of user-defined functions (c.f. Postgres) permits a complete and coherent structural and semantic description of a domain model in the database. However, this is nearly universally hated by programmers, as it requires programming in the database.
After reading "Functional architecture is Ports and Adapters" [1] and using ORMs (Hibernate) for some time now, I'm wondering whether a functional approach wouldn't be better in many cases. With such an approach you would have the following flow:
HTTP adapter receives request -> object is loaded from the database as an immutable object -> business operation creates a modified copy -> DB adapter persists the new object.
The last step could be improved if the DB adapter looked at what actually changed. Since everything is immutable, this is as easy and fast as comparing references (like React does, btw). This works even with Domain-Driven Design, which is usually used with mutable entity objects.
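A rough sketch of that flow in Python, using frozen dataclasses (all names are hypothetical; the change detection here compares fields rather than references):

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Invoice:
    id: int
    status: str

def mark_paid(invoice: Invoice) -> Invoice:
    # The business operation returns a modified copy; the object
    # loaded from the database is never mutated.
    return replace(invoice, status="paid")

def persist(old: Invoice, new: Invoice, conn) -> None:
    # The DB adapter writes only the fields that actually changed.
    if old.status != new.status:
        conn.execute("UPDATE invoices SET status = ? WHERE id = ?",
                     (new.status, new.id))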
They have their place. I particularly like Ruby's ActiveRecord for handling relational data. The extra caching and easy eager loading are really nice. It sucks for high-performance batch processing, but it works great for web requests. I wish it were a little more complete when it comes to bulk updating/inserting ActiveRecords.
I don't agree that ORMs are backwards. They often go both directions so how can it be backwards? You can usually generate schemas from your code objects as well as vice versa.
No, ORMs are usually solving the wrong problem entirely. Your code objects and your schema are solving different problems and should not be directly mapped.
As for other things they do...In my opinion, they usually ruin the declarative beauty of SQL by creating a lot of imperative methods, but I admit this is _mostly_ a preference thing.
That said, _I've_ never seen transactions or complex queries done properly in an ORM without simply giving up and using SQL.
But I still use a thin ORM. Who wants to write a result set parser? You just need to reason about, and be aware of, when you want to create an object for custom result columns that aren't mapped to a table vs when you want to do the default ORM thing and nest table-mapped objects.
Imho the "smell test" for good-enough ORM should be: can I, at the same time, (1) just switch the DB from underneath, replacing a relational MySQL with a document-oriented DB like Mongo or a hybrid one like Arango and have it work the same, 99% same business logic, (2) replace the underlying DB with a REST API of another app and use that as the DB, and (3) have the ability to drown down to "raw db" language, whether that is SQL, Mongo JSON's based queries, or a GraphQL API I use to "prasitise" another app and use it a the DB for mine?
None of the ORM/ODMs out there come even close to satisfying half of these requirements... so I'd rather implement an app-specific one that does 50% of this for the restricted 1% subset of functionality that my app needs...
...well, call it an OWM, Object-to-Whatever-Mapper, but that's what you'd want anyway; document-oriented DBs don't map directly to your business objects either :)
I think tackling the most general problem would lead to a way more interesting solution, but I have no time to work on this :P
I'm going to project my naivety here and ask how does one write raw SQL queries in code? Should said queries live in their own files and be referenced when needed or are the queries usually written where they are called?
I've worked on a few apps that relied heavily on SQL stored procedures as the data access layer. In a database like SQL Server or Sybase, a stored procedure is like a function with a body of SQL that is stored on the server, receives typed parameters and can return more than one result set, each result set having a different shape (different columns). So, a call to get an entire graph of data would only require one request and one response, saving many round trips. You could write a stored procedure that selects a customer, their recent orders, the products on each of the orders and so forth. You can also insert/update into multiple tables in one trip and you can wrap any parts of your SQL in transactions.
The SQL variant used by SQL Server and Sybase, called Transact SQL, also has elements of procedural programming such as variables, flow control with IF/THEN/ELSE/switches/etc, looping, calling other SQL procedures or user defined functions, exception handling and so on.
These features have been part of SQL Server since the 90s. It's extremely powerful and if used well, can be much more elegant and flexible than an ORM in my opinion. It's a shame that PostgreSQL doesn't support stored procedures quite as well or have an SQL variant that's as well crafted as Transact SQL because I'd love to be able to write apps in that style again. Somewhat recently I believe PG got something like stored procedures that can return multiple heterogeneous result sets and although I can't remember the specifics, I don't think it's quite as robust as what you can do with TSQL and it lacked client/driver side support maybe... (Also PG has some support for running TSQL itself, but not with all the same features.)
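Consuming that style of procedure from Python looks roughly like this; a sketch using pyodbc's nextset() with a hypothetical DSN and procedure name:

import pyodbc

conn = pyodbc.connect("DSN=appdb")  # hypothetical data source
cursor = conn.cursor()
cursor.execute("{CALL get_customer_graph (?)}", 42)

customer = cursor.fetchall()  # first result set: the customer row
cursor.nextset()
orders = cursor.fetchall()    # second result set: their recent orders
cursor.nextset()
items = cursor.fetchall()     # third result set: products on those orders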
I've personally seen and done both. At one job (my first actual dev job) it was a lot of complex business queries for analytics. I put these into their own Python module as string "constants" and imported them (I would not do this again). At my current job we put them in .sql files and have a 3-line function that opens and reads them into a string. I sometimes write them inline. I've come to only one real conclusion, though: unless it's super trivial, I put the SQL outside of where it's being invoked and name the .sql something that makes sense. It's just noise in my code and I want to see the logical steps.
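That 3-line reader really is about this small. A sketch assuming a sql/ directory next to the module and a hypothetical recent_orders.sql that uses named parameters:

import sqlite3
from pathlib import Path

SQL_DIR = Path(__file__).parent / "sql"

def load_sql(name: str) -> str:
    # The entire "framework": read a named .sql file into a string.
    return (SQL_DIR / f"{name}.sql").read_text()

conn = sqlite3.connect("app.db")
rows = conn.execute(load_sql("recent_orders"), {"uid": 42}).fetchall()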
I don't use an ORM, because they're not a good fit for my project (also, for the reasons in the article).
But I do have a pain point around migrations - I have to either painfully hand-write migrations for the deploy code to deal with, or hand-alter the production database to make it fit the new schema. Neither of which are good solutions.
Totally stoked to see the automigrate tool. It completely fits my pain point. Ideally I can have a line in my make file that compares the current schema with the last-known production schema and automagically generates a migration for it.
I'll have a play with this, and contribute if I can.
An ORM is what makes you able to move fast; otherwise find an alternative, or use SQL if you have to.
It can make what you built slow if you don't know how to use it; if it doesn't let you optimize, find an alternative or use SQL if you have to.
ps: I love ORMs and haven't found a better one than the Django ORM when working with Django; everything seems integrated, there is tons of flexibility, and database-specific features are included as well. Forms, Admin, Search, Sessions, Auth, REST Framework and many other things are well integrated with the ORM, so things work as smoothly as possible.
Another way of looking at the problem is that (in many naively-implemented backend codebases) almost every layer and component of the stack ends up depending on the ORM objects, which has the consequence of allowing any of those layers or components to do arbitrary database operations.
You might have some 'data access layer' class/module, but if you're using an ORM, it more than likely accepts as arguments and returns as results some ORM objects. The ORM objects returned from the data access layer invariably have methods like `save()` or `delete()` or `validate()`, or some magical property that, when accessed, ends up issuing another query (think: `customer.orders` or `order.customer`). The objects you're supposed to pass to your data access layer as arguments probably also have constructors that, under the hood, grab static references to singletons (often the connection factory).
In this way, the data access layer is leaking the opportunity to invoke (if not the responsibility of invoking) save/delete/validate functionality to any and every part of the code base, when it should really be contained in one module. Moreover, you cannot reason about whether a particular data object is persisted, could be persisted, or was persisted just from its type.
That is, your Customer object always exposes `save`/`delete`/`validate` even if you got it from a data access layer method called `getCustomersWithRecentOrders`. (Idem for the elements of `getCustomersWithRecentOrders()[0].orders`.) There's nothing stopping you from pulling out `getCustomersWithRecentOrders()[0]`, mutating it, then calling `save()` (or `validate`, or `delete`)—which never-ever makes sense.
Instead, `getCustomersWithRecentOrders` should return behavior-less structs, and your data access layer should provide an `updateCustomer` method. Moreover it should be impossible to construct the type that goes into `updateCustomer` with data that does not validate.
When you implement this kind of data access layer, maybe you use a low level DB driver, maybe you use an ORM for some niceties, but either way, you shouldn't leak types from these dependencies out to the consuming module. However I rarely see this kind of discipline in 'naive' ORM apps (rails/django/laravel/CoreData/Hibernate/etc).
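A minimal sketch of that discipline in Python, with invented names: the read model exposes no save/delete, the write model cannot be constructed in an invalid state, and only the repository touches the database:

import sqlite3
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class CustomerRow:  # behavior-less read model: no save()/delete()
    id: int
    email: str

@dataclass(frozen=True)
class CustomerUpdate:  # write model: an invalid instance cannot exist
    id: int
    email: str

    def __post_init__(self):
        if "@" not in self.email:
            raise ValueError("invalid email")

class CustomerRepository:
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def customers_with_recent_orders(self) -> List[CustomerRow]:
        rows = self._conn.execute("""
            SELECT DISTINCT c.id, c.email
            FROM customers c JOIN orders o ON o.customer_id = c.id
            WHERE o.created_at > date('now', '-30 days')
        """)
        return [CustomerRow(*row) for row in rows]

    def update_customer(self, update: CustomerUpdate) -> None:
        # The only code path in the application that writes customers.
        self._conn.execute("UPDATE customers SET email = ? WHERE id = ?",
                           (update.email, update.id))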
I've been working on a simple ORM alternative for TypeScript + Postgres which addresses the first two complaints (takes SQL as source of truth, and doesn't take over migration or connection management), and am really happy with it so far: https://github.com/jawj/mostly-ormless
I never really saw the reason for having an ORM, why not just do your data logic in the database? Write procedures/functions/packages/triggers and handle everything there and it will almost always perform better than trying to do the same thing in application level code. Then have your application do simple SELECTs, INSERTs, and/or UPDATEs without the need for an ORM.
Well, if you need to do all three of these statements for a lot of tables, there will be lots of code duplication. Therefore it would be good to create some metadata with things like column names and the corresponding types. As soon as you do that, you have a baby ORM.
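That baby ORM can start as nothing more than a dict of table metadata plus a couple of statement generators; a sketch with invented tables:

# Table metadata: name -> ordered column tuple. Generating the
# boilerplate statements from it is the entire "baby ORM". Names come
# from trusted code, never from user input.
TABLES = {
    "customers": ("id", "name", "email"),
    "orders": ("id", "customer_id", "total"),
}

def insert_sql(table: str) -> str:
    cols = TABLES[table]
    placeholders = ", ".join("?" for _ in cols)
    return f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"

def update_sql(table: str) -> str:
    cols = TABLES[table]
    assignments = ", ".join(f"{c} = ?" for c in cols if c != "id")
    return f"UPDATE {table} SET {assignments} WHERE id = ?"

print(insert_sql("customers"))
# INSERT INTO customers (id, name, email) VALUES (?, ?, ?)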
An application might be good as a display/interaction layer for an end user, but not for performing complex data manipulation in a database; that's what SQL, procedural database languages, and ETL tools are for.
SQL as source of truth... Generating models from those SQL create table instructions sounds pretty sick actually. Nice one! I would use it in upcoming projects if there were broader support across different languages and frameworks. I'd like to see some adoption by big players.
I'm sorry, but unless this dude has some credentials for writing ORMs (which, not to toot my own horn, I personally have done), I don't think he even knows what he is talking about. On almost every ORM I've worked on, we treated SQL as the source.
Why is it always assumed that you have to use one or the other in these types of articles?
Use the ORM where it makes sense. Use SQL when it makes sense. I know that Django's ORM lets you run SQL through it and prevents SQL injection.
There is another way: neither the ORM nor SQL is the source of truth. The database design (the physical diagram) is the source of truth.
ORM code and SQL migrations are generated from the design.
New concept of the day: Object-Relational Mapping, converting an object, with all its members of diverse types, to a format suitable for storage in a database, roughly a table of scalars.
Sometimes ORMs are like trying to tie a fishing lure with mittens on. Frustrating.
On the other hand I can not stand, no, really really really cannot stand clever 1000+ line SQL queries. Writing an entire program in one dense line doesn't show how smart you are, it shows you don't have a clue about the bigger picture or else lack respect for the time and mental health of others.
For most cases on a team and for maintainability I've finally decided the advantages of a good ORM and migration tool where you can commit migrations in a rational and standard pattern far outweigh the disadvantages. But this is not because I don't know or respect SQL.
The ‘somehow’ is usually a half-baked migration tool
Maybe don’t use a half baked ORM.
I’m experiencing schadenfreude that someone is having ORM problems in 2019 long after the NoSQL hype died.
I can remember on zero hands the number of times my not-half-baked ORM was the source of my problems such that I wanted to say "fuck it, let's just write SQL". ORM patterns are pretty well laid out. Hell, they haven't changed much in at least a decade or more, and they work great for the problem they solve. Which makes me think maybe a relational store is not right for whatever you're using it for.
tl;dr If you have ORM problems I feel bad for you son, I got 99 problems but SQL ain’t one.
I think that our goal as enterprise software developers is to automate business logic (business constraints) in a way that a domain expert can more or less read and understand. To create a precise language that facilitates fast and robust communication between experts and developers. A new hire to the team should be able to learn domain expertise from the codebase, because your codebase should contain an easy-to-read domain model.
Relational algebra and SQL are not a very expressive natural language. SQL limits your vocabulary to 4-5 verbs unless you start writing procedural code in procedures etc., but SQL is not a good procedural language. Relational databases are a solution to specific technical problems of execution speed, atomicity, consistency, isolation, and durability (ACID). They excel at that, not at communicating intention.
You should use ORMs (preferably Data Mapper) if your goal is to solve the first problem while using off the shelf solution for the second problem. Data Mapper allows you to isolate your domain model from technical aspects of data storage. This will work very well as long as you will actually be able to ignore technical aspects in your domain model.
You can do the data mapping, querying, migrations, and all this technical cruft manually with handwritten SQL if you want, but SQL certainly will not address very well the first goal of creating an expressive domain model that facilitates robust communication between developers and business experts.
And to address Active Record: it's a solution in the middle; you don't get full isolation, because database details creep into your model. The tight coupling of the domain model to the ORM also makes it difficult to swap it for a handwritten mapper. But it's a very good tool for fast iteration and prototyping.
And to address the only point from the article that is not moot given my explanation above:
"We can’t have nice things until we move our schemas out of our app languages and into declarative spec languages like SQL, proto, or even jsonschema / swagger (though I’m not a great fan of the last two)."
SQL, jsonschema / swagger can't express business constraints. I'm working for a university. How do you express that "a student internship in some company A cannot fall in time between two other internships if the other two are in some company B" in a declarative, non-domain-specific language and make it still readable? You can write a declarative domain-specific language, but I wish you luck on your adventure through parsers, interpreters, planners etc. to create that declarative domain-specific language in the first place.
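For contrast, that rule in plain application code reads close to the English sentence, which is the whole point about expressive domain models. A sketch with invented names:

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Internship:  # invented domain object
        company: str
        start: date
        end: date

    def violates_rule(candidate, existing):
        # an internship at company A may not fall between two at company B
        if candidate.company != "A":
            return False
        b = [i for i in existing if i.company == "B"]
        return (any(i.end <= candidate.start for i in b)
                and any(i.start >= candidate.end for i in b))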
What this article misses IMO is a clear distinction of the various abstractions layers an ORM can provide, most importantly these two:
- Heavy ORMs: Object-oriented mapping layers that map classes to tables. A record is represented as an object that not only carries data but also implements various behaviours for storage, retrieval, serialization and deserialization of its own data, sometimes it also implements business/domain logic. Examples: ActiveRecord, EntityFramework, Hibernate, Sequelize, ...
- Lightweight query builders: Lightweight, programmatic abstractions on top of SQL. Examples: knex.js, pypika, ...
There are issues with both approaches.
Heavy ORMs
Heavy ORMs make it seem like a developer doesn't need to know SQL to work with a database. They help to get a project off the ground quickly, but as a project grows and requirements become more complex, they often generate slow queries or generally lack the capabilities to express the right queries. Eventually the developer is forced to understand what's going on in the SQL layer underneath, spending a lot of time debugging and understanding the generated queries, often without the possibility to optimize them. The premise of "not knowing SQL" while productively working with a database long-term has been proven wrong in so many cases (I feel like everyone has been burnt by an ORM at least once in their dev career...).
Lightweight query builders
Lightweight query builders, on the other hand, don't try to abstract too much away from SQL. Their APIs typically use SQL terminology, and they're not trying to hide the fact that they're working with a database. From an application developer's perspective, there is still the drawback that you must have a solid understanding of SQL to work with them efficiently. Another issue is that these query builder APIs typically aren't type-safe: there's no static analysis to help identify typos or other issues in the DB queries you're building, and in terms of developer experience they lack features like autocompletion for DB queries. Personally, I still prefer query builders over heavy ORMs since I get more control.
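For example, with pypika the API deliberately mirrors SQL terminology, so there is very little to learn beyond SQL itself (table and column names invented):

    from pypika import Query, Table

    users = Table("users")
    q = (
        Query.from_(users)
        .select(users.id, users.name)
        .where(users.age > 18)
        .orderby(users.name)
    )
    # str(q) yields roughly:
    # SELECT "id","name" FROM "users" WHERE "age">18 ORDER BY "name"
    print(q)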
Time for a plug: I work at Prisma where our mission is to make it easier for application developers to work with databases by providing developer-friendly, efficient and performant abstractions and tools for working with databases! We're currently building the Prisma Framework [1] which consists of two main tools:
- Photon [2]: A lightweight, auto-generated and type-safe database client
- Lift [3]: A tool for declarative data modeling and database migrations
Both tools are based on the Prisma schema [4], a declarative abstraction over your database schema that serves as foundation for the generated Photon API. Lift migrations work by mapping the Prisma schema to your database schema. We also build a ton of other nice tools and libraries that simplify database workflows (such as Prisma Studio, a database GUI) or integrations with API layers, such as GraphQL Nexus [5] for building type-safe GraphQL APIs without boilerplate.
It makes it much easier to avoid errors and tab-complete relations across the object graph.
> Some ORMs offer to ‘mirror your live DB’ into a schema. That’s insane to me. It makes any type-checking or linting impossible. You have no idea what you’re getting.
Hm, I believe the author is talking about SQLAlchemy's sqlalchemy.ext.automap: https://docs.sqlalchemy.org/en/13/orm/extensions/automap.htm.... This is a relatively new feature in SQLAlchemy; no need to use it if you don't want to (but it is pretty cool).
I can't speak for everyone here, but I doubt most ORM users are using it that way.
> Your test suite will be useless because your local DB probably is getting spun up from a schema.
Most likely a test database is spun up (if it's not a SQLite in-memory DB). In the case of Django it'll automatically prefix test_{databasename}, run migration files, and go to town.
> If not, then you have another hard problem of doing regular prod dumps to your dev environment. Yuck.
Which leads me to another benefit of ORMs: adding fixture data to simulate very intricate states is much easier to do and maintain. Between tests, most likely, transactions are being rolled back.
Since you are using Python, you'll be able to use pytest fixtures: https://docs.pytest.org/en/latest/fixture.html. I really like pytest-django, but I'm not sure how good the SQLAlchemy equivalent is. Maybe that's the area you're having issues with, since when those have bugs, they're notoriously hard to figure out (is it how I'm bootstrapping SQLAlchemy? is it my models? is it my fixtures? is it a pytest bug? or the pytest extension?)
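For illustration, a pytest-django test might look like this (app and model names invented); each test runs inside a transaction that is rolled back afterwards, so fixture state stays isolated:

    import pytest
    from myapp.models import Article  # invented app and model

    @pytest.fixture
    def published_article(db):  # `db` is pytest-django's database fixture
        return Article.objects.create(title="Hello", published=True)

    def test_listing_shows_published(published_article):
        assert Article.objects.filter(published=True).count() == 1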
> There’s always some surprise about the connection or pool type. It’s never straightforward, it’s always badly documented, it has twice as many layers as necessary because it has to manage bespoke session tracking and table registries. Transactions are undocumented or mysterious, autocommit is implied or hidden.
> We can’t have nice things until we move our schemas out of our app languages and into declarative spec languages like SQL, proto, or even jsonschema / swagger (though I’m not a great fan of the last two).
You could have your Django server (or whatever is using the ORM) expose an API. What are you building exactly? GraphQL will get you a typed API, and other languages can consume it however you like (OAuth2?).
You're probably going to have a main programming language of some sort doing the CRUD/migrations. Why? There's normally business logic / layer stuff on top; that's another +1 for having other languages access it via an API, so there isn't duplication.
Are there cases where making an API wouldn't work? If so, you're probably well beyond ORM questions, and have probably learned to keep the ORM to the basics: simple ForeignKey relations, staying away from polymorphism and stuff that doesn't map well when other languages have to access the SQL directly.
Imperative code is complicated when dealing with data, so we abstract imperative code into SQL.
SQL is complicated so we abstract it from SQL back into imperative code via an ORM.
Of course it's backwards.
It's like one of the biggest legacies of web development, next to JavaScript, HTML and CSS. SQL is a very specific way to think about data and a very high-level, leaky abstraction. So high and leaky that you literally need to look at the query plan in order to optimize SQL queries. If database IO is the bottleneck for web development, the best abstraction for this area is most likely a zero-cost abstraction (Rust, for example). Instead we have SQL, and we create a generation of SQL hackers who memorize a bunch of technology-specific hacks to try to get things to work. Why is SELECT * bad? It's a hack you memorized, a leaky abstraction.
I'm not saying schema-less is better, hear me out. I'm saying SQL is a bad API. You could probably make an API language that's more explicit and imperative on top of something like Postgres. Change the API itself; don't write something on top of it.
Either way, an ORM on top of SQL is like what TypeScript is to JavaScript. You know that when they make an API on top of an API, it's to cover up something that doesn't work well. Unless it's a zero-cost compilation, such fixes only make performance worse.
If you insist on using some abstraction on top of SQL, the zero-cost way to do it is an API that is bijective to the syntax of SQL itself. TypeScript, I'm guessing, is relatively isomorphic to JavaScript, so it works out.
> I'm saying SQL is a bad API. You can probably make some api language that's more explicit and imperative on top of something like postgres. Change the API itself don't write something on top of it.
Something like VSAM or IMS?
The relational model and declarative query languages were invented because the previously existing imperative navigational model was far too inflexible and difficult to use. Access paths were baked into the data structures, you had to be your own optimizer, and if you optimized for the wrong usage patterns, you had to refactor everything or live with poor performance.
You can enforce patterns via types. You can also have default optimizations. The problem with SQL is that to choose an optimization you have to do it by poking at the syntax.
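Roughly something like this, say (a toy sketch; every name here is invented, no such library exists):

    # the plan is an explicit argument instead of something you coax
    # out of the optimizer by rephrasing the query
    result = db.select(
        "orders",
        where=eq("customer_id", 42),
        plan="auto",  # let the engine pick, exactly what SQL does today
        # plan=index_scan("orders_customer_id_idx"),  # or choose it yourself
    )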
Obviously the above example is trivial, wrong, and probably won't work. But I hope it paints a fuzzy picture of my point: that the optimization or 'plan' should be explicit. You choose auto, or you dive in and change it yourself. No language or syntax hacks.
No thanks. The benefit is not having to specify the plan. The optimizers themselves are fairly good. A well designed table with the right indexes should perform well.
Raw SQL is great at what it does.
ORMs are typically good at what they do.
Depending on the use case both have their pain points. Typically I see a mixture of ORM for most CRUD based stuff and raw SQL for cases where the ORM isn't suitable.
In my example code, do you notice the string "auto"?
It means you can literally have the API do exactly what SQL does: automatically pick the query plan. But you opt into that explicitly.
So when the auto planner screws up, in those cases you explicitly make it use another algorithm, rather than modifying the SQL to be slightly different and hoping it compiles into something more reasonable.
I'm not convinced on the UX of that. Developers are lazy. I think what would more than likely end up happening is they explicitly set it to "auto" and modify the query to get a more performant plan. After a certain query complexity it would become exceedingly hard to piece together plans by hand.
A bunch of performance problems aren't even solved by tweaking the query either. They're solved by changing the database structure. Adding the appropriate indexes and such. Making sure datatypes match between joined data and no implicit conversions are happening. Right-sizing columns.
No amount of specifying your own query plan will do anything to affect those kinds of issues.
In practice I find it mostly comes down to cardinality-estimate issues as a very common source of problems, where the database either over- or under-provisions memory for the query. If it estimates it's going to get back a lot more data than it actually does and grants too much memory, that reduces parallelism, because the memory can't be used by other queries. If it under-estimates, it doesn't grant enough memory, and when it gets back more rows than will fit into memory it has to write them temporarily to disk, taking a massive hit in I/O performance.
How does your scheme work with figuring out how much memory the database should grant to a query when specifying a plan by hand?
What's more is plans change over time as statistics change. Leaving it up to the query optimizer means it's adaptive. Having to specify yourself means you have to know the optimizer isn't giving you the best plan at design time. There are some cases where you know that. There are some cases where you don't.
You can already specify query hints etc. I think SQL just has this covered already. I have no qualms if the query changes slightly.
It seems that what you want is not an imperative API, but a more explicit one. I'm afraid the problem is that specifying the access methods or optimizations to use in the database is very product- or even version-specific, on the other hand you can already tell a database what to do using optimizer hints.
I want both. An imperative language makes it easier to stay isomorphic to a web app's API: you can have corresponding types and functions instead of something totally different.
Optimizer hints are a good idea. As far as I know this isn't a well-known feature (if it even is a feature) of PostgreSQL, which is the DB I have the most experience with.
Help this noob see the light: how do I handle super dynamic queries in an ORM?
E.g. I have nodes that can be listed and sorted in various ways. That might mean having to join and/or select on additional tables (taxonomies, tags, sources, authors). I have a lot of code to generate exactly the query I need for any given listing, but how do I do that without string concatenation? Make a dedicated query for each permutation of additional tables I need or don't need?
I'm not so much looking for an explanation in your own words, but just a point on what to search for, or maybe some articles that skip right to such "advanced" stuff.
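Not a full answer, but the thing to search for is queryset chaining / lazy querysets. A minimal sketch in the Django style (model and field names invented):

    from myapp.models import Node  # invented model

    def list_nodes(tag=None, author=None, sort=None):
        qs = Node.objects.all()
        if tag:
            qs = qs.filter(tags__slug=tag)        # join added only when needed
        if author:
            qs = qs.filter(author__name=author)
        if sort:
            qs = qs.order_by(sort)
        return qs  # SQL is generated once, when the queryset is evaluated

No string concatenation; the builder composes the joins for you.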
Most ORMs I have used are by definition a leaky abstraction: an inferior and insufficient representation of the underlying language they try to improve upon.
In addition, ORMs greatly increase the surface area for troubleshooting. Instead of just being able to run and profile a query, you now need someone who is experienced in the internals of the ORM and how it "plans" a certain query. Of course, because ORMs do not understand how the data is actually laid out, when they say "plan" they mean create the SQL string.
So in place of a query you will need to learn the conventions of the particular ORM, leaks and all, and how that maps to the SQL. Then you will still need to understand how that SQL actually interacts with the table schema.
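To take Django as one example (the model name is a stand-in), the closest you get to inspecting that "plan" is printing the generated SQL, which confirms it really is just a string:

    qs = Person.objects.filter(last_name__startswith="Sm")
    print(qs.query)  # the SELECT Django will send -- not the DB's query plan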
The end result is normally one backend person who is very territorial over their code, and who acts as a bottleneck for the rest of the team. If you're using an ORM I'm sure you already know who this guy is...