I have a lot of experience with the Django ORM, but with no other ORM.
So I disagree with most points.
The Django ORM does map pretty neatly to SQL (in my opinion), and especially Postgres. The migration tool is anything but "half-baked". And I have observed that beginners have an easier time creating Django models than dealing with straight SQL.
On top of that, the Django ORM mitigates most SQL injection attacks out of the box, without even requiring the developer to know about SQL injections in the first place.
And the ORM enables elegant code reuse by constructing query objects that can be used in Template logic, Form validation, generic CRUD, REST endpoints and more.
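A minimal sketch of both points (the Post model and the surrounding view context are hypothetical): user input is bound as a query parameter rather than spliced into the SQL string, and the resulting queryset is an ordinary value you can pass around:

    # Sketch; Post and the view context are hypothetical.
    term = request.GET.get("q", "")                    # attacker-controlled
    posts = Post.objects.filter(title__icontains=term)
    # Compiles to roughly:
    #   SELECT ... FROM app_post WHERE title ILIKE %s
    # with `term` passed as a bound parameter, so injection is a non-issue.
    # The same queryset object can then feed a template context, a form,
    # a paginator or a DRF serializer unchanged.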
I agree that most ORMs don't help you with multiple programming languages. Though both the Django ORM and SQLAlchemy allow you to customize a lot of the naming conventions and can be adapted to any legacy schema.
But then again: Is it even a good idea to access the same database from two codebases at all?
Some of the concerns about "ORMs kill the SQL-star" can be boiled down to a lack of general understanding of relational databases among developers who learn to use an ORM before diving deeper into the philosophy of relational databases.
...you're the ONLY person I've seen with a good opinion of Django's ORM. I'm willing to accept you could be right in preferring it, but:
Could you share some resources on using it properly for advanced use cases? Good cookbooks, etc.?
For me, it's always been a huge footgun; coupled with Python's dynamic nature, it's an endless source of bug after bug... Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost. Use Postgres native JSON and array fields - good luck running your tests on SQLite; the ORM that was helpful up to here just gives up and tells you "f u buddy, you're on your own". Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code - good luck with that, nobody cared about this scenario.
They've somehow managed to make it too simple and too complex at the same time! Oh, and don't expect to just click a few jump-to-definitions and make sense of the ORM's inner code easily, god forbid :|...
While I like the idea of ORMs, I found them all totally "wrong headed". And SQLAlchemy - that's just not worth the effort of learning to use properly, despite its good ideas, if you're only going to end up using 10% of its functionality, and in an idiosyncratic way that was not intended.
This thread is littered with people singing praise of Django's ORM. He is not the only one. I'm a huge proponent of it as well.
> Could you share some resources of using it properly for advanced use cases? Good cookbooks etc.
The official documentation covers advanced use cases. Stack Overflow will help you shape up more complicated queries.
> Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost.
That's a drawback of most dynamic languages, not a flaw of Django's ORM.
> Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code - good luck with that, nobody cared about this scenario.
Maybe that's the threshold beyond which the Django ORM devs thought it sensible to stop providing support? At some point, mixing hand-coded SQL and ORM code introduces a whole set of hard questions and decisions. Very few people expect ORM code to play properly with hand-coded SQL. If you need to switch back to SQL, you are generally on your own.
> They've somehow managed to make it too simple and too complex at the same time! Oh, and don't expect to just click a few jump-to-definitions and make sense of the ORM's inner code easily, god forbid :|...
It's simple to use, yet has complex internals. That's about what I would expect from a tool abstracting something complicated.
>> Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost.
> That's a drawback of most dynamic languages, not a flaw of Django's ORM.
In Python, mistyping a name usually results in an exception at runtime. If Django doesn't check models for correctness, it can't be explained by being written in a dynamic language.
In Python, mistyping a property of an object doesn't result in an exception. JS has the same behavior.
If you mistype a nullable or already-filled property of a model instance, Django will happily save the model; the mistyped property is ignored because it's not part of the model, so nothing is actually modified in the database.
Calling that behavior a flaw of Django's ORM is disingenuous.
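For context, the behaviour under discussion looks like this (sketch; the User model and its field are hypothetical):

    # Sketch of the failure mode being debated (User is hypothetical).
    user = User.objects.get(pk=1)
    user.emial = "new@example.com"  # typo for `email`: Python happily
                                    # sets a plain instance attribute
    user.save()                     # saves fine; the typoed attribute is
                                    # not a model field, so nothing changes
                                    # in the database and no error is raised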
> In Python, mistyping a property of an object doesn't result in an exception.
Sure it does. It doesn't on assignment, as you can modify the object however you like, but it does have AttributeError for accessing a property that doesn't exist.
Which you can only use at compile time - not dynamically. That said, I recall the SQLAlchemy ORM rejects fields that it does not recognize. A decent IDE (PyCharm) will highlight these for you with the Django ORM. Having now used both the Django ORM and SQLAlchemy extensively, I prefer SQLAlchemy a lot. The Django ORM is usable, but many of its behaviours are odd, and I agree that writing more complex queries gets ugly fast, unlike with SQLA, where you can write some really hairy SQL in a composable manner that makes it just so much more legible.
There is no such distinction between compile time and run time in Python; everything can be done dynamically. Want to use a class with slots computed at runtime? No problem:
    >>> def with_slots(**kwargs):
    ...     class SlottedClass(object):
    ...         __slots__ = kwargs.keys()
    ...         def __init__(self, **kwargs):
    ...             for k, v in kwargs.items():
    ...                 setattr(self, k, v)
    ...     return SlottedClass(**kwargs)
    ...
    >>> with_slots(foo=1, bar='qux').zork = 4
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'SlottedClass' object has no attribute 'zork'
Of course there are many improvements that could be made to this approach (e.g. not creating a new class for each instance, or setting the __name__ to something meaningful) but I hope it shows that dynamic typing and robust error checking aren't contradictory.
> ...you're the ONLY person I've seen with a good opinion of Django's ORM. I'm willing to accept you could be right in preferring it,
No, he's not. Almost the entire Django community has a good opinion of it.
The main criticisms I see of Django's ORM seem to be from people that prefer SQLAlchemy. They have different design philosophies, and the distinction seems largely a matter of taste and differing use-cases.
> main criticisms I see of Django's ORM seem to be from people that prefer SQLAlchemy
...then I'm in the minority, hating them both, and I'm in the process of looking for an alternative, since I'll be using a lot of Python for the foreseeable future :|
I liked web.py's database abstraction layer a long time ago - know of anything similar in philosophy but more powerful?
DAL was the first Python db abstraction I used, and I liked it well enough, but I've since used both Django and SQLAlchemy, and they both seem less opaque to me, maybe because they're not obscured by some of the weird things about web2py itself.
I love the auto-generated backwards references you get in Django and SA when you use foreign key and many to many fields. They make life so much easier. I don't recall how that was achieved in DAL, I just remember it being... complicated.
Years ago I had the same discussion [0] on Medium and here is my take on that: You will end up reinventing "an ORM" because your SQL won't handle all the edge cases. At least with e.g. the Django ORM we know it helps avoid some security flaws.
thanks for the reference, especially to DoorDash's original article "Tips for Building High-Quality Django Apps at Scale" that it was related to... I really like their thinking and think they are solving the same problems; I really like their idea of abstracting/wrapping over the ORM instead of using signals, and I'm looking for a similar architecture for a "default" Django boilerplate, featuring:
- data models in separate files, one for each model, under a data_models/
- business logic models that may or may not map one-to-one with data models, under a models/, that would contain roughly framework-independent code
- encourage "fat apps"
- figure out some way to let the db be the source of truth and allow "multiple services, one DB" scenarios - maybe having "core models" for which all db migrations are excluded, table names fixed, and possibly extra columns that the Django app could ignore but other services could use, and "app/project models" for which regular Django migrations would run
I might write it up after I get the boilerplate app done, and I'm really interested in hearing opinions from people with opposing viewpoints especially!
...of course they don't: I was referring to having a sane, standardized way of having each app work with two dbs: one that it owns and that includes some models of no interest to other apps, and a different one that it shares with other apps and that is used by other models (and has migrations turned off for these models, ofc).
The main point is that it can result in way less code; think of all the APIs between microservices that would never need to be developed or documented or maintained because they'll never exist, yet the problem they would've solved will be addressed :)
Doesn't work all the time, but if most of what you need is data-sharing, and there's a clear model of "this service produces this data, which has a clear documented fixed format, all others just consume it", you'll prevent thousands of LOCs from being written. And I do see my mission more and more as preventing (useless) code from being written nowadays...
Indeed. I recently worked on a legacy project where I don't have the luxury to modify the old database. I ended up using Peewee [0] to access the old database, in conjunction with the normal Django based database, and even wrote a simple model that uses Peewee [1]. Everything works, even Django Rest Framework associated with the model.
Django's ORM is fine. I would say that it's good but not particularly powerful.
It's easy to use and makes inserting / updating data fast and easy. It's simple and fast for simple queries.
It starts getting a bit crap when you need complex queries. Annotations / GROUP BY stuff isn't intuitive.
It does make it easy for inexperienced people to generate huge numbers of queries by looping through querysets and getting fields from another table.
It lets you run raw SQL through it and prevents SQL injection, so it's really a case of knowing when to switch to that.
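For reference, the looping problem mentioned above and its usual fix (sketch; Book/Author are hypothetical models):

    # N+1 pattern: one query for the books, then one more per book.
    for book in Book.objects.all():
        print(book.author.name)

    # Same loop, one JOINed query (select_related for FK/one-to-one;
    # prefetch_related for many-to-many and reverse FKs):
    for book in Book.objects.select_related("author"):
        print(book.author.name)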
Where did I say I was concatenating data? You might want to take a look at Django's ORM before making comments like that, it handles parametrised queries fine.
The way I see it, an ORM is just a way of formalising the false assumption that a database is stupid and useless.
I've taken a look at a couple of Django's most popular ORM options and I suppose if I was a mid-level corporate cog writing CRUD interfaces for Stack Overflow tutorial-grade operations like customer-sale-item-product relationships it would probably work fine. I've only had passing exposure to that kind of soul-crushing drudge work.
For the way I work an ORM is baby stuff, and I'm not even doing anything particularly advanced or sophisticated. I maintain (among other things) a relatively unremarkable bespoke online discussion forum. It comprises hundreds of queries, few of which could be composed correctly by an ORM, let alone composed and run performantly. The median complexity query in my code base probably has two or three joins and two or three subqueries—at least one of which has some kind of tangential aggregation or window function.
Thanks for the well deserved lesson in internet arguments. You win, sir.
(For the record, complexity and remarkability aren't strong correlates. And it's not particularly complex; probably 90% of the app's queries are written for indirect tasks like moderation tools, automations, statistical reports and various strata of anti-abuse mechanisms. Most common pages only have one security-related query and one content-related query.)
> Postgres native JSON and array fields - good luck running your tests on SQLite
If you're not testing what you actually run in production, then what are you testing? If you just need to test in-mem behaviour, mock the DB completely instead of expecting compatibility for vendor-specific features across vendors.
For database testing, I often use a "ramdisk", or in-memory filesystem: sudo mount -t ramfs -o size=512m ramfs /path/to/mount/point, initdb, pg_ctl start. (There's a bit more bookkeeping to generate passwords, store them to /path/to/mount/point/.pgpass, etc.) You can tear it down afterwards with pg_ctl stop, sudo umount /path/to/mount/point. I'll put the necessary commands in start / stop scripts, then call those from within the beforeAll / afterAll hooks of whatever testing framework I'm using.
It's definitely slower than mocking, and therefore not really suitable for unit tests - but it's great for repeatable, isolated integration tests (continuous or otherwise).
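In pytest terms, the wiring can be as small as this (sketch; the script paths are assumptions standing in for the start/stop scripts described above):

    # Sketch: session-scoped fixture around the ramdisk Postgres scripts.
    import subprocess

    import pytest

    @pytest.fixture(scope="session", autouse=True)
    def ramdisk_postgres():
        subprocess.run(["./scripts/start_ramdisk_pg.sh"], check=True)
        try:
            yield  # tests run against the in-memory Postgres here
        finally:
            subprocess.run(["./scripts/stop_ramdisk_pg.sh"], check=True)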
Hmm... can one use in-mem db mocking with django.contrib.postgres fields?
I'll definitely look into this; not a solution for existing projects bc of legacy baggage, but probably the right way to do it. The problem in practice is that, yes, what you really want is to mock the db completely, but practically speaking existing code is too entangled with the db adapter specifics, so you end up running with a "fake real db" :|
Can you write the tests so that you create object instances and not save them to a database at all? You don't need to test the ORM and database, after all, just your code.
Unit tests, yeah, but if you try to get "the most bang for the buck", like "test a lot with zero/little effort", you'll end up with a layer of "integration tests" that (1) set stuff in the db, (2) do stuff, (3) read stuff from the db and assert its correctness... A sane ORM would not actively hinder this scenario, even cross-db or with memory-mocked dbs, but Django's does!
I don't see the issue - you're asking it to support a feature of one database on another, something it never claims to do. You need to use the same database for testing that you use in production.
I'd say the sqlite backend and using it in testing is mostly useful for reusable django apps that are supposed to be used in a wide variety of projects.
Most of the code I write in Django would never need to be SQLite-compatible, even if a lot of it happens to be. The SQLite compatibility is almost like a relic from a time when setting up an RDBMS was difficult and slow.
I don't know if there is already some plugin to make mypy aware of Django models, but that should be possible and would avoid bugs where you misspelled a property and it slipped through your quality control and unit tests.
I suspect "raw sql" to be a bigger headache and source of bugs in any moderately complex project.
Most of the use-cases of queries are actually very simple, and there the ORM allows you to simplify the code a lot.
My hints for how to enjoy the ORM more: write custom managers and QuerySets (for example for commonly used filters), write generic views that take QuerySets, and take advantage of Models and QuerySets in form validation and REST APIs.
If you use the Django ORM like you would concatenate raw SQL, then it ain't gonna be pretty...
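To make the custom-manager hint concrete, a sketch (Order and its fields are hypothetical):

    # Sketch of the custom managers/QuerySets hint above.
    from django.db import models

    class OrderQuerySet(models.QuerySet):
        def open(self):
            return self.filter(closed_at__isnull=True)

        def for_customer(self, customer):
            return self.filter(customer=customer)

    class Order(models.Model):
        customer = models.ForeignKey("Customer", on_delete=models.CASCADE)
        closed_at = models.DateTimeField(null=True, blank=True)

        objects = OrderQuerySet.as_manager()

    # Commonly used filters now compose, and the result can be handed
    # to generic views, form validation or a REST endpoint as-is:
    # Order.objects.open().for_customer(customer)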
> Most of the use-cases of queries are actually very simple
I write web applications / data tools with raw SQL, and I find that only about a quarter of the queries are simple. Many have subtle specificity in the joins, subqueries and window functions that returns almost exactly what I want to output.
The problem I see in most people's ORM-powered code is that they lean so heavily on application code to get all the data assembled in the right place and in the right order, resulting in scaling problems.
> concatenate raw SQL
Often dangerous and almost never needed. Parameterised SQL is the right approach.
> If you use the Django ORM like you would concatenate raw SQL, then it ain't gonna be pretty...
Depends on what exactly you mean here. The Django ORM fully supports some things you wouldn't initially expect if you hadn't seen it before, like passing one queryset into the filter args of another. It'll actually compose the queries and run it once on the database, instead of running the first and passing the result into the second.
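For instance (hypothetical models), the inner queryset below is never executed on its own; it's compiled into a subquery of the outer one:

    # Sketch: queryset composition, one database round trip.
    recent_authors = Author.objects.filter(created__year=2019)
    posts = Post.objects.filter(author__in=recent_authors)
    # Roughly:
    #   SELECT ... FROM post
    #   WHERE author_id IN (SELECT id FROM author WHERE ...)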
> Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost.
Test your code. Use an IDE. PyCharm highlights these errors immediately.
> Use Postgres native JSON and array fields - good luck running your tests on SQLite,
You're using stuff from the Postgres contrib package on a different db. The docs are pretty clear about that.
> Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code
Make a view and manage it through the ORM? If you're updating data from this, then what you're doing seems really strange.
I'll leave the other parts since sibling comments responded to those, but:
> Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code - good luck with that, nobody cared about this scenario.
I am the other person with an excellent opinion of Django's ORM. I vastly prefer it to writing SQL queries manually, and the only thing I miss is making it ever more functional-like.
Everybody prefers anything that does a barely decent job at preventing you from writing SQL manually :)
If you set the bar that low then sure, you'll have an "excellent opinion" of ANY ORM. Can you give an example of an ORM you didn't prefer to writing SQL manually for 50% of the cases? Most that were that bad are probably not in use anymore...
My experience as well. I use an ORM that I absolutely love for most work (Apache Cayenne) and I use plain SQL where appropriate, like for complex reporting.
The "ORMs are bad" articles usually boil down to "My ORM is bad" or "I tried to use an ORM where it doesn't really fit and now I believe all ORMs are bad".
> Some of the concerns about "ORMs kill the SQL-star"
This doesn't even make sense. It kills the SQL duds; stars are even more important for understanding the convoluted SQL when the ORM isn't working as expected, and for the cases the ORM can't handle at all (there are always some).
That's not my experience. In ~20 projects in this company, with pretty complicated GIS-related web apps, there have been about 4 cases where I need to create a raw SQL query because Django's ORM didn't do what I needed. Almost all the time, it just works.
When you say "pretty complicated GIS-related web apps", does it mean you perform powerful spatial analysis using the PostGIS extension over a few million spatial objects, OR is it more about fetching a handful of spatial objects for display/processing in a web-mapping framework?
Because for pretty obvious reasons, in the first case you'd want computation on the back-end. Sure, you can do a lot of spatial operations with Python after passing through an ORM... but is this really better adapted than raw SQL & PostGIS? Genuinely curious.
Most of the processing happens outside PostGIS; we often compute things with geometries from PostGIS against raster data that we don't keep in PostGIS. So our PostGIS queries are mostly limited to filtering on a BBOX, or closest distance to a point.
What users want to compute can be defined by users, e.g., "take all houses in the Netherlands as polygons, compare with a DEM of the Netherlands to find the lowest point of the house's contour, compare that point to a water depth raster to see if it would be inundated" is a computation a user could define in our GUI and the actual computation would then run in (Python) background tasks.
I work in the GIS field and I tend to put a lot of business logic into PostgreSQL, because I can re-use all functions/views in a variety of tools, from desktop GIS to BI and even (shamefully) Excel for some of our oldest apps.
Whenever we move to a new webmap, or we want to serve an API, it's always one connector away. All business logic put into the DB is heavily reusable, because a lot of tools "speak" SQL.
I guess in the end it really depends on whether your approach is data-centric vs functionality-centric.
Coming from Rails I think that Django's ORM is too complicated, like almost everything in Django and in Python. However I'm using Python, Ruby and Elixir in different projects and Ecto is much more complicated. It makes me want to write plain SQL as I did in the early 90s.
Actually, I'd be OK writing SQL. It's just inconvenient having to decode the results and handle migrations manually. I did plain SQL migrations before ORMs. That's not something I want to go back to.
Django's ORM is pretty complicated under the hood, but I suspect that this is not easily avoidable given the requirements.
I also don't subscribe to the view that Django is too complicated. With other frameworks you end up implementing part of Django, and most of the time that will be worse in every conceivable way.
This. It's not harder, it's different. Takes some getting used to, but then you realize "oh it's pretty much just sql that returns structs" and you instantly hate every other ORM you've used prior.
I'm not surprised about your comment on Ecto. Ecto takes experience and practice, but by the time you finish writing your 5th or 6th application, Ecto will be so natural that you wouldn't want to go back. Especially not to writing vanilla SQL.
The problem with complicated things is that the second application never comes. I've got a customer that's been paying me to write a Phoenix application for the last two years. That's good, but I'm not going to use Ecto in one of my own projects. So no Elixir if it touches a database. There are simpler alternatives to Ecto; maybe I'll use one of them, but Ruby and ActiveRecord are good for almost everything.
I used to really dislike the Django ORM, since I had a background in SQL and thought that ORMs are really constraining and not as flexible as just writing SQL. However, recently I have come to the realisation that not everyone who programs knows SQL (woah, what a revelation, right?), and the Django ORM is a really nice tool to fill that gap with something they are more familiar with (ie Python objects).
If you are using any language other than SQL, then you better have a way to convert SQL data to the appropriate data types of your language and vice versa. And it better be injection-proof and reusable. That's already at least half of an ORM right there.
I think I know SQL fairly well, but I don't see how I could implement things like django admin, form validation, template views etc as elegantly on top of plain SQL.
So I really don't agree with "ORMs are only useful for SQL-noobs".
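The type-conversion and injection points above in miniature, with nothing but the stdlib (table and fields hypothetical):

    # Sketch: even "raw SQL" code grows parameter binding plus
    # row-to-object conversion, i.e. half an ORM.
    import sqlite3
    from dataclasses import dataclass

    @dataclass
    class User:
        id: int
        name: str
        active: bool  # stored as 0/1 in SQLite, coerced back to bool here

    def fetch_active_users(conn: sqlite3.Connection) -> list[User]:
        rows = conn.execute(
            "SELECT id, name, active FROM users WHERE active = ?",
            (1,),  # bound parameter, injection-safe
        ).fetchall()
        return [User(id=r[0], name=r[1], active=bool(r[2])) for r in rows]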
> But then again: Is it even a good idea to access the same database from two codebases at all?
A motivation for relational algebra was different applications accessing data in different ways. This integration remains a selling point. Further, data often outlasts applications... even languages.
Row-level permissions are possible, but it can get more complicated when certain users are only allowed to change a certain row in a certain kind of way but not in another.
"Is it even a good idea to access the same database from two codebases at all?"
Yeah, if you're migrating code between languages and platforms and want to run both simultaneously so you can go live with a subset in the new platform and phase out the legacy system in stages.
> But then again: Is it even a good idea to access the same database from two codebases at all?
I think that depends entirely on who you ask. Some orgs may have dedicated DB guys who are designing all the DB schema and stored procedures that must be vetted by the DB guys before the application guys can merge in changes. Others might decide dev teams should be responsible for their own data and store it as they see fit. I think product data should only be updated by one main API and that API exposed internally for other tools to interact with, unless it's something that is special. And in development, there are many special cases.
If it's treated correctly as a set of SQL-language APIs with clear ownership and explicitly defined public/private parts. There is a reason why it's often considered an anti-pattern, and it is a general lack of discipline in maintaining stable semantics and separation of responsibilities.
It looks like the author is generalizing shortcomings in SQLAlchemy as faults of the concept of ORMs. ActiveRecord, for example, treats SQL as "the source of truth", per his definition.
Also, there is a whole section called "ORMs take over connection management and migration" and not a single point as to why connection management handled by the ORM is bad, only comments on migrations. Moreover, migrations are not an unsolved problem at all.
In Ruby, ActiveRecord solves all scenarios I have encountered gracefully, and trust me, I have dealt with tons of edge case projects.
> ActiveRecord for example, treats SQL as “the source of truth”, per his definition.
SQLAlchemy author here. I can't imagine how someone would not consider SQLAlchemy to treat "SQL" as the "source of truth" as well. All of SQLAlchemy's constructs map directly to SQL syntactically. Our most grumpy users are the ones who don't know SQL. This blog post would certainly benefit from some actual examples. "ORM-light tools that coerce responses into native structs and allow for type-checking are less offensive to me." - that is...an ORM?
What the blog probably means by "SQL as a source of truth" is that you first design your database and then generate the domain classes from it, and not vice versa. This is, for example, what MyBatis and jOOQ do.
I don't think there is enough information from the database schema to feed an ORM reliably. E.g., take a "boolean" in MySQL: MySQL doesn't have proper booleans, it just represents them as a `tinyint`. But you'd really like your ORM to convert those to your language's `bool`!
There is certainly a lot of information in a schema, and you definitely want your ORM's idea of the database and the database's idea of it to be in close alignment, but I feel like any serious attempt would find all sorts of holes like the above when it actually came to doing it.
Wouldn't you just offer user-modifiable translators for such cases (presumably with some sane default)? It seems preferable to writing the domain model in code first, which results in N copies of the schema definition for N applications utilizing the db...
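A sketch of what such a translator can look like with SQLAlchemy's TypeDecorator, for the tinyint-as-boolean case above:

    # Sketch: coerce MySQL's tinyint booleans at the boundary.
    from sqlalchemy.types import SmallInteger, TypeDecorator

    class TinyIntBool(TypeDecorator):
        impl = SmallInteger

        def process_bind_param(self, value, dialect):
            # Python bool -> 0/1 going into the database
            return None if value is None else int(bool(value))

        def process_result_value(self, value, dialect):
            # 0/1 -> Python bool coming back out
            return None if value is None else bool(value)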
> N copies of the schema definition for N applications utilizing the db..
Is multiple applications talking to a shared DB schema a common practice, especially nowadays?
In my mind, each app/service should have a DB schema which only it talks to, and other apps/services needing data in that DB go through services exposed by that app/service (REST, RPC, GraphQL, whatever) rather than talking to its DB directly. That means you can rearchitect the DB schema, change which DB you use completely, etc., and only the app/service which owns that DB needs to be modified.
Why use an ORM when I have to know SQL and it's a direct syntactical mapping?
>"ORM-light tools that coerce responses into native structs and allow for type-checking are less offensive to me." - that is...an ORM?
Yeah... it's a light ORM that focuses on turning a result set into an object, but not on the syntax remapping of queries. Object mapping is almost universally liked, but ORMs usually also include query syntax mappings and the addition of a transaction lifecycle into your data objects.
Because an ORM has nothing to do with writing your SQL for you. You can use textual SQL with an ORM, and an ORM like SQLAlchemy has a query language that mirrors the structure of SQL in any case. Nobody is taking your SQL away. Additionally, you most certainly do want a tool that will write most of your DML for you; there is no deep wisdom or joy in writing the same INSERT / UPDATE statement over and over again.
I've used SQLAlchemy a bunch and found that it has excellent support for complex (for me anyways) query options. For example, avoiding N+1 selects with eager joins has been a breeze.
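For example (sketch; User/Address are hypothetical mapped classes and `session` is an existing Session):

    # Sketch: one JOINed query instead of 1 + N lazy loads.
    from sqlalchemy.orm import joinedload

    users = (
        session.query(User)
        .options(joinedload(User.addresses))
        .all()
    )
    for user in users:
        print(user.addresses)  # already loaded; no extra SELECT per user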
ActiveRecord does not treat SQL as "the source of truth". ActiveRecord objects maintain their own state, and still try to be a "proxy" for rows in the database. When you call "save", it's tracking the identity of the object using the primary key, and generating SQL from the runtime state of the object. This is backwards.
The biggest problem for me is this "proxy" behaviour. When you start treating objects as proxies for database rows, your database logic inevitably leaks into your business logic. Someone returns an ActiveRecord `User` object from a database method and now random other parts of your application are intricately linked to your database implementation.
On top of that you suffer terrible performance problems. Even if you're smart enough to avoid the whole N+1 query issue, you lose control (and more importantly visibility) over where database updates are coming from, and your ORM is either not smart enough to efficiently apply updates, or so smart that building the query takes longer than actually running it! (I have this problem with SQLAlchemy right now, where it's taking several seconds per request CPU bound in the ORM). Undoing this is literally impossible without a near total rewrite.
And it all serves no purpose! The application-specific interface to its database is usually quite simple. You can just have it be a literal interface/trait/whatever and implement each method by executing a bit of SQL, or calling a stored procedure, or however you like. Arguments and return values should be plain-old-data (no magic proxy objects) and each method should correspond roughly to a transaction so that the rest of the application doesn't have to worry about that database-specific stuff.
There are a ton of other benefits to doing stuff this way, too many to list here, but it really helps with complex migrations (like if you need to migrate from one database to another), maintainability, testing, etc.
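A sketch of the shape being described, with hypothetical names and SQL:

    # Sketch: a plain-old-data interface; one method ~= one transaction.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class OrderSummary:
        order_id: int
        total_cents: int

    class OrderStore:
        def __init__(self, conn):
            self._conn = conn  # any DB-API connection

        def summaries_for_customer(self, customer_id: int) -> list[OrderSummary]:
            # Returns plain data, not proxy objects, so callers never
            # become coupled to the database implementation.
            rows = self._conn.execute(
                "SELECT id, total_cents FROM orders WHERE customer_id = ?",
                (customer_id,),
            ).fetchall()
            return [OrderSummary(*row) for row in rows]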
> (I have this problem with SQLAlchemy right now, where it's taking several seconds per request CPU bound in the ORM). Undoing this is literally impossible without a near total rewrite.
For the write? Writes are not usually much of a performance issue, except when people are inserting thousands of rows. You can replace flush with direct INSERT/UPDATE or use the bulk API. Pre-setting primary key values will also improve performance by an order of magnitude. Share some of your code and profile results on the mailing list and we can look into speeding it up.
SQLAlchemy always provides many techniques to optimize areas that have performance problems, not to mention the primary direction for most major releases is that of new performance features and improvements year after year, despite having to deal with the dog-slow cPython interpreter. No "total rewrite" of your application should be needed.
The SQLAlchemy ORM is a very complex beast. It provides plenty of methods to optimise the queries it generates. It provides no way to "magically speed up the python code".
This is not just for writes, but also for loading complex relationships from the database: we've gone to great lengths to ensure we're using relationships optimally so as to cache query results and not fall into the N+1 query problem, but our application spends all its time inside SQLAlchemy.
The only way to get the performance improvements we needed was to stop using it as an ORM altogether. However, the ORM was too intricately tied with our business logic to do that without a near total rewrite.
What we've actually ended up doing to do this rewrite more incrementally is to create a Rust service which implements just the "GET" end-points: this gives us an order of magnitude performance improvement despite it using the exact same database and fetching the exact same data. Most of the business logic did not need to be duplicated as it is not used from "GET" requests, but users of our application will get much snappier loading times, and it will generally feel much better.
> It provides no way to "magically speed up the python code".
It most certainly does provide many techniques to reduce in-Python computation. Have you read the performance section in the FAQ and worked through the examples given? Have you looked into baked queries, or querying for columns and not objects? These are often quick wins that make a huge difference. The baked queries concept is going to be much more cleanly integrated in an upcoming release.
> This is not just for writes, but also for loading complex relationships from the database: we've gone to great lengths to ensure we're using relationships optimally so as to cache query results and not fall into the N+1 query problem, but our application spends all its time inside SQLAlchemy.
that's typical because a CRUD application is all about its object model and SQLAlchemy is doing all of that work. if you wrote your own object layer using Core then you'd see all the time spent in your own objects<->core layer. This is just the reality of Python.
Here's a runsnakerun I made some years ago of the PyMySQL driver loading data over a network-connected database: https://techspot.zzzeek.org/files/2015/pymysql_runsnake_netw... Notice how the Python driver alone spends TWO THIRDS of the time, and the actual waiting for the database only 1/3rd? That's Python. If the database is on localhost, that pink box shrinks to nothing and all of the time is spent in Python, just reading strings and building tuples. It's a very slow language.
Rust, however, is blazingly fast: one of the fastest languages you can use short of straight C code. This is not the fault of an ORM; this is just the reality of Python. You certainly wouldn't think that if you used an ORM that runs under Rust, it would be as slow as your Python application again; it would continue to be extremely fast. The Hibernate ORM is an enormous beast but runs immensely faster than SQLAlchemy because it's running on the JVM. These are platform comparisons. SQLAlchemy is really fast for the level of automation it provides under pure Python. You made the right choice rewriting in a fast compiled language for your performance-critical features, but that has little to do with whether or not you use an ORM. As long as your database code was in your Python application, you'd end up writing an ORM of your own in any case; it would just be vastly more code to maintain.
> It most certainly does provide many techniques to reduce in-Python computation. Have you read the performance section in the FAQ and worked through the examples given? Have you looked into baked queries, or querying for columns and not objects? These are often quick wins that make a huge difference. The baked queries concept is going to be much more cleanly integrated in an upcoming release.
Most of those techniques boil down to "use SQLAlchemy as a query builder rather than an ORM" - something I wholeheartedly agree with and would love to do, but it is next to impossible because these ORM objects have leaked into the business logic of the application.
> that's typical because a CRUD application is all about its object model and SQLAlchemy is doing all of that work. if you wrote your own object layer using Core then you'd see all the time spent in your own objects<->core layer. This is just the reality of Python.
I'm not trying to say SQLAlchemy is a bad implementation - it's a great implementation with tons of powerful features. I just believe that using anything as an ORM is a bad idea and directly leads to these performance and maintainability issues. Whether that's ActiveRecord, SQLAlchemy or even if someone implemented a similar ORM in Rust.
Some of the performance issues are due to python, but that's not the whole story: if I use SQLAlchemy as a query builder, and only return plain-old-data, I do not see nearly the same performance issues. Not because SQLAlchemy ORM is badly written, just because it has to do more work, create more objects, with more "magic properties", maintain more entries in the session, etc. etc.
While you're not technically wrong at the SQL level, I believe there's a higher level of business problem that ORMs like ActiveRecord solve for that's being overlooked. (aka ActiveRecord got Airbnb and Shopify a long way in scaling both developers and product).
CPU time building a query is rarely the business bottleneck until you're at a huge scale - scaling up the engineering team uniformly and having business logic at abstraction level closer to the rapidly evolving Product specs is the bottleneck that ActiveRecord solves for, and does pretty well.
If you'd like something that avoids object/hidden state, check out how Elixir/Phoenix's Ecto [1] was designed - it avoids many of the shortcomings of ActiveRecord and SQLAlchemy.
If "SQL is the source of truth" means you have no in-memory state, I'm not sure that's feasible, with or without an ORM.
Once you fetch something from the db, you have it in variables in memory, with or without an ORM. It could no longer be in sync with the db a ms later. And you will probably re-use it for more than one access (if you need it more than once), because issuing the same SQL again the second time you need to reference any data returned, in a given routine, would be insane.
I think what they’re proposing is something like “if I want to read and then update a value, incrementing it by one, the proper way to do that is to tell the DB to increment whatever value it has stored; not to increment my local representation of the value and then tell the DB to overwrite the value with it.”
The number of times I've had to increment a value by one is minimal. Same for similar things where you want to update one value based on the other values and do it in an atomic way.
But if I had to increment specifically, and were using ActiveRecord... I'd use the ActiveRecord increment! method. Which has the proper db-atomicity semantics you want, it does execute `UPDATE x SET n = n+1 WHERE pk = y ...`
This is not obvious unless you think about it - I've seen issues like this because the code looks correct if you ignore the update semantics, especially when it won't fail on your local dev env, because in-memory state will be the same as DB state since you're the only one hitting it...
Having to think about these things because the abstraction is leaky instead of just using the SQL where the transformation is explicit is not worth the hassle.
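To spell out the difference in ORM terms (a sketch using Django's F expressions, since Django is the ORM discussed upthread; Counter is hypothetical):

    # Read-modify-write: looks correct, loses concurrent updates.
    c = Counter.objects.get(pk=1)
    c.n = c.n + 1   # stale the moment it was read
    c.save()        # UPDATE ... SET n = <whatever we read + 1>

    # Atomic form: push the arithmetic into the database.
    from django.db.models import F
    Counter.objects.filter(pk=1).update(n=F("n") + 1)
    # UPDATE counter SET n = n + 1 WHERE id = 1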
Using Clojure and yesql was a real pleasure for this - Clojure is immutable so you're working with values all over the place - and working with a SQL query is just passing values in and getting values out - no magic mapping - it's just values - if you want the latest value, query again; if you want to store a new value, send the new value. No mutation, no magic, just data.
> Once you fetch something from the db, you have it in variables in memory. With or without an ORM. That could no longer be sync'd with the db a ms later.
Nope. The DBMS is making the same transaction isolation guarantees to an ORM as to any other client.
I think the larger "source of truth" issue is how the schema is represented.
All of these systems provide their own set of schema objects that are then pushed to the database. This works fine for LAMP stacks, but as soon as you grow it becomes apparent that you've put the cart before the horse.
>Someone returns an ActiveRecord `User` object from a database method and now random other parts of your application are intricately linked to your database implementation.
Indeed. It's a rookie mistake, but hey, you work with rookies all the time! ORMs can be useful, but they certainly allow for some dangerous patterns that are hard to notice if you're a young developer.
They mention the case where the ORM shadows the db schema. The argument isn't terribly salient; it's not like you can't build type checking in, it just means more work if that's what you want to do. Essentially, according to the author, the only way to do it right if you have an existing db is to write all queries from scratch.
My preferred solution for existing DBs is Ruby's Sequel or ROM if you want to go nuts. But I'm a diehard Rails lover and will choose ActiveRecord over all other solutions if I can. To me there's only one decent ORM and everything else is either good for special purposes or junk.
So, I've seen this go back and forth so many times, and I've come to the conclusion that the two extremes are irreconcilable. Not because they're incorrigible, but because they represent two very different ways of using a database.
For some, the database is not actually a database, per se, it's just a place to persist state. Usually folks in this camp think in an object-oriented way. They haven't read Codd's paper, they don't care about the relational algebra, they just want some way to take bags of properties and stick them somewhere that offers random access with good performance and is reasonably unlikely to spontaneously erase data. If you're in this camp, ORM works great, because there isn't really much of an object/relational impedance situation in the first place. The data store wasn't structured in a way that creates one, there's no need to burn a bunch of effort on doing so, and dragging in the whole relational way of doing things is like dragging a wooden rowboat on a hike through the forest because you'd like to use it to go fishing on the off chance you find a nice pond.
Others see a lot of value in the relational model. They're willing to put a bunch of time and effort into structuring the data and organizing the indexes in ways that will allow you to answer complex, possibly unanticipated questions quickly and efficiently. They, too, hate the syntax for recursive common table expressions, but are willing to hold their nose and use them anyway, for various reasons, but mostly because people think you're really cool when you can spend 30 minutes and make something go 2-3 orders of magnitude faster than it used to. They don't think of the data in terms of bags of properties, they think of it in terms of tuples and relations and a calculus for rearranging them in interesting and useful ways. For them, there is potentially a huge problem with object/relational impedance; the data's organized in ways that just don't fit cleanly into Beans.
The thing is, neither of these ways of using a database is wrong. Each has its strengths and weaknesses. The trick is figuring out which way fits your business needs. Well, that's the easy trick. The hard trick comes when someone on your team doesn't understand this, believes there is one universal solution that will work for everyone, and is hellbent on jamming the One Righteous and Holy Peg, which happens to be square, into a round hole.
Django-rest-swagger to have a Swagger / OpenAPI spec on top of Django REST Framework, and then OpenAPITools/openapi-generator to generate TypeScript code.
If you're thinking in the first terms, is a relational DB the right backing store, or wouldn't it be better to back your DB with something more like Mongo?
Do they still obtain some benefit from the schema, since from time to time the semantics of a property will change, and a schema can tell them what the current shape of the data is and guide the migration? Or would they prefer to just write a new property which does a lazy conversion from the old terms? (To an extent, I suppose the answer to this question determines the answer to the first. But I guess there are other tradeoffs to Mongo I'm not aware of, since schemalessness fills me with fear and I just don't want to look there.)
Regarding your final paragraph: I suspect that, for many apps, the choice between relational vs transparent persistency is largely determined by the team who is working on it. The "hard trick" is therefore trying to balance a strong personality with a strong view who disagrees with the rest of the team who have weaker personalities and weaker views, but who all agree on the other side of the fence. This is simply a standard management question with very little technical relevance.
I suppose it depends on whether you want schema-on-read or schema-on-write.
Even if you're working under the first model, there's still a lot an RDBMS can do to help you ensure data integrity. Largely by being less flexible. Databases like MongoDB allow for a more flexible schema, at the cost of pushing a lot of the work of ensuring data integrity into the application code.
For my part, I do a fair bit of working with databases that were built on the MongoDB of the '90s, Lotus Notes, and I've seen what they can grow into over the course of 25 years. It's not pretty. That experience has left me thinking that, while there's certainly a lot of value in the document store model, I wouldn't jump to a document store just because I don't need everything an RDBMS does. I'd only do it if I actively needed a document store.
> If you're thinking in the first terms, is a relational DB the right backing store, or wouldn't it be better to back your DB with something more like Mongo?
Certainly relational, for 90% of the cases, if not all.
The relational model was THE answer to NoSQL from the start (ie: it was the solution to the original "NoSQL" stores).
It is totally more flexible, powerful, dynamic, expressive... and that's without talking about ACID!
You can model all "NoSQL" stores with tables. With limited exceptions, it will work just fine for most uses...
> This is simply a standard management question with very little technical relevance.
I don't get what you are implying here...
But NoSQL solutions are the ones to be suspected, the ones that require harder qualification and justification to use. It is the wrong choice in the hands of the naive. "NoSQL" is for experts and for niche/specific workloads.
The advantage of using an ORM is that you can always not use it in the places where you are doing things that are not suited to the strength of the ORM.
I tend to hand-write almost all of my migrations and many of my queries that synthesize data from multiple tables to reach a conclusion. I can think of only a handful of times where it was worth writing custom code to persist state (usually only when there are a large number of records that need a couple of specific fields updated).
Like many tools, it all depends on how well it is used and how well it fits its use-case.
> The trick is figuring out which way fits your business needs.
Rule of thumb - if you don't control the database, use an ORM; if you do control the database, work directly with it.
For example, let's say that your business is a software company that sells an on-prem product. Some of your customers have Postgres expertise, some have MySQL expertise, some MSSQL, some people are stuck on Oracle. Forcing customers to develop DBA expertise in a database they're not familiar with just for the privilege of buying your product is a sales disaster in the making. So you go with an ORM and set up QA testing that tests releases across all of the databases that you support, and the ORM helps you by making it much more likely that your development efforts will automagically succeed in working with each of the supported databases.
In most other situations, though, it makes much more sense to start with the data design. If your business grows, your databases are going to grow. You are almost guaranteed not to switch databases (absent overwhelming financial need, see: Oracle) over the lifetime of your company. Data analysts (data scientists now?) can extract serious value from your databases by getting into the weeds of the database schema, indexes, and queries and working with developers and DBAs to optimize them for business reporting. If you give up control to an automated tool that knows nothing about your business, your business will be less competitive as a result.
Data is far too valuable these days to refuse to develop expertise with the underlying databases.
Why do your customers even need to have expertise in the DB you're using? We just sell them a black box (usually VM images), with a few endpoints to extract data in standard formats. They can use whatever they want to connect to those.
Maybe one customer wishes to run their databases in a cluster distributed across two continents.
Maybe another customer has bought Oracle and the installation still has room. Also, they have a custom backup scheme that takes their load patterns into account.
Customers wish for many things; that doesn't mean they're relevant selling points. It just sounds like bad judgment to me to tie yourself down that way, unable to take full advantage of the RDBMS. And have you even tested running your (hypothetical) application in a distributed Oracle cluster across two continents? If not, how will you support it?
The most basic question I'd ask: is the core of your application the data or is it the business logic? Or to put it another way: does it make more sense to build the application around data, or to build your data around the application?
Another question I'd ask is whether you're expecting to deal with millions of rows/objects or hundreds of millions. Modelling relational data correctly can have performance impacts in orders of magnitude.
When I look at most projects, I instinctively begin by modelling the data structure; then I think about why/when/how data can move from one state to another; then I think about how an application could prod the data between these states; then I build code which runs against the data.
If your application isn't data at its core (e.g. a document-based app) then it probably makes more sense to treat data elements as objects and use an ORM (or similar) to store and retrieve the objects.
An ORM does not usually limit how much you can model your data and create fast queries. The modeling you talk about can be done just as well in e.g. the Django ORM.
ORMs that I've experimented with tend to fall into one of two categories: either they treat the object model as prime, or they treat the relational model as prime.
The former almost invariably spurt out inefficient queries, or too many queries, or both. They usually require you to let the ORM generate tables. If you just want to have your object oriented design persist in a database, that's great.
The latter almost invariably results in trying to reinvent the SQL syntax in a quasi-language-native, quasi-database-agnostic way. They almost never manage to replicate more than a quarter of the power of real SQL, and in order to do anything non-trivial (or performant at scale) they force you to become an expert in SQL and/or in how the ORM translates its own syntax into SQL.
And once you become more expert at SQL than your ORM, it's not long before you find the ORM is a net loss to productivity—in particular by how it encourages you to write too much data manipulation logic in code rather than directly in the database.
For the projects I've worked on, I've almost never wanted to turn data into objects. And on the occasions when I've thought otherwise, it has always turned out to be a mistake; de-objectifying it consistently results in simpler, shorter code with fewer data bugs.
I tend to find that the longer data spends being sieved through layers and tossed around inside your application, the more data bugs you'll end up having. It's much better to throw all data at the database as quickly as possible and do all manipulation within the database (where possible) or keep the turnaround into application code as short as possible. It means treating read-only outputs/reports more like isolated mini-applications; the false nirvana of code reuse be damned.
And that doesn't mean replacing an OOP or ORM fetish into a stored procedure/trigger fetish. It means realising that if your application is data at its core, it's your responsibility as a programmer to become an expert at native SQL.
The problem is that far too few programmers realise how deeply complex SQL can be; it's treated like a little side-hustle like regular expressions, when for so many programmers it's the most valuable skill to level up.
You'll have to pry ActiveRecord from my cold, dead hands. I'll take it any day over the half baked quasi-ORM that your company has inevitably made over the years because an early engineer was anti-ORM and it turns out doing raw SQL all the time is tedious.
What’s nice about ActiveRecord is that while it obviously does hook into the Rails object-oriented style, it’s actually more relational and functional than oo in how it works underneath. It’s a fairly thin layer of abstraction over SQL. I think of it more as a way to build and compose SQL statements than a true ORM—at least when it’s used well.
At my last position we simply did sprocs and Dapper. When you take it as true that the database is the source of truth, it only makes sense to put the queries in the database. The only exception is dynamic queries or modifications.
How do you avoid leaking business logic into database stored procedures? Sounds like you would now have two sources of truth, application logic, and database stored procedures.
I'd like to see someone have a decent go at reexamining the idea that business logic (BL) doesn't belong in the DB and tease apart exactly what that should mean, if it's actually still true, or even if it's ever actually been true. Maybe something like that's been posted here before, but I haven't seen it...
I bought into the idea myself for a while, but when I interrogate my belief there it was just something I picked up at uni as part of the general 3-tier approach, and I'm not sure how much of it is really practically grounded. I see the same ideas held dogmatically by newer employees, and when I ask them about it in detail, I see the same fuzziness.
After all, if you want to really be pedantic, then you could claim anything beyond having a single table with two columns, "key" and "value" is pushing BL into the DB.
I.e., what, practically, does the separation gain you? You may want to swap out the DB in the future? That almost never happens, and not without some significant other refactoring going on (e.g. breaking out into separate DBs because the product's grown to the point of needing distribution across services, etc.). If you want to really design with that in mind, you're almost certainly giving up a lot of functionality that the particular DB is going to give you. It's on the level of holding onto the possibility that you'll want to change the language you're writing in at some stage (and that event would favour pushing more into the DB anyway).
In reality, what I've found is when there's a reliable, transactional, relational datastore at the bottom (i.e we're not talking about a larger distributed system), then for your own sanity you want to be pushing as much as possible down the stack as possible, to reduce the surface area that someone could reach through to change data in the wrong way. Data consistency issues are some of the worst when it comes to eroding customer trust, and if your data access/mutation paths are many and varied then you can do your head in or burn through a lot of dev morale trying to diagnose them.
The strongest advocates otherwise I've found are those in office cultures where there's little trust between devs and DBAs, and getting things through DB review is a rigmarole that ends up driving devs towards doing as much in code as possible. I've always suspected that, beyond Google envy, these sorts of dynamics are what drove a lot of the initial NoSQL movement...
Thanks, the Derek Sivers piece rings a bell now (and maybe was going into my thinking on this), and the other two will go into my notes to avoid having to repeat myself the next time this one inevitably comes up on the office Slack...
I worked on one system that did intentionally put business logic almost entirely in the DB. The VP was a former database guy and the primary justification was auditing. It worked well enough in practice.
There’s a quote from Gavin King somewhere exhorting developers that ORMs and raw SQL are meant to be used concurrently: the ORM for CRUD ops, SQL for most everything else.
The problem I have with ORMs is lack of paging on collections; there’s always that one user that pulls in half the database.
In my experience, keeping BL out of the DB is a practical concern, not a dogmatic one.
Usually there are two things:
1. Performance. Business logic is usually horizontally scalable, databases are usually not. You want to pull as much BL out of the database as possible, and put it into stateless microservices so they can be scaled.
2. Version control and deploying changes to business logic. Business logic changes frequently. Do you want to have to be making that frequent of changes to your database? Do you have practices around safe deployment and code review for stored procedures?
I've worked at a place where there was lots of business logic (millions of LOC) stored "in the database". In this case the database was MUMPS. It can be done, and it can be pleasant. The catch is that vertical scaling is the only option, and you have to spend years building your own tooling from scratch.
I think the line between database and code is destined to become even more blurred, but not by bringing the code down into the database, but by lifting more and more database concepts up into "application space" as distributed computing becomes more normalized.
Yeah the performance point is interesting, and I think it has merit in a well-defined scope. The idea of bringing DB concepts up is something I've seen done, but it wasn't a pleasant experience overall and I think everyone involved is regretful that we went down that path. It's a good story though. Ramble starts:
The system was an enterprise-focused offering with weak product management, so in the early growth-oriented days of the company ended up saying "yes" to a lot of feature requests that we probably shouldn't have. Where this particularly impacted was the permissions system, which evolved over time from an ACL with odd warts to also include a bolted on hierarchy of users and their objects, and then a role based system, and then all sorts of enterprise-level oddities like cross-links in the hierarchy, shared subhierarchies, roles that could appear at multiple hierarchy points, virtual ACLs that behaved slightly differently etc. So potentially a large number of different access paths between a given user and resource.
Years ago, when this mess was starting to expand it was decided it was too hard to model this in the DB (and really that should've been a giant red flag), so the new approach was to load each customer's dataset up into an application server that would handle the BL, crucially including the permission logic. It very much wasn't stateless, as you mention, but I'm not sure how it could've been really, given you needed so much loaded up from the DB in order to make these permission decisions. Would've avoided a lot of headache if it had...
The consequence though of using this write-through in-memory cache was it became the source of truth.
The chief problem this led to was that shift into application land was a bell that couldn't be unrung. Everything others in this thread have complained about seeing BL spread across all sorts of SPs in a DB happened here, just in application code (which again hints I guess that the chief problem is architectural and lack of governance, not a wrong-layer problem). Nobody could properly describe the permissions system in response to support requests without a lot of code scouring, let alone hold it in their heads.
Even worse, as the application grew in size and needs, we found we still needed things we had left behind in the DB. A couple of smart devs working on the core service, because they were smart and trusted, convinced themselves that what we needed was an in-memory transaction system for managing this in-server cache (by now the core was being called the Object Cache, a name so generic that it also should've been a red flag). So a couple of years went into implementing a technically impressive reimplementation of a transaction system.
Meanwhile the system as a whole was well past the point of being needed to split up into multiple services, so a grand goal was set of moving the Object Cache into a standalone service: the Object Service. Slap an OData API on it, and then leave that API open for all internal teams to hit. By this point the core team who owned this was starting to become well aware they had fallen into a bad pit of reimplementing the Postgres DB everything still sat on: transactions, generic query API, configurable in-memory indexes for each type of object, partition loading logic for the timeseries sets, user permissions etc. Worse, the generic query API (and this is what's ultimately turned me off OData/GraphQL etc for cross-team interfaces) ran into all the same problems as an SQL interface - people in other teams would always be coming up with brand new queries your indexes hadn't anticipated, forcing new work on the keepers of the cache to support.
The way forward probably would've been to leave the permissions structure in place and pull as much as possible out of the service/cache into separate standalone services, i.e. leave the objects in the object cache as little more than just IDs you could look up elsewhere for actual attributes. We'll never really know though: it was recently decided to put the whole thing into maintenance mode, partially because nobody has the political will to fix the thing and its complexity.
I've thought a lot about the lessons of this system and where it went wrong, and I trace it ultimately to forging ahead with a permissions system (i.e. core BL) that couldn't be modelled in the DB. I think that doomed us to what I'd ultimately name second-system syndrome (i.e. the whole stack above slowly evolved into a poor Postgres clone). Perhaps if we had been ruthless in going to a CQRS design or similar very early on (which would again demand stateless services above) we could've pulled it off.
I think a big takeaway for me was to not take for granted the good things an RDBMS is giving you, especially transactionality. I think I'd bias these days towards looking for ways to scale up the powers the DB gives you (even if it involves pushing external integrations into the DB, maybe via some funny FDW or transaction log integrations, or maybe using a tiered solution of DBs, one layer as the storage layer and another layer above acting as coherent transactors, a little bit similar to Datomic's internal structure), rather than rushing toward replicating core DB functionality in application code, which can be tempting initially as the easier-looking solution.
I think I might blog about this when the system's properly retired and the involved can look back and laugh rather than cringe...
As a rule, relational database systems have two strong areas: parallel processing and data consistency enforcement. I've always understood "don't do business logic in the database" as meaning not to use the database engine for sequential processing.
If you have a single-user database system (e.g. webapp with database backend), there is relatively little to gain implementing data validity checking in the database, other than the declarative statements versus imperative rules discussion (which can already be a huge benefit, depending on the team).
But once you have a central database with multiple frontends (common in enterprise ERP solutions), enforcing data consistency in the backend becomes pretty much unavoidable -- otherwise a single bug or new feature in one frontend could disable entire production lines.
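To make that concrete, here is a minimal sketch of backend-enforced consistency, using SQLite from Python only because it's easy to run; the schema is invented for illustration:

    # Declarative constraints live in the database, so every frontend is
    # subject to them no matter which codebase wrote the row.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES products (id),
            quantity INTEGER NOT NULL CHECK (quantity > 0)
        );
    """)

    # A buggy frontend that violates an invariant is rejected at the source:
    try:
        conn.execute("INSERT INTO orders (product_id, quantity) VALUES (1, 0)")
    except sqlite3.IntegrityError as e:
        print("rejected:", e)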
when there's a reliable, transactional, relational datastore at the bottom [..] you want to be pushing as much as possible down the stack, to reduce the surface area that someone could [..] change data in the wrong way
I'm working on a project that is almost literally your example of reducing a relational database to a key value store, despite client applications having complex problem domains to model.
The consequence has been catastrophic data consistency problems and the whole programme rapidly grinding to a halt.
I think the problem is the overly broad definition of 'business logic' that encompasses everything from data integrity to presentation logic.
I've seen "business logic" refer to both A) how a discount promotion works and B) how to arrange images of the product for a particular set of customers. Obviously it's fine to have A in the database...
Thanks for bringing that up. Everyone on a team needs to have the same working definitions, and yeah... "business logic" is a term that's used a lot, but I very often see people not really understanding that it's usually meant in opposition to "display logic", not just as an interchangeable term for "any code".
Business logic being in stored procedures is the ideal place for it.
The problem with it is technical: the tooling for migrating and managing schemas and stored procedures is hot garbage, and there aren't good ways to enforce consistency between the invariants of the applications and the database.
Really, it should be possible, on both application deploy and when stored procedures are updated, to import library functions that the stored procedures use from the database and run unit tests against them to ensure that all clients are compatible.
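You can get part of the way there today by pointing an application-owned test suite at the deployed procedures. A minimal sketch, assuming psycopg2 and pytest; the calculate_discount function and the connection string are invented examples:

    # Run unit tests against a deployed stored procedure, e.g. as a deploy
    # gate. Everything named here is hypothetical.
    import psycopg2
    import pytest

    @pytest.fixture
    def conn():
        c = psycopg2.connect("dbname=app")  # assumed connection string
        yield c
        c.rollback()  # tests never commit
        c.close()

    def test_discount_invariant(conn):
        with conn.cursor() as cur:
            # Assert the same invariant the application code relies on.
            cur.execute("SELECT calculate_discount(%s)", (100,))
            assert cur.fetchone()[0] == 10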
Have done stuff like this where the queries were quite complex and essentially BI-style. Wrote a system to store and manage the procs in VCS and manage them through the ORM migration system (ORMs are quite à la carte).
You're either generating and submitting the queries through prepared statements or just storing them in the database. It's not "leaking" IMHO to do the latter, and in fact it can come with benefits; you can call the procs while connected to the database through various other clients, be it the command line or a BI system. So in effect you are encapsulating the logic in the place with the most potential for re-use.
With the MS stack it's really easy. You can design the schema and procs, views, etc. in Visual Studio, then have it compare against each environment to create the deployment script.
I like ActiveRecord. I like it so much I've implemented the Active Record pattern myself and compared my results with ActiveRecord.
That said, I've come around to the Repository pattern and quite like the Ecto way of doing things. It's a lot more flexible at the expense of some simplicity. It makes some really hard things easier than AR can.
This but Hibernate + Spring Repositories. Writing SQL is fine but the 95% CRUD cases are much better served with a decently managed ORM. We have an older project here using iBATIS, MSSQL and lots of stored procedures/SQL-first approach. See this tutorial for a preview of the horror to come: https://www.tutorialspoint.com/ibatis/ibatis_create_operatio...
'Employee.xml'
Now imagine this but with stored procedures and inserts with tens of parameters.
I've worked on a system where almost all of the business logic was handled as stored procedures (apart from things that belonged at the UI level, of course). Worked fine, was easy to debug, and performed great. Sure, you had to know how to actually write good SQL, which threw off kids who think programming equals using the JavaScript framework du jour, but I don't see it as a shortcoming. No XML anywhere either.
At some point they rewrote it as a web app with NHibernate and all that. Took them literally many hundreds of man-years, and it still runs like molasses. And profiling whatever crap NHibernate emits is a rather joyless enterprise.
As a fellow AR lover the only things that consistently bug me are when it does things like:
- .count on a collection calls SQL every time, but there is also a .count method for arrays - which means you need to remember to use .length if you just want to count the objects loaded into memory
- ‘build’ methods for an association usually only initialize a record in memory, but with has_one associations it deletes any record that already exists
So basically when it’s not clear about the fact that it’s doing database things.
There's a .size that works for pretty much everything: if the relation isn't loaded then it will do a select count(*), but for arrays and loaded associations it takes the length of the array.
So you know one guy that quit over having an ORM instead of using SQL... do you have any idea how many quit due to having to deal with thousands of lines of SQL string concatenations, because some genius who also hates ORMs decided to do a giant project that way?
Then you call your alternative anti-ORM and it generates... ORM specs, and then you say "migration commands based on the git history"... what does that even mean? What does git history have to do with the database?
Any SQL wrapper worth its salt provides some mechanism to specify positional (or even named) parameters within that string (without resorting to string interpolation or concatenation) and pass them in when executing the query. Usually it'll be something like
foos = execute_sql("SELECT * FROM foo WHERE bar = ? AND baz = ?", [bar, baz])
If you're resorting to concatenating strings, then you're almost certainly opening yourself up to SQL injections (and rather needlessly, I might add!).
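To make that concrete in Python (the stdlib sqlite3 driver uses "?" placeholders; others differ, e.g. psycopg2 uses %s):

    # The driver binds the parameters; your code never splices values
    # into the SQL string, so injection isn't possible here.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (bar TEXT, baz INTEGER)")
    conn.execute("INSERT INTO foo VALUES (?, ?)", ("hello", 1))

    bar, baz = "hello", 1
    foos = conn.execute(
        "SELECT * FROM foo WHERE bar = ? AND baz = ?", (bar, baz)
    ).fetchall()
    print(foos)  # [('hello', 1)]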
Which doesn't work for table names, or for building up where clauses gradually, etc. SQLAlchemy makes those much more trivial to deal with, via the SQL Expression bits. (Though technically, that's not an ORM.)
Getting the library to properly expose the formatting interface is also a trick. We would do massive loads/dumps that we did not want to use Python for (but we worked in Python), as the DB-API pulls the entire result set into RAM, and we were pulling >RAM (and we didn't want to do something lower-level w/ the driver). We'd spawn psql, but that needs the query as a complete, pre-templated string — it does not support any sort of positional parameters. And the Python library really doesn't (last I checked) have a public interface to the templater portion of the library. (I think this is b/c PG just sends the parameters, and the binding is done server-side. So, really, psql needs an interface to supply those values as arguments to the CLI.)
> We'd spawn psql, but that needs the query, as a complete, pre-templated string — it does not support any sort of positional parameters. [...] So, really, psql needs an interface to supply those values as arguments to the CLI.
I haven't tried this before, so caveat emptor, but per psql's manpage this should be supported:
QUERY="SELECT * FROM sometable WHERE name = :'foo'"
echo "$QUERY" | psql -v foo=asdf # ... other args ...
Basically: you can use colon-prefixed variables in the query text (with the variable name optionally quoted to indicate how the result should be quoted, apparently?), and set them with the -v option (which is equivalent to using the \set command within the query string itself). Note the double quotes around "$QUERY": left unquoted, the shell would glob-expand the * in the query.
This also happens to work for table names, at least per the example in the manpage:
testdb=> \set foo 'my_table'
testdb=> SELECT * FROM :"foo";
Except in this case it'd be more
echo 'SELECT * FROM :"foo";' | psql -v foo=my_table # ... other args ...
Still no option there for positional parameters, but it's a start, right?
> Which doesn't work for table names, or for building up where clauses gradually, etc. SQLAlchemy makes those much more trivial to deal with, via the SQL Expression bits. (Though technically, that's not an ORM.)
Thin SQL wrappers/DSLs like the SQLAlchemy expression library are great. IMO those thin wrappers are what most people should reach for first, over a full-blown ORM. A good minimal SQL wrapper will:
* Save people from pain, problems, and security issues involved in building queries through string manipulation.
* Map very closely to raw SQL
* Make it easy to compose query expressions and queries, using language native features.
* Help devs develop their SQL skills. Learning the wrapper IS learning SQL, for the most part.
I have a hard time seeing many good reasons to add any more layers of abstraction on top. That last point is particularly important to me. My first exposure to many advanced SQL techniques was through such libraries. Since the code maps to SQL quite naturally, that learning can be applied in a much wider variety of contexts. Teach someone how to do aggregations using Django's ORM, and you've taught them to do aggregations in Django's ORM, and basically nowhere else.
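For illustration, a minimal sketch of that thin-wrapper style with SQLAlchemy Core (assuming the 1.4+ select() calling convention; the table and columns are invented):

    # Query fragments compose with ordinary Python instead of string
    # concatenation, and the result renders to recognizable SQL.
    from sqlalchemy import column, func, select, table

    users = table("users", column("id"), column("country"), column("active"))

    by_country = select(users.c.country, func.count().label("n")).group_by(
        users.c.country
    )
    active_by_country = by_country.where(users.c.active)

    print(active_by_country)  # SQL close to what you'd write by hand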
I think the issue the parent is referring to is when you have optional clauses in your statement. Like if you have a table with 5 update-able columns (where someone might want to update any 1 to 5 columns that you don't know in advance), and you want some generic code that updates a row, if you're writing raw SQL then you have to do concatenations to build up the "set clause":
UPDATE sometable SET {set clause} WHERE id = ?
The "set clause" might just be "foo = ?", but it might be "foo = ?, bar = ?", or "foo = ?, baz = ?", etc.
The answer to that is almost certainly "it depends on your DB". Most query planners should be smart enough to not try to write anything, but some are smarter than others. For example, SQL Server (if I'm understanding right) does perform a log write even if it should be obvious that the value won't change: https://www.sql.kiwi/2010/08/the-impact-of-non-updating-upda...
That being said, I'd be skeptical of the performance impact (if any) being all that significant; even in a worst-case scenario of "yeah, it's going to stupidly write every field every time", the columns for each row should be pretty close together both in-memory and on-disk, so unless you're writing a whole bunch of giant strings or something at once (and even then) there shouldn't be a whole lot of seeking happening.
tl;dr: assuming there's a minor performance impact is reasonable, but it should probably be measured before trying to optimize it away.
So you mean I'm writing code like this, in a language somewhat like TypeScript but simplified to make my point.
function update(id: ID_TYPE, value: Partial<sometable>) {
  const set_clause = Object.keys(value).map(k => k + " = ?").join(", ")
  const params = Object.values(value).concat([id])
  db.query("UPDATE sometable SET " + set_clause + " WHERE id = ?", params)
}
How is this code not terrible? It shouldn't pass code review. Once you concatenate strings to generate your SQL, all bets are off. There will be a day when your object doesn't match your assumptions. Either you have a series of different functions, each of which will change under different circumstances; or you're generating a function like this directly from the schema, in which case you might as well just generate the code you want at build time so there are no string concatenations anywhere.
Is there some other circumstance when you want that?
So I want to respond to this because there is a genuine question behind your rhetoric. And that question is "How do I cleanly do complex things in SQL without an ORM?"
And there is an answer to that. The idealistic answer is good DB design from the start. The realistic answer is by using views (sometimes materialized views) to create a clean representation out of odd tables.
The monster query can be decomposed into simple and clean operations against these views (and if necessary, views upon views).
The only caveat is you need a SQL expert to understand the performance if you want to do this at scale.
I wrote SQLAlchemy after a five-year job where all we did was work with enormous views in an Oracle database. They performed nightmarishly badly. The slightest attempt to query from one of these views using something as heretical as a join would kill the database. The source code for the views was locked up tight by DBAs who had no idea how our application worked, didn't care, and could never be bothered to help us with their horrendously inefficient views or to give us new ones that our application sorely needed in order to get data from the database in an efficient manner.
SQLAlchemy exists exactly because of the pain the "just use views" approach causes. The entire point is that you can structure a query of arbitrary complexity in a succinct manner using Python structures, while not losing any of the relational capabilities. Of course you can write queries that are too complex, if you're then lucky enough to be able to have materialized views at your disposal, you can turn the SQL you wrote into one. But that's an optimization you can use if you need it, or not.
On the one hand the article itself is offering an anecdote about being unhappy with ORMs. On the other hand you have an experience in which you were unhappy with views.
That would leave us at an impasse (he said / she said), unless we get more nuanced. So let's.
It sounds like some of your frustration is at some DBAs you knew. Firstly, I've never worked with a DBA; I'm just a senior software engineer / DevOps, and to me that includes an intimate knowledge of SQL performance, so maybe this is why I don't mind an in-database approach.
I'm fairly sure that whatever SQLAlchemy could do with a monstrously complex data relationship, views can do with better performance (the further the abstraction gets from the engine, the less performance optimization it can do).
Now maybe you agree with that, and see ORMs as a tool for the kind of job where the dev is locked out of the database. If so then I can't really disagree as I haven't worked at such places much at all.
> I'm fairly sure that whatever SQLAlchemy could do with a monstrously complex data relationship, views can do with better performance (the further the abstraction gets from the engine, the less performance optimization it can do).
If you have the option to use materialized views, that can be helpful when appropriate. However, materialized views have their own issues, not the least of which is that they need to be refreshed in order to have current data; they can take up a vast amount of storage; and a lot of databases don't support them at all. It should not be necessary to emit CREATE MATERIALIZED VIEW with super-user privileges and to build the infrastructure to store them and keep them up to date for every query one needs to write or modify that happens to have some degree of composed complexity.
You don't need an ORM for these queries either, just a composable query builder (like SQLAlchemy Core) that allows the composition task to be organized more easily in application code.
None of this means you can't use views, but it's unnecessary to dictate that they should be the only tool available for dealing with large composed queries.
> I'm fairly sure that whatever SQLAlchemy could do with a monstrously complex data relationship, views can do with better performance (the further the abstraction gets from the engine, the less performance optimization it can do).
SQLAlchemy will spit out a SQL query. Whatever the DB engine can do with queries hand-written on top of views, it can also do with the query generated by SQLAlchemy.
> Whatever the DB engine can do with queries hand-written on top of views, it can also do with the query generated by SQLAlchemy.
No, on two levels. Firstly, there are many features in SQL that an ORM will not support (e.g. no substitute for a materialized view).
Secondly, once you get really good at SQL, you learn that very tiny seeming things can make the difference between a query running for an hour or a second (e.g. case-insensitive search against an indexed column). Part of being expert in DB technologies is knowing how/why some ways of writing a query are very fast, and others are very slow. The idea of an ORM is to hide that complexity, which is fine for a toy todo-mvc app. But once you start querying tables with 100k rows you need to understand that complexity and master it if you want to write a fast query.
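One concrete instance of that point, sketched with psycopg2 (the table, index, and connection string are invented): a case-insensitive match can only use an index built on the same expression.

    # Without this expression index, WHERE lower(email) = ... degrades to
    # a sequential scan even if a plain index on email exists (Postgres).
    import psycopg2

    conn = psycopg2.connect("dbname=app")  # assumed DSN
    with conn, conn.cursor() as cur:
        cur.execute("CREATE INDEX IF NOT EXISTS ix_email_lower ON users (lower(email))")
        cur.execute("SELECT id FROM users WHERE lower(email) = lower(%s)", ("A@B.COM",))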
> Secondly, once you get really good at SQL, you learn that very tiny seeming things can make the difference between a query running for an hour or a second (e.g. case-insensitive search against an indexed column).
SQLAlchemy supports case-insensitive searches, per-column / per-expression collation settings, and index and other SQL hint formats for all databases that support them. That is, SQLAlchemy's query language supports most optimizing SQL syntaxes, with the possible exception of very esoteric / outdated Oracle things like which table you list first in the FROM clause (use index hints instead). We are adding esoteric performance features all the time to support not just SQL-level tricks but driver-level tricks too, which are usually much more consequential, such as the typing information applied to bound parameters matching up for indexes, as well as special driver-level APIs to improve the speed of bulk operations.
SQLAlchemy has been around for thirteen years, plenty of SQL experts have come along and requested these features and we implement them all. Feel free to come up with examples of SQL-expert level performance optimizations that SQLAlchemy doesn't support and we'll look into them.
Yeah, don't get me wrong, I haven't used SQLAlchemy and I'm not trying to besmirch it. I'm not even sure we disagree.
What I'm trying to challenge is the philosophy that an ORM is an adequate facade in place of learning how databases work. My point about views is that in my experience, the answer to disgusting queries is cleaning up the DB design (and views can be the tool to accomplish this). My point about case-insensitive searching (you can create a case-insensitive index if you want) is that a lot of db-performance stuff just can't be solved on the query side alone anyways.
It sounds like SQLAlchemy is designed with a lot of flexibility around how queries are run, so maybe you agree that understanding what's going on beneath the hood is important to handle these complicated cases.
> What I'm trying to challenge is the philosophy that an ORM is an adequate facade in place of learning how databases work.
Thank you for this response and I agree, we are likely on the same page. I don't know that anyone actually espouses that philosophy. This is the anxiety that ORM skeptics have, and certainly you get beginners who rush into using ORMs not knowing what they are doing, but if someone wants to be a DB expert, I'm pretty sure they go read about databases :) These ORM beginners will fail either with the ORM or without it. I guess the idea is you'd prefer they "fail fast", rather than have the ORM cover for them while they proceed to screw up more deeply? This is arguable; if you'd seen much of the non-ORM raw DB code I've seen, it too fails pretty hard. But even with this argument, if ORMs produce the problem of incompetents who are not found out fast enough, why foist the denial of useful tools on those developers who do know what they're doing?
You have shifted the discussion from the ability of the DB engine to optimize queries to the ability of a developer to write fast queries.
But in any case, we are not discussing some abstract ORM whose "idea is to hide complexity", but SQLAlchemy, whose idea (one of them, at least) is to allow you to write composable queries yet still retain the full power of SQL.
You are correct about needing to understand the underlying DB to be able to optimise it. I consider myself lucky to have come from a raw SQL background and to have a deep understanding of how the query planners work.
You are incorrect in your other assertion. SQLAlchemy allows for optimising your queries at runtime in a way that just isn't available to SQL views. The beauty of SQLAlchemy is that it allows for arbitrary composition of blocks of SQL. I too have had to fight with DBAs in a previous life because they didn't want me to construct a query in code (non-SQLA), but I eventually won because I was orders of magnitude faster, because I had runtime knowledge they couldn't use.
Many of us also know stories about how ORMs can be slow if developers don't pay sufficient attention. In some cases told by database experts who see the query code locked up tight by the app team, etc :)
Having a separate "DBA" from the developers, or having an adversarial relationship with said person, is not really a technical problem I think.
I've had similar issues, but I got the added benefit of being "DBA-splained" on why what we were doing was terrible.
I feel like if a DBA keeps thinking of the database as the solution to all of the problems (by implementing views, triggers, stored procedures, etc) then maybe that person should be willing to support them, otherwise they really shouldn't complain that the developers moved to an ORM.
Okay, obviously I've never worked in an enterprise enough company. I've always assumed a DBA was just a member of a backend team who likes thinking about SQLy stuff. Or a sysadmin who upgrades the computer the db runs on. Basically, a hat someone wears.
Do I seriously make the correct assumption that the people who are writing the backend of many apps are actually separated from the people who decide how the data those apps process is stored, and you may have no direct say about the database schema? That the backend is effectively just a client - perhaps the only client, but nothing more than a client?
Obviously you've described the disadvantage of this approach; what advantage does it have?
It makes some sense when the data is extremely valuable, like in a bank. As a developer you can only access the db through stored procedures created by the database team. Which limits the damage you can do. I haven't worked in a big enterprise in years though so I don't know how common this still is.
> The idealistic answer is good DB design from the start.
The problem with this statement is that DB schemas, like code, can evolve badly over time, even if they're maintained by a DBA (since they're human after all too).
And then the DBA starts to suggest that you use things like triggers in your DB, which may or may not be good, but often surprise developers. "I just wrote 0 into the DB, why did it turn into NULL? What line of code did that? Ohhhh... there's a trigger."
I tried to think about it from a gitflow release cycle perspective to be generous. This would mean constant churn of migration code for each branch of the repo.
If you have complex migrations which go beyond simply renaming columns, it would require you to write special migration code on a release branch which would then need to be merged back. yuck!
In what way is manually tracking the needed DB changes per feature branch even possibly better? Even if you are not running your migrations automatically on deploy, I would still check those migrations into version control.
Having auto-runnable migrations checked in alongside the code that depends on them allows you to review and test those migrations as part of your normal code review and CI processes.
I think there must be some misunderstanding here because I don't understand why you say either of these things:
> This would mean constant churn of migration code for each branch of the repo.
> it would require you to write special migration code on a release branch
There should be at most a few migrations per feature branch, and merging is only an issue when you have two feature branches with inconsistent schema definitions. You should only have to add custom migrations to a release branch to fix bugs, and those should be merged back into your develop branch as soon as possible to minimize effort spent merging later.
No, I agree with the post I replied to in that migrations should be human versioned as opposed to auto-versioned via a VCS. Again, I was being generous to the author trying to imagine a scenario where this would not create a dumpster fire.
This isn't quite the same as being solely git-based, but there are numerous examples in the industry of successful use of auto-versioned declarative schema management. By this I mean, you have a repo of CREATE statements, and a schema change request = making a new commit that adds, modifies, or removes those CREATE statements. No one actually writes a "migration" in this approach.
Facebook, who operate one of the largest relational database fleets in the world, have used declarative schema management company-wide for nearly a decade. If engineered well, you end up with a system that allows developers to change schemas easily without needing any DBA intervention.
This is interesting but I am curious about some of the implications. I am under the impression that migrations encompass more than schema changes. They may require data transformations, index rebuilds, or updates to related tables. I would agree that these things are preferably avoided, but in practice it seems that they happen frequently.
I viewed the author as targeting smaller dev shops who don't have a dedicated DBA/expert. Groups who might use an ORM's migration framework for example. In which case, it would appear that there is significant loss in migration flexibility if you remove application code from the architecture.
When operating at scale, it becomes necessary to have separate processes/systems for schema changes (DDL) vs row data migrations (DML) anyway.
For example, say you need to populate a new column based on data in other columns, across an entire table. If the table is a billion rows, you can't safely run this as a single DML query. It will take too long to run and be disruptive to applications, given the locking impact; it will cause excessive journaling due to the huge transaction; there can be MVCC old-row-version-history pileups as well.
Typically companies build custom systems that handle this by splitting up DML into many smaller queries. It's then no longer atomic, which means such a system needs knowledge about application-level consistency assumptions.
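A hedged sketch of what the chunked version looks like (psycopg2; the table, columns, and connection string are invented):

    # Backfill in small batches so each statement is a short transaction:
    # brief locks, small journal entries, no giant MVCC pileup.
    import time
    import psycopg2

    conn = psycopg2.connect("dbname=app")  # assumed DSN
    conn.autocommit = True  # each UPDATE commits on its own
    BATCH = 10_000

    with conn.cursor() as cur:
        while True:
            cur.execute(
                """
                UPDATE bigtable SET new_col = old_a || old_b
                WHERE id IN (
                    SELECT id FROM bigtable WHERE new_col IS NULL LIMIT %s
                )
                """,
                (BATCH,),
            )
            if cur.rowcount == 0:
                break  # nothing left to backfill
            time.sleep(0.1)  # throttle to limit load and replication lag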
> "migration commands based on the git history"... what does that even mean?
It actually seems like a really cool idea. E.g.
* Commit A: create a schema.sql file "create table t (a INT);"
* Commit B: update the schema.sql to "create table t (a INT, b INT);"
* The tool can generate a migration from A..B "alter table t add column b INT;"
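A toy sketch of the diffing idea (real tools parse the actual CREATE statements; plain dicts stand in for the parsed schemas here):

    # Diff two desired schemas and emit the ALTERs to get from old to new.
    old = {"a": "INT"}
    new = {"a": "INT", "b": "INT"}

    for name, sqltype in new.items():
        if name not in old:
            print(f"ALTER TABLE t ADD COLUMN {name} {sqltype};")
    for name in old:
        if name not in new:
            print(f"ALTER TABLE t DROP COLUMN {name};")
    # -> ALTER TABLE t ADD COLUMN b INT;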
It seems like it probably would work really well for creating new tables and columns in simple projects. In more complex projects I've worked on, we usually had more complicated migrations to update the data to fit the new schema, and we often had to be careful to interleave schema, data and code updates (especially if you wanted to be able to rollback).
Automatic migrations using only source-code snapshots don’t work for things like column renames or splitting/combining columns. Every tool I’ve used requires some migration metadata.
> "migration commands based on the git history"... what does that even mean? What does git history has to do with the database?
I'm not sure what git-specific interaction the author is envisioning, but more generally: declarative schema management tools are based on the notion of using a repo to store only CREATE statements. It's an infrastructure-as-code style approach, where to modify an existing table, you just modify the CREATE definition. The tool figures out how to translate this to an ALTER.
I'm the author of an existing tool in this space, Skeema [1], which supports MySQL and MariaDB. Other recent tools in this space include migra [2] and sqldef [3]. From what I understand, in the SQL Server world there are a number of others.
> So you know one guy that quit over having an ORM instead of using SQL... do you have any idea how many quit due to having to deal with thousands of lines of SQL string concatenations, because some genius who also hates ORMs decided to do a giant project that way?
Those aren't the only alternatives. Query builders exist.
A lot of languages have pleasant multi-line string literals. Also if you have a 1000 line hand-crafted SQL query, it's probably not going to be pleasant in your ORM either.
I know that the author of SQLAlchemy is reading these comments and I don't want to be mean, but anecdotally I hear a lot of pain stories and the OP seems to largely deserve the title "Don't Use SQLAlchemy".
Nobody suggests that ActiveRecord is perfect, but it is pretty amazing at what it does. I think a big part of the reason it's more reliably excellent is that Rails in general embraces the notion of "convention over configuration" - there's typically a correct naming scheme (with opinionated pluralization) that starts in the table scheme and flows straight through the models to the controllers and finally the view layer. I can join your Rails project and make guesses about where your logical components are located and what they are called that would beat any TV psychic cold 100:1.
Plus, much like with SQLAlchemy and basically every other modern ORM... you are always one method away from writing raw SQL if you want to be.
As always: use the tools that work best for you and for the vast majority of web development cases, you'd have to pry ActiveRecord out of my hands.
I use SQLAlchemy to serve tens of thousands of requests per second on one of my services at Reddit. It's painful, but possible.
SQLAlchemy uses the data mapper + unit of work patterns. This means that your connections and your record objects have the same lifecycle. If your request takes 200ms, then you are likely holding open your connections for 190ms, even though the only SQL query takes 20ms. If connections are scarce, and they usually are, you will not utilize them well at all.
There are two ways that I know of to solve this, neither of which is pretty: (1) run pgbouncer as a sidecar on your python processes. This has the effect of making connections a far cheaper resource. (2) After every single query, manually close the related transaction/connection/cursor in your code.
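For the record, option (2) roughly looks like this (a sketch assuming SQLAlchemy 1.4+; the DSN and table are invented):

    # Check the connection out only for the duration of the query,
    # not the whole request.
    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://app@db/app", pool_size=5)

    def fetch_user(user_id):
        # The connection returns to the pool when the with-block exits,
        # so it's held for the ~20ms of the query, not the 200ms request.
        with engine.connect() as conn:
            return conn.execute(
                text("SELECT * FROM users WHERE id = :id"), {"id": user_id}
            ).fetchone()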
ActiveRecord defaults to giving up the connection after every query, which in a high-traffic webapp, is much less of a footgun.
(No significant experience with either SQLAlchemy or ActiveRecord, but work in Java at a shop that really strongly emphasizes the unit-of-work pattern.)
"give up connection after every query" is, IMO, not _less_ of a footgun, but a _more subtle_ footgun. That is to say, sure, for the most part, it will work ok.
_But when it doesn't_... like when your database goes down for a couple hours, or your rack loses network, or your proxy to whatever cloud provider is running your database, or maybe some other dependency had an outage between the queries... you end up not really knowing what executed and what didn't, with some fraction of your 10k req/sec leaving behind some dangling state -- a partially processed $whatsit -- and no clear/easy way to fix it or even necessarily identify how many $whatsits were affected. At least a few times, I've needed to either wait until records make their way into some OLAP system for offline analysis, or deploy some fixup code, or emulate either/both to fix stuff up after something weird happens.
At least that's what I am constantly afraid of. Do you also worry about this kind of stuff? If not, what prevents it from being a problem in your environment?
Signed,
The guy constantly badgering seemingly everyone about minding their transaction boundaries.
From my experience, there’s always one developer in every enterprise dev shop who understands transaction boundaries, isolation levels, and concurrency/reliability scenarios that do happen... and 10 other devs constantly instilling fear in him/her.
But he's not saying "don't use transactions." He's saying "Don't keep your transaction open for your entire user connection for no reason. Close your completed transactions as soon as you can."
Fun story: a coworker and I are in the process of tracking down exactly these violations in a misbehaving app, due to an accumulation of RPCs within transactions across the entire app. (I used to be responsible for it as a greenhorn ~5 years ago, but walked away for the last several years to work in a different part of the org.)
The approach is actually kinda interesting: we add an interceptor in the RPC layer for every outgoing call that asks the DB machinery “am I currently in a transaction?”, and emits a warning (containing at least the RPC being made, perhaps a (partial) stack trace) if so. We had so many occurrences of those warnings that our logging framework proactively started dropping log lines! So the next step is to only emit the warnings once per $duration per message.
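For the curious, a rough Python analogue of that interceptor (assuming SQLAlchemy 1.4+'s Session.in_transaction(); the hook itself is hypothetical):

    # Before any outgoing RPC, warn if the current DB session still has an
    # open transaction; include a partial stack trace to find the caller.
    import logging
    import traceback

    log = logging.getLogger("rpc")

    def before_outgoing_rpc(session, rpc_name):
        if session.in_transaction():
            log.warning(
                "outgoing RPC %r inside an open DB transaction\n%s",
                rpc_name,
                "".join(traceback.format_stack(limit=10)),
            )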
Just to hopefully further contribute, one of the top-deck features of the new Rails 6.0 release is its really thoughtful engineering work around multiple database support. At least, that's how it's named or described; the actual net result is a lot of smart people exhausting themselves making n-instance DB stack reliability and performance better on every chart... including setups where the value of n is 1.
What this translates to is that things should be impressively stable when your DB nodes go down, the backhoe shows up to party, or your BGP is routed to North Korea.
You can divorce your objects from the session if you want. But you lose transactions then, so with any approach that involves giving up the connection it seems to me you’ll create a whole other class of headaches for yourself.
I’m lucky in that I have low requests per second so I never have to worry about any of that too much. I don’t envy you having to deal with that sort of load.
Entity Framework, even with its quirks, has been the best developer experience with an ORM for me, no contest. A lot of that is probably tooling, like the standard C# tooling and also LINQPad.
SQLAlchemy is probably close to the worst experience, and that's likely largely due to Python and Python's tooling state. And the docs. Just something about the docs. Still, other than bulk insert performance I got it to do just about everything I needed, and came to the conclusion it was the best Python ORM for my needs.
For simple queries, sure, SQL is great, but it is not composable. Ideally you want to send your whole query to the database at once, and for a complex statement that may mean many, many lines of reasonably complex joins etc. With SQLAlchemy you can write a bunch of simple components and join them together for submission at once, instead of having a thousand lines of SQL that can be hard to read. You can re-use parts and even come back later and understand clearly what is happening. How do I know this? 25 years of writing SQL and at least 6 using SQLAlchemy. A lot of the time I will still compose and test in SQL using, say, DataGrip or psql, then build an SQLAlchemy query from that. This guy should look at alembic too. It has its faults (like the half-module, half-executable runtime), but it solves most of the problems of just having a folder with a million naively numbered SQL scripts, like managing up and down versions as well as branches.
I haven't met an ORM that wasn't more work to implement, especially when trying to keep schema changes in line with existing deployments. That said, I'll take a decent DB client SDK any day; I don't want to spend too many cycles on connection pooling and such. I just want it to handle variable substitution and concurrency primitives at the language level.
I don't think there's anything wrong with using tools to help you generate good SQL queries, but you should see them clearly and own what the database is doing. MySQL and Postgres have a myriad of useful functions and extensions that ORMs don't utilize or hide from the developer. That leaves the application to do multiple queries or a lot of logic that isn't needed.
It doesn't have to be gnarly; any SQL query that can't be handwritten on an index card is probably too cumbersome to maintain. A 1000 line SQL query is a code smell that means the data model could probably be improved, perhaps with views.
As someone who's used both SQL and the Django ORM, I could ask: "Why is SQL so backward?"
Here's a datapoint I experienced:
The company I worked for didn't trust the Django ORM for DB migrations. They thought they could do it by hand. After lots of deployment failures where the migration script worked in the dev environment but failed in the prod environment, we switched to using django migrations. Haven't had an issue since.
The author even admits that migrations are a hard problem, and yet he has no solution other than just using raw SQL.
Edited to Add:
In Django, these migrations are handled pretty well with a single command (makemigrations generates them from model changes; migrate applies them). With raw SQL, someone would probably have to spend time writing and testing the script by hand, and it would then need to be reviewed in the PR.
In my experience Django's migrations are really solid, assuming you're using the PG or SQLite backends. This is especially true if you're working from a new Django project, and not a "legacy system" that wasn't initially built for Django.
Every project I've seen that refused to use an ORM in favor of its own home-grown solution was worse off for doing so. Almost every time someone claimed that the ORM couldn't be used, I could point out how it could, and the code became simpler and easier to maintain. If they read the documentation and used a bit of creativity instead of fighting against their tools, they wouldn't be left with such a mess.
This post reads to me as the author not having the ability to properly structure and plan their projects, so they blame their tools instead of themselves. So instead of understanding limitations and tradeoffs, they try to build their own magic lamp that solves every problem under the sun.
My experience is actually the opposite - every project I worked on that relied on an ORM actually benefited from replacing it with a clear architecture pattern and a query builder. Having the concrete notion of what is being passed to the DB allows easier debugging, and it is way easier to solve performance bottlenecks. As an example, recently I replaced several hundred lines of complex ORM oriented code with a 10 line CTE that would perform the iteration and calculation being done on the client side - the result was a 10x increase in performance with a fraction of the memory.
By not using an ORM, the developer also stops treating the database as a "black box". It is a "black box" conceptually, but the more you understand how it works, the better your applications will perform. As an example: knowing whether, inside a transaction, a read is served by the server where you started the transaction or by some replica that may or may not be up to date (due to propagation delay) may seem picky, but it's one of those things that is very hard to debug afterwards as a transient error.
Yes and no; it seems it's an external module. And this is part of the problem - why would I need an external DSL extension to translate a pure relational concept into SQL? Debugging the query will require SQL knowledge, regardless of the ORM. In fact, the ORM is another layer to be debugged; it just adds complexity.
I've had to put up several services throughout the years that performed exactly one or two queries, and a few more that performed less than five.
It was a matter of setting up the database, writing a prepared statement, and calling that prepared statement with the relevant arguments. Bringing in an ORM for these cases is absolutely overkill.
As your requirements constantly balloon, perhaps adding an ORM (or just a simple query builder) is appropriate. But none of the services I mentioned above have grown in that manner, and most are still up and have been running (and modified) for several years now.
I've been on projects where an ORM and/or query builder was likely the right answer, especially for automated migration and rollback scripts. But the way you're presenting it seems to be all or nothing, and I couldn't disagree more.
I find these ORM opinions generally coming from someone who is used to, and is proficient in SQL, and only when they have a complex use-case/operation at hand.
If you consider a web project, with many developers of different levels of experience, having an ORM is a life saver. Because not everyone is constantly mindful of SQL injections. Because not everyone is constantly mindful of transaction management in requests. Those take experience.
In my experience ORMs are great 95% of the time. Those 5% can be extremely painful (I've gotten bitten by ORM Rails transaction nesting problems, and sometimes it frustrates me when I need to make something more complex). But let's not throw away the baby with the bath water.
I have had great experiences with Django ORM and SQL Alchemy. Rails ORM goes a long way, but it feels eons behind Django's.
Code reviews are for that: I've taught countless junior developers how to write better SQL through code reviews. I've shown them how to mitigate the common exploits and how powerful SQL can be when needed, and I believe they are better engineers because of it.
I really don't like using a tool as a crutch for a systemic deficiency of a team; in the long run you are only compounding the problem. The more complex the system gets, the more workarounds and gotchas you have with ORMs, so it's better to educate developers on how to not depend on them from the get-go.
Yup, ORMs have their place in some projects, but in the majority of the projects I've worked on they brought more pain and confusion than just properly effing learning SQL.
> I really don't like using a tool as a crutch for a systemic deficiency of a team;
Very valid point, but then this becomes a discussion of how much you want to build mitigations into the system vs. educating developers on topics like security. The larger a team becomes, the harder it is to police that every single query in your system is sanitized.
Still on the security side: software architecture tells us not to repeat ourselves, and if sanitizing query strings can be done centrally through an ORM, then why shift the responsibility to code review processes that are especially prone to failure due to their human nature?
As you said, these kind of become a crutch to people who don't know SQL. People begin to rely on those instead of knowing the underlying mechanisms -- but hey, that's the price of any abstraction. Probably this same argument was used when high level language compilers (analogous to the ORM in this discussion) were introduced.
It will boil down to: how much complexity a leaky abstraction such as an ORM adds to the project, compared to teaching others not to make the same mistakes all over the codebase with raw SQL.
As with anything related to software design and architecture, it depends: on the size of the project, how many teams work on the same codebase, etc. I don't think using an ORM as the sanitisation layer for your SQL queries is a good enough feature compared to its drawbacks in complexity and abstraction leakage. The object model doesn't serve relational data very well; that is already a first sign and smell for me that an ORM should be used in very specific cases instead of as the default go-to solution.
Having to deal later in the application lifecycle with partial entities and the like (anything to do with partial data access compared to the models' defined fields) is really painful when done improperly, which is easy to do, and then we are back at square one, having to teach developers how to use an ORM framework properly. Also, one thing I really dislike (and that I believe will always increase a system's complexity) is making code less explicit: an ORM layer will make your code flow less accessible, you'll need to know where the layer takes over and where it leaves you to implement more complex use-cases, and then you are back to writing SQL queries manually or depending on some set of features from the library implementing the specific use-case you are looking for.
I don't hate ORMs, but mea culpa, I have used them improperly and learned from my past experiences; nowadays it's really hard for me to justify their usage aside from very simple data access patterns. I say that because in the end you'll always need to reason about what the ORM layer is doing in terms of SQL concepts, and translating between those layers in my head adds another complexity: cognitive load.
> I find these ORM opinions generally coming from someone who is used to, and is proficient in SQL, and only when they have a complex use-case/operation at hand.
I'm reasonably proficient in SQL. My practical blocker with ORMs in general has been that you still have to understand your application obviously, and you have to understand SQL to fix most things, and now you're adding effectively a new language and framework between them.
> I have had great experiences with Django ORM and SQL Alchemy.
I think the best ORMs aren't ORMs at all, but larger frameworks like Django and Rails, as you're making the investment to learn that framework anyway.
And in fairness, I started with Hibernate which is particularly complex, having its own query language. Many ORMs have gone the no-object route and try to directly represent relational concepts, especially jOOQ[1]. That's smart engineering to recognize that problem and minimize it, but at that point it's less of an ORM and more like numpy for the relational algebra.
> Because not everyone is constantly mindful of SQL injections.
I'm a senior developer but just started using SQL seriously a few months ago, and only for my own projects, so don't have much experience; but it's difficult for me to understand this. Both in Python and in Go, every single tutorial tells you how to use parameterized queries, and every answer on StackOverflow has warnings of building SQL queries from normal string operations. Fifteen years ago I could see this being an issue; but is it really possible that in 2019 "inexperienced web developers" manage to write code vulnerable to SQL injection? Is it really only my hard-won paranoia, gained from decades of building operating systems in C, that made me immediately take these warnings to heart?
(That's a genuine question, not a rhetorical one.)
You'll have to pry Hibernate and Spring Data JPA from my cold, dead hands. Extending CrudRepository and getting a ready-to-use repo for an entity type, combined with annotations for custom JPQL queries? Yeah, I'll take that over any hand-rolled, low-level SQL interfacing for almost anything.
But, as always, "use the right tool for the job." I have gotten tremendous value from ORM's over the years, but I'm not going to insist on using one for everything just "because".
Ha. Funny, but I’m the exact opposite. Had to use Hibernate and Spring once. Trying to debug with 45 XML config files scarred me for life. That, coupled with MySQL (which even Ellison will tell you is a toy), meant I saw the writing on the wall and quit the job a couple of months before that whole team got fired, while the company had no choice but to throw the code in the bin. My view is that ORMs can never compete with a proper enterprise-scale RDBMS with people who know how to use it.
JPA/Hibernate has moved on a lot from the church-of-XML days. The basics are a lot smoother, with annotations and sensible defaults and so on, and there are some pretty nice additions.
My favourite neglected feature is JPQL with constructor expressions, which lets you write queries in an SQL-like language, but in terms of objects rather than tables, which is slightly easier, and materialise results directly into objects of your choice.
One of the nicer JPQL features is being able to navigate the Java object model and have that translated to the equivalent joins in SQL.
However, there are things that JPQL can’t do, but it’s easy enough to create a native SQL query and have its results map to JPA entities. I use Spring Boot (JPA, Spring Data, Spring Data REST) for the sheer convenience of it and speed of development, not to abstract away the database, which is always PostgreSQL. I am very comfortable writing SQL queries and making use of PostgreSQL-specific features and Spring Boot with JPA certainly lets me do that when I want to.
How so? I think the overall point of my post is clear, and consistent: I am a big fan of ORM's, won't be giving them up because some blogger-guy wrote another "ORM's are evil" post, and find them very useful for almost all of the things I do with databases. But the "almost" there is the key word. So, again, use the right tool for the job. Sometimes, in my experience, it is an ORM like Hibernate, other times something else makes sense. For example, I do some work with Redash for creating dashboards, and in that environment, you're working with SQL directly. shrug
At my company we use C#/EF with 50 developers and a gigantic codebase and database.
A statically typed binding to the database with 'find all references' etc. The ability to write a few lines of simple, understandable code that turns into a much more verbose SQL query aids in the readability, maintainability, and understandability of the code base.
I have used other ORMs and feel that EF/LINQ is way simpler in that the same query syntax is used for in-memory and database operations. They can be mixed and matched for doing really intensive data processing with relatively few lines of very easy to read code.
My only comment would be that, when using an ORM, you do need to understand how it works, how the code you write turns into SQL, and a lot of the details of contexts and tracking.
It's like flying a plane - big, complicated, it will get you where you need to go very fast and effectively, but if you don't know what you're doing you'll crash right into a mountainside.
IMO, SQL is a domain specific language that is far from verbose. If you need to understand or predict the SQL and let that influence your ORM code, then just use SQL.
If you're using a dynamic language, an ORM doesn't provide much. But in C#, LINQ has the benefit of being strongly typed, which means your DB schema can be evolved. If you write raw SQL (or Python) and you rename a table/entity or column/property, you're gonna have a bad time. In C# you can just use a refactoring or, worst case, get compile-time errors.
You can write very simple ORM code that results in very complex and verbose SQL. Especially when you want to return structured data that includes internal aggregations. If an ORM is not used, you often see a lot of data manipulation after the query is performed.
If you stop your examination of a technology at the most trivial use case I'm not sure what there is to discuss but...
Right off the bat, it's obvious how to do a transaction in the first example but not the second. It's also obvious how to use complex SQL functions in the first style and not the second.
If you find yourself having to type insert queries over and over again, then you're probably doing something wrong. I mean it takes like what, 5s to write that line of SQL?
It is the same as with all frameworks: it promises productivity and speed of development, but in exchange the problem that you are trying to solve must fit within the model and the assumptions made by the authors of the framework. In the end, that is never really the case, and depending on the situation, fixing the last 20% of your work takes less/all/more of the amount of time that you saved. So sometimes a framework is a good choice, but it is never as easy as the tutorial makes it seem.
ORMs are a bad case of this, because they try to bridge the impedance mismatch between how data is stored and linked in the database and the programming language you are using. This creates a forced model and a whole lot of assumptions, and in the end it is still not good enough. Fixing the part of your project that does not fit the ORM's assumptions gets orders of magnitude harder because the bridge they built is very complex.
Yes, most ORMs allow you to write direct SQL queries, but that does not really solve the problem. Either you work the way it was intended, or you do not. If the latter is the case, you will encounter difficulties, no matter the way you break the 'rules'.
If the code is written clearly and it uses just SQL, I can almost immediately understand what it does. For ORMs it takes a lot more effort to understand what's going on when you read the code.
I'm also rusty on my assembler because I generally use high-level languages and compilers/interpreters; that's what abstractions do, and not really a reason to avoid them, since people can generally solve the problem anyway. It just takes a bit more time (usually less than the time the abstraction saved elsewhere).
But assembler is the wrong analogy for SQL. SQL isn't a lower level version of procedural code, it's a domain specific language.
When you write a technical paper in English, you switch to math equations when appropriate. Math isn't assembly language; it's the correct language. It's the native language.
Yes, you can always describe your equations with English prose—and if you're doing simple addition and subtraction, it's probably nicer to write that in English. But once you want to do anything remotely complex, writing it as a math equation is more terse and less ambiguous.
> Yes, you can always describe your equations with English prose—and if you're doing simple addition and subtraction, it's probably nicer to write that in English. But once you want to do anything remotely complex, writing it as a math equation is more terse and less ambiguous.
Right! But 99% of the time, I do not want to do something complex - I just want to load a few rows based on simple search parameters, and save changed values (which may involve heavy data processing, but not relational). Hence only using SQL rarely, and therefore getting rusty.
If you're only doing basic CRUD operations, then fair enough.
That said, based on my experience analysing applications, "loading a few rows" to perform "heavy data processing" and then "save a changed value" smells suspiciously like "the entire task can be rewritten as a single, moderate complexity query and the data never even had to leave the database server."
I'm not saying that's the case in your own generic hypothetical, as there are indeed forms of business logic and advanced manipulations that exceed the scope of a database server. But you might be amazed at what you can do within the SQL language. For example if you aren't intimately familiar with window functions like DENSE_RANK() then do yourself a favour and learn about those—and then contemplate how you could use them within a subquery joined to a table that you're updating.
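For anyone who hasn't met window functions yet, here's a tiny illustration runnable through Python's sqlite3 module (window functions need SQLite 3.25+); the schema and data are invented:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (player TEXT, season TEXT, points INT);
    INSERT INTO scores VALUES
        ('alice', '2019', 42), ('bob', '2019', 42), ('carol', '2019', 17);
""")

# DENSE_RANK() gives tied rows the same rank and leaves no gaps after
# a tie: alice and bob both rank 1, carol ranks 2 (not 3).
rows = conn.execute("""
    SELECT player,
           DENSE_RANK() OVER (PARTITION BY season ORDER BY points DESC) AS rnk
    FROM scores
    ORDER BY rnk, player
""").fetchall()
print(rows)  # [('alice', 1), ('bob', 1), ('carol', 2)]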
Well, my current "heavy data processing" is generating a 3D render from some metadata stored in the database. Postgres is amazing, but I don't think it can do that yet :)
My experience over my career has been this: there's CRUD, lots of IO, and a bunch of data processing that just needs specialized software (image, video, weather simulation, etc). I try to offload what I can to the database - and yeah, I know about window functions - but other than for the occasional report, it just doesn't move the needle.
The sweet spot for me is a query builder: a tool to generate SQL and treat queries like code, and not get in the way like a heavy ORM. Shout out to http://knexjs.org/ for Node.
On the other end, you can often build better UIs to reduce the amount of dynamic query building required. Dynamic queries are most heavily needed when you give users a data table and let them sort/filter as they wish; more often than not this is a complete failure of the UI to understand the users' workflow and build around it.
I've used Sequel in Ruby for that, and... it worked pretty well, to be honest. But it was for a service that did some complicated DB shenanigans, and not any user API. I wouldn't skip out on ActiveRecord when running a public facing Ruby app.
I don't use "raw Arel", I use most of the parts of ActiveRecord -- but I definitely appreciate that about Arel, and use that aspect of Arel intentionally. Composable query parts made of code.
I think it's a mistake to think you can't write good efficient SQL for well-normalized schemas with AR. You can. Usually anyway, for a great many use cases.
[Arel is _really nice_, but sadly Rails absorbed it and considers it "private API", especially in its most sophisticated features. I use em anyway, they generally don't break... but if I wasn't using ActiveRecord as a whole, I'd probably use Sequel instead.]
I work with Rails almost every day and I'm still quite fuzzy on where Arel ends and ActiveRecord begins. It's an area of ambiguity; a technical post on its history and contemporary status would be appreciated by many.
For the most part, the stuff you do when constructing a query is Arel. Where you chain methods and such.
When it was first added to Rails, improving the query building possibilities, it was a separate gem that Rails depended on. I'm not sure if it was written specifically for Rails originally and just maintained in a separate gem, or intended by its original writers to be an independent project. But it was later absorbed into the Rails repo, and we were later told by Rails maintainers that its more sophisticated manual API (look up how to make a subquery with Arel, for instance) was not intended as public API and had no backwards-compat commitment.
ORMs are amazingly useful in the limited case that you're performing basic CRUD operations or simple joins.
In any sufficiently complex codebase you'll be doing more than this, however, and the work to support the ORM properly then outweighs the work of simply coding the raw SQL and preparing statements as appropriate.
People hugely exaggerate this IMO. If you're writing a lot of queries that can't be composed, with sufficient performance, in ActiveRecord, but can be in raw SQL, then you've probably done something hare-brained elsewhere.
And for those special (and IMO pretty rare) occasions, you can drop down to arel or raw SQL anyway. Why throw away the consistency and readability of something like AR for edge cases, when you can just treat your edge case as an edge case with raw sql and still keep AR for your other 95% of queries.
I maintain a relatively unremarkable but bespoke online discussion forum, which has hundreds of queries, few of which could be composed by an ORM, let alone composed and run performantly. The median complexity query in my code base probably has two or three joins, two or three subqueries, and some kind of aggregation or window function.
The result is a typical page runs around two or three queries total—one query to authenticate the user and load everything about their profile and permissions, one to load the entirety of the data being output on that page, and occasionally one to update a statistic somewhere.
(The authentication query runs on every page because there's absolutely no persistence in the application layer. The authentication query goes three layers deep in subqueries and includes half a dozen joins. It hits perfect indexes when it runs and takes only a few msec round trip.)
> People hugely exaggerate this IMO.
In my experience, people who think an ORM can do most things are simply under-experienced with SQL and set theory.
> The median complexity query in my code base probably has two or three joins, two or three subqueries, and some kind of aggregation or window function.
Well, just from that description, Django ORM could do it. Can you post an example of a median query? I'm curious to see why it can't be ORM'ed.
This is a heavily redacted, completely renamed and summarised version of a typical page data query. It's less complicated than the top three most common queries that run to build the most common pages.
SELECT
    b.field, b.field, b.field,
    group_concat(concat(y.field, useful_thing)) as useful_things
FROM (
    SELECT t.field, f.field, w.field,
        (case when w.wid is null then 0 else 1 end) as has_watched,
        exists(select id from posts p where p.tid = t.tid and p.uid = :uid) as has_posted
    FROM (
        SELECT tid,
            DENSE_RANK() OVER (PARTITION BY foo ORDER BY bar DESC) AS useful_pseudo_id
        FROM editorial_things
        WHERE thing = :thing
        ORDER BY something
        LIMIT number
    ) as a
    INNER JOIN thread as t ON t.tid = a.tid AND t.last > Now() - INTERVAL 2 DAY
    INNER JOIN forum as f ON f.fid = t.fid AND f.fid in (:security)
    LEFT JOIN watched as w ON w.tid = t.tid AND w.uid = :uid
    ORDER BY something
) as b
GROUP BY something;
The other thing is that being returned objects just adds complexity and handling. The language I use has a perfectly nice native, iterable data type for returned database records (kind of like an array of dictionaries) and I'd rather just use that instead of an object middleman to satisfy some kind of OOP completionist fantasy.
CFML (i.e. ColdFusion) using the Lucee engine on the JVM. Don't believe the popular scorn: it's as good a language as any other for web development. Perhaps not as innovative as newer languages, but it's densely packed with pragmatic conveniences and (for better or worse) doesn't impose any particular architectural style upon you.
CFML has a "query" datatype that represents the rows and columns returned from a database combined with useful metadata and really neat features like n-level iteration where values repeat. To the programmer it's like a dictionary that magically changes its values by passing it to an iterator. Without the iterator it works like a dictionary of the first row. Or if you treat it like an array you can manually read any column in any row directly.
I agree. Don’t know why you’ve been downvoted. Your experience directly mirrors mine.
What I did find interesting was the pushback I got from my team when I told them that I wanted to drop the Java ORM (JPA). One guy, who I respect, was super critical of the decision. He was telling me that SQL was the reason we had a performance problem, but when I investigated I was astounded by how fast SQL was, even over JDBC. It was clearly the ORM that was slowing us down.
In his defence, when I was able to show him literally 100x performance improvement after moving a service to PL/PGSQL and MyBatis (Java), he changed his mind.
In most applications you're going to be doing both things. I remember Rails' ORM was always presented this way: it's the far better option for simple object relationships and CRUD-style systems, but you can still write raw SQL as needed for complex stuff or things requiring performance optimizations.
There's plenty of ways to manage complexity in software. Two or three different databases in a single app doesn't kill them, nor will two different DB interfaces.
I think the opposite is true: SQL is the backwards one compared to a programming language. Creating statements by concatenating strings? An ORM makes real objects and types you can actually instantiate and use, or even type-check.
You don't create statements by concatenating strings. Use prepared queries with placeholders and pass values. Databases will compile the query once, then can run it many times.
Sometimes you have to concatenate, e.g. to specify ORDER BY clauses. This is something database vendors need to fix. There need to be ways to tell the database to modify a prepared statement without changing its SQL. You might be able to do it with a crazy IIF, but then it's likely not to use indexes well.
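Until databases support that, the usual workaround is a whitelist: validate the column name against a fixed set and interpolate it only after validation, while every value still goes through a placeholder. A sketch with hypothetical table and column names:

ALLOWED_SORT_COLUMNS = {"name", "created_at"}  # fixed in code, never user data

def list_users(conn, sort_by):
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by}")
    # Only the validated column name is interpolated; the filter
    # value still travels as a bound parameter.
    sql = f"SELECT id, name FROM users WHERE active = ? ORDER BY {sort_by}"
    return conn.execute(sql, (1,)).fetchall()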
Prepared statements on anything but Oracle are only asking for trouble: if the distribution of your parameters does not match that of the query the statement was compiled with, you can easily end up with bad query plans. Furthermore, these issues are hard to analyze (no parameters in logs) and even harder to solve.
I have hit this problem in just about any system using prepared statements. The performance overhead of query planning is minimal compared to the headaches it prevents.
> I think the opposite is true: SQL is the backwards one compared to a programming language. Creating statements by concatenating strings? An ORM makes real objects and types you can actually instantiate and use, or even type-check.
That you view the situation as ORM or concatenation terrifies me. No wonder SQL injection remains so prevalent.
Prepared statements are what you use here. You provide a query with placeholders for the values you want to fill in, and then you call that prepared statement with the arguments you need. These are typically type-checked and hardened against SQL injection, but of course you should verify with your particular language/library combination.
The problem is prepared statements aren't composable. So saying I want this query but also with this extra where in there and a limit and an offset means you have to write a whole new prepared statement. Or concatenate strings!
With an ORM you can construct the prepared statements programmatically.
> The problem is prepared statements aren't composable. So saying I want this query but also with this extra where in there and a limit and an offset means you have to write a whole new prepared statement. Or concatenate strings!
Absolutely, prepared statements would not be appropriate for that.
> With an ORM you can construct the prepared statements programmatically.
That's not limited to ORMs though. You can also just use a query builder. In fact I would argue that an ORM that provides that functionality is a query builder with additional ORM features on top.
It sounds to me like there's a reasonable middle ground here, where you treat your SQL query more like an AST than a string. So you can do something like myquery = select("col1").where("col2 < {}", my_value), then later myquery.limit(10).offset(20).
You're still working with SQL as a language, but you're manipulating it directly instead of trying to go via string formatting.
// In your handlers, construct queries and ask your models for the data

// Find a simple model by id
page, err := pages.Find(1)

// Start querying the database using chained finders
q := page.Query().Where("id IN (?,?)", 4, 5).Order("id desc").Limit(30)

// Build up chains depending on other app logic, still no db requests
if shouldRestrict {
    q.Where("id > ?", 3).OrWhere("keywords ~* ?", "Page")
}
SQLAlchemy has this. It sits alongside and works in tandem with the ORM, and I've used both in projects (and also raw SQL), each where appropriate to the use case.
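For the curious, composing a statement with SQLAlchemy Core (no ORM session involved) looks roughly like this; the table is invented, and the exact select() signature differs between SQLAlchemy versions:

from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

engine = create_engine("sqlite://")
metadata = MetaData()
pages = Table("pages", metadata,
              Column("id", Integer, primary_key=True),
              Column("title", String))
metadata.create_all(engine)

# Build the statement incrementally; nothing hits the database yet.
# (SQLAlchemy 1.x style; 2.x drops the list: select(pages.c.id, ...))
query = select([pages.c.id, pages.c.title]).where(pages.c.id.in_([4, 5]))
query = query.order_by(pages.c.id.desc()).limit(30)

with engine.connect() as conn:
    rows = conn.execute(query).fetchall()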
You can use functions to create basic "getters" for application data structures. Layering these through further functions for limits/offsets/etc is simple (and you can write those layers in a generic way).
You just have to get used to the mental shift that all your data manipulation code lives in the database, and your application simply consumes data from single function calls.
> That you view the situation as ORM or concatenation terrifies me. No wonder SQL injection remains so prevalent.
I don't think others mean concatenation of values where placeholders should be, but the fact that the query itself is a big string. Unless some sort of query builder is used to literally build this big string for you.
SQL is not compatible with programming languages because in a programming language you are describing how to do something, while in SQL you are describing what the result is supposed to be.
What would be great is first-class support in the language (or perhaps the IDE) that understood the database schema and database types, so you could have error highlighting and autocompletion.
Something like jOOQ for Java seems to do that. It seems like a good idea on the surface, though I don't currently program in Java and don't see anything similar in other languages, so I'm wondering what's wrong with it.
After more than 10 years using ORMs at work, in the DotNet/Java worlds, my current opinion on the topic:
- the only project where an ORM was worth it was a simple backend admin project with many tables and very little load: the ORM avoids having to write tedious simple CRUD queries.
- for recruiters, it seems simpler to find people with language X than people with language X + SQL.
OK but ...
- to understand the intricacies of an ORM takes a lot of time and effort, probably as much if not more than writing proper SQL.
- illustration of the previous point: it is hard to know how much data the ORM is really carrying, hence the common situation where half of your DB ends up in memory.
- developers tend to forget about performance. For instance, in C#, IQueryable objects are used everywhere, meaning that in any layer of your app the query that will be pushed to your server might be modified: it is hard to tune performance in such a context. More generally, this additional layer makes everybody postpone the time when you have to think about DB performance.
- it prevents you from using advanced DB features (e.g. built-in audit trailing and versioning), either because nobody ends up knowing the SQL DB anymore, or because the ORM makes it harder if not impossible.
- queries sent by the ORM to the DB always seem huge and return more than they should.
- for performance or complicated requests, you always end up adding some SQL to your ORM.
- because of the previous point, the maintenance advantage of an ORM, namely being able to automatically migrate your DB with your code, is lost: you end up having to maintain some SQL more or less manually.
- letting the ORM automatically update the SQL DB during migration on real data always makes people sweat: if any problem occurs during that process, welcome to hell. I always feel more comfortable being able to write an SQL fix in the worst case.
- for my side projects, I never use an ORM, and I enjoy it.
Completely agree. I've been using Dapper for a few years now and it's amazing because I have control over everything. I created a simple overlay, which also allows me to do some very specific things that you just cannot do with a traditional ORM (like having dynamic table names).
Anyone wanting to set up something similar should just watch Mosh's Repository Pattern video.
I modified this pattern to use Dapper instead of Entity Framework, but he already decouples Entity Framework to the point where you still have control, so you can do either.
ORMs are solving a problem that really should be solved on the DB layer: providing a non-bullshit programmatic API. As a developer, I DO NOT CARE whether the value of my object is saved in a different table or not I JUST WANT IT SAVED. A DB that will provide a good programmatic API AND analysis interfaces will be golden. And there have been a few good attempts reaching there. ORMs will be solved when you don't need an ORM to access your DB.
Storing data is a hard problem and likely to be a leaky abstraction. "Save my object" is OK until you need to ask questions like how to find such objects, and by which attributes. If I update this and another object, do I care that they are done in a single transaction?
An "I JUST WANT IT SAVED" friendly abstraction would be JSON as a value in a key/value store. As a bonus you can choose your consistency/availability trade offs by going "nosql". For example Azure table storage. Nice!
But now what if your object is an invoice with a link to an account? Is that an ID, or is the account info embedded in the invoice? Etc. There is a lot of design thought that needs to go into data. You can stop using SQL, tables or whatever, but these problems don't disappear.
ORM allows you to almost forget about these issues for simple CRUD-like tables. And for the start up style SaaS doing a bunch of boilerplate crud stuff (to begin with) something like ActiveRecord saves the day. But anyone using such conveniences should also learn database design, query optimization and such things too.
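To make the trade-off concrete, here's the "JUST WANT IT SAVED" abstraction as JSON in a key/value table, with the leaky part called out; a sketch with an invented schema:

import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (key TEXT PRIMARY KEY, doc TEXT)")

# Embed the account or just reference it by ID? That design decision
# doesn't go away just because the store is schemaless.
invoice = {"id": 17, "account_id": 42, "total": 99.5}
conn.execute("INSERT OR REPLACE INTO objects VALUES (?, ?)",
             ("invoice:17", json.dumps(invoice)))

# Saving was trivial -- but "find all invoices for account 42" now
# means scanning and parsing every document. That's the leak.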
The beauty of an ORM is that an ORM backend can be adapted to your proposed system. Or at least some subset of the ORM's spec can be. You can get a nice language specific API for common patterns, migrations, and the power to drop down to system specific escaped SQL queries when necessary. You get a layer of independence, in case you really want to switch to another backend for some reason.
ORMs are necessarily language-specific, and as a single DB interfaces with multiple languages it’s impossible to do as you suggest (that’s what SQL is anyway).
An object in JavaScript has different semantics and conventions around it (everything is a prototype and a dictionary simultaneously) compared to C# or Java (everything is statically defined)
I have no idea what the API you describe would look like. Where you somehow describe to it data to save (and fetch), without any reference to tables, the fundamental unit of an rdbms? I guess it'd probably look a lot like... an ORM?
I believe that ORMs try to solve a problem that fundamentally cannot be solved. As long as you keep the usage simple and use your database as an object store, as a majority of projects do, all is fine and dandy. But once you start to look into your database a bit more, and try to use it for what it's capable of, you'll run into problems. It is because the fundamental building block of the data model of the languages in question here is objects, while the fundamental data block of an RDBMS is relations. As soon as you try to write custom queries which join columns from other tables, the simple table<->object mapping ceases to work, the ORM becomes lost and starts to get in the way, and can make you lose a lot of time.
I've personally settled on using a "light" ORM, which does the mapping of result sets to objects, variable replacement, and connection handling, and I write the SQL myself. It's still not ideal, because for statically typed languages I need to write ad-hoc object types for query results, or deal with untyped data, but at least I don't get any surprises.
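In Python, that "light" layer can be as small as one dataclass per query shape; a sketch with invented tables:

import sqlite3
from dataclasses import dataclass
from typing import List

@dataclass
class CustomerOrderCount:
    name: str
    order_count: int

def fetch_order_counts(conn: sqlite3.Connection) -> List[CustomerOrderCount]:
    # The SQL is hand-written; the "ORM" part is only giving the
    # result set a typed, named shape instead of anonymous tuples.
    rows = conn.execute("""
        SELECT c.name, COUNT(o.id)
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.name
    """)
    return [CustomerOrderCount(name, n) for name, n in rows]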
Yeah it's kind of a false dichotomy isn't it? Just use your ORM for CRUD operations and use proper SQL if you're doing anything much more complicated than basic joins. It's a lot better than the busy-work of hand-cranking every basic SELECT/INSERT/UPDATE/DELETE statement, preparing them and the associated boilerplate that you usually have to write to stick it in/extract it from your object model.
I wholeheartedly agree with the article. Every time I've used an ORM I've regretted it. SQL is already a great DSL for working with relational data, and there is no reason to create a leaky abstraction on top of it. The reality is that there is no practical way to do efficient mapping from relational data to object models, and pretty much every ORM ends up generating horridly inefficient queries when you start getting into non-trivial cases. At that point you're going to have to go in and write the queries by hand anyways. So, you might as well just skip that step and work directly with SQL.
The worst part is that you often find out that the ORM is affecting performance only when you start hitting real-world loads in production.
It doesn't always affect performance. A lot of business apps are mostly doing CRUD under fairly light load. It may be that beefing up the database server is much cheaper than interacting with the database in a way that, while being more performant, is also more laborious.
That said. . . yeah, I've also had my share of cases where the app ended up seeing higher load than anyone anticipated, and it's hard to even see the source of the performance problem, because, with an ORM, it takes a little bit (or, with the worse ones, a lot) more work to find out what the actual queries being executed are so that you can ask the DBMS to `EXPLAIN` its query plans.
Except I don't find using SQL more laborious, and if it is, then I'd suggest that's a problem with whatever language makes using SQL directly laborious.
I'm pretty in love with schema-driven development with https://www.graphile.org/postgraphile/. I generate a graphql schema from postgres, and code models/serializers from that.
Currently using migra and some custom scripts for migration management - will check out automigrate.
The main issue I've had with developing a large backend in pure SQL is that you end up having to piece together custom scripts, migration tools, etc. to add features your database/SQL doesn't provide, like modules and higher order abstractions. So I'm actually considering switching to something like SQLAlchemy simply for that
We went from Hibernate/EclipseLink to MyBatis and have never looked back. Needing to deal with the idiosyncratic behaviour of JPA, HQL and the target SQL is a nightmare. We've literally removed tens of thousands of lines of Java code by dropping JPA.
Same. When writing complex queries, Hibernate just got in the way. MyBatis just lets you map any result set to objects as you see fit. I also make use of non-standard features such as GIS, GIN-based queries, arrays and JSON in Postgres, and this is just easier with MyBatis since I am still writing the queries in SQL. Performance seemed better too.
We've taken the view that audit should be done in the database layer using, for example, triggers. Audit at the application layer is less resilient than at the SQL layer, IMO.
In fact, an important part of our transition away from ORM was to invert the ORM-centric relationship between the application and the database. We wanted our tech people to be able to manipulate the database natively via SQL, which is generally much easier and more efficient than writing and deploying Java code to do the same thing.
For us, this meant moving most data manipulation logic into the database using PL/PGSQL. Doing so has empowered a team of non-Java admins, massively improved performance, and significantly reduced LOC. I'd say that we now have about 30% (a third) of the LOC in PL/PGSQL compared to what we had in Java+JPA. Obviously this means fewer bugs, and performance is literally 100x in some cases.
I think that if you take a Java-first view of your application then all the complexity of JPA is part and parcel of any solution, and that’s fine for you. But we’ve found that taking an SQL-first approach has been great for our use cases, and has had lots of ongoing benefits.
Of course, we needed to build a bunch of tools to help with this. For example, we built a gradle plugin that treats SQL code like Java code so we can write libraries in SQL with transitive dependencies. We also needed to build testing tools and tools to automate schema migrations for CD.
I do think the lack of tooling around managing large SQL-based projects is a blocker to wider adoption of our approach, but the benefits of going against the ORM grain have been very significant for us.
That's a fair criticism about auditing. Doing it through database triggers avoids the problem with audits being missed because of code that doesn't participate in the Hibernate lifecycle (including native SQL queries and Criteria updates, yikes!).
I will also concede that under some circumstances it's too slow to bring the data to the code, and instead you have to bring the code to the data (i.e. PL/PGSQL). That sort of speedup is mostly from eliminating round trips and data marshaling, however, so it's not quite a like-for-like comparison of an ORM vs a mapper.
The reason I stick to ORMs is mostly because of RAD tools that save me so much time. For instance, JHipster generates liquibase migrations and JPA entities. This gives me a relational schema very quickly. To avoid any "surprise" queries that tank performance, I do turn Hibernate's statement logging on when developing new features.
Next time, if I can find some more tools to help with an SQL-only + stored procedure approach, I'll give it deeper consideration. Maybe convince your company to open source some of the tools it has developed!
There is a lot of internal enthusiasm for open sourcing our tools, but we are heads down at the moment so it’s just a matter of time (or lack of). I think we will time it for the next pgAU conference so we can talk about it at the same time.
I definitely agree with you that tooling is a big deal. We did spend a lot of time manually proving it out before we spent the time on the tools. It was a bit of a leap of faith but it quickly became obvious that we were onto something.
And you're right, the point of our approach was to move the code to the data. We deal with reasonably large, complex, real-time data sets, so the round trips and marshalling become the dominating contributor to total time. Hence our ability to improve some operations by 100x.
I can see that if you have a familiar and reliable tool chain and a compatible use case then ORMs could be great. I think our problem was that we had neither, so it was never going to work out too well for us!
No, I did look at JOOQ briefly, but MyBatis came somewhat recommended and perfectly scratched the itch I had at the time. So I don’t have an opinion either way with JOOQ. I am however super happy with MyBatis, we’ve been using it 2+ years in dozens of microservices, but we are KISS oriented and so only use a subset of its functionality (no XML config for example).
"Designing" the database by creating objects first and exporting create statements is ignoring the responsibility to design the database completely, the database which is the foundation of almost all apps. It's irresponsible and leads to a database that isn't optimal at best and doesn't work at worst. The database design should be one of the most important tasks of an app and it shouldn't be an afterthought. The design shouldn't even involve code, even sql at first. Getting the data structures correct is most important. The code will flow from there. "Show me your data structures and I don't even need to see the code to understand how it works."
Putting an ORM layer on top of this refusal to do one's job and design the database is the second nightmare of an ORM. Adding a new language and new paradigms that don't make much sense just to translate data into objects adds a ton of completely unnecessary complexity. Not to mention many ORMs are slow either at fetching or hydrating data. They are just unnecessary solutions for nonexistent problems. It's too bad most people have no experience with database design or even SQL to understand and see this. Most people prefer to make apps more complex unnecessarily rather than reduce complexity, and think that makes them experts. They are usually the ones advocating for the unnecessary ORM layer. On the other hand, this problem mostly goes away when you no longer deal with objects in OOP languages, so there is hope.
Here's the deal: if you design a decent entity manager, you shouldn't have to worry about how the database works.
A database was created to ingest and query user data. Programs pretty much ingest and query user data, yet the number of regular (non-tech) people using SQL is too damn low.
So, IMHO, programming the data ingestion and querying should be easy, and a good ORM framework should handle it. Of course, to make a programmer's life easy, they should start following mindful and clean practices - otherwise any tool will be a gateway to hell.
> the number of regular (non-tech) people using SQL is too damn low.
That is still no reason for me to use an ORM instead. Learn SQL then; that's way more valuable than learning the next ORM library with all its issues.
So wrong, you always want to know how your database works, you should be tuning your database, and queries against your specific database. Every database has unique technology you can take advantage of and there are performance tuning techniques and design techniques you should be aware of. If you're doing simple INSERT/UPDATE/SELECT then sure, but in a large organization with tons of data that's not gonna cut it.
- SQL is not an artificial layer, an ORM however is
- SQL is a language, it's up to you as a developer to tune your queries against your database (e.g. create and use partitioning, indexes, statistics gathering etc.)
- If you're using an ORM just to send data via a native protocol, then why use the ORM in the first place?
- Performance tuning is very closely related to how you write your SQL
- This is my opinion, but I think almost any usage of an ORM will mostly be bad; I have no real arguments to support otherwise. Let the database do the work, that's what it's there for.
SQLAlchemy can get pretty complicated with some of the hairier use cases; raw SQL can too, but SQLA adds another dimension to it:
* SQLA can infer joins, but sometimes it can't and needs hints - then you may worry that you miss some combination of table join it doesn't have enough information for.
* In one case, SQLA inferred a join incorrectly and I ended up with duplicate results from a cross join. This is possibly because of a certain primary key duplicated on a few tables. The way SQLA infers joins is a bit magical, and it's not always obvious how to modify that behavior.
* dealing with joining a table on itself can be complicated, when you need to distinguish between those two versions of itself (using an alias?)
* how about this use case: modify a set of filters to apply at two levels: one in filtering rows before a select query, then again to filter the result of joining those original results.
* If you operate with objects/models for saving/retrieving things, it can often be hard to mix them with raw SQL, which can't easily be modeled (at least without committing everything and then executing the SQL). That means you need to express that SQL in object syntax, which is far more complicated.
* for migrations, I need to redefine some models, because I need two - one representing the old DB structure, and another representing the new structure.
It's possible some of these have better solutions than I've found, but I find it's often hard to figure them out.
One problem I’ve seen is that most non-trivial apps have both OLTP and OLAP elements in the same places. Think of how many “mini-dashboards” you see sprinkled throughout the UI of a modern CRM.
ORMs are generally not good at, or even capable of, efficient set-based operations spanning entities.
Building a dashboard with an ORM results in pain and wildly oversized DB instances in my experience.
Maybe I’m in the minority, but I’ve found that if the right ORM fits your brain, it can be incredibly useful for a lot of CRUD operations, especially if it offers escape hatches to lower levels of the abstractions; ultimately down to raw SQL.
SQLAlchemy for Python is that for me. I find it incredibly expressive, and it allows me to write complex queries that are readable, maintainable, and performant. Yes, it requires learning, but it's the right tool for the job, personally.
It's exactly what you want. It blew my mind, and taught me that python syntax can natively express SQL semantics. This gives me all that ORMs are really good at -- letting you avoid having to write one language in string-form inside of another language.
ORMs are yet another instance of "Big Dark Grey Boxes" (BDGBs). They are powerful tools that do a lot for you IF they work as intended. When they don't, you have to experiment and fiddle, and perhaps end up fudging an ugly workaround to meet deadlines.
Templating engines (like Razor), UI formatting engines (Bootstrap), and routing engines (URL path translation) are also BDGBs with a similar great-until-broken profile.
In most cases it seems we could have simpler versions that are only a few hundred lines of code. If those few hundred lines can generate 90% of our SQL or whatnot, then we get most of the time savings without the complexity: we can study or fix a few hundred lines of code. Chasing that last 10% is what has created these bloated BDGB monstrosities. Leave it at 90% and enjoy the simplicity. It seems components have a handling curve something like this:
80%: 100 LOC (LOC = lines of code)
90%: 250 LOC
95%: 1000 LOC
98%: 3000 LOC
99%: 6000 LOC
The percent value here is the percent of situations the component handles. In an ORM's case, it would be the percent of SQL code generated by the component.
We should strive for components that assist us with grunt-work, not hide the process of doing it behind too much code to grok. If it's simple enough, we can also tune it to better fit shop conventions.
For big development shops, it may make sense to assign specialists to master a BDGB. If they work with it all day, they'll eventually figure out most of its quirks and oddities. It's specialization at work. But smaller shops often waste too much time trying to troubleshoot irksome BDGBs because they don't have time to focus on any one.
So many people in this thread seem to think that your options are ORM or string concatenation. It explains a lot about why both ORMs and SQL injection are still so prevalent.
Does no one teach prepared statements in schools/online classes/textbooks? Does no one go digging underneath their abstractions to learn what they're providing of value and what is easily available without that abstraction?
> ORM protections against SQL injection attacks are worth the clunky syntax. Change my mind.
You can create what's called a prepared statement in most databases. This is a statement that contains placeholders for parts of the query that you want to change at runtime. You then query against this prepared statement, providing arguments for each placeholder. These arguments are typically typechecked and protected against SQL injection.
(I only say "typically" because I'm sure someone out there can provide a database/driver combo that does the wrong thing, but I'm equally sure you can find a misbehaving ORM that does the wrong thing as well.)
Not discounting the monumental work that SQLAlchemy was to create. It’s a complicated set of machinery that allows one to plug in any relational backend and then just define models and go and if your way of thinking fits ORMs and sessions and you want to use lazy loading etc it’s great. Personally I would rather couple my code to my database a bit more and go the route of raw SQL in code or stored procedures on the database that then acts as an abstraction that my code can call into.
I've worked at places that have taken both approaches, and each has advantages and disadvantages, but I would take writing my own query over an ORM any day. I am just too paranoid to leave any performance on the table, and I don't want the cognitive overhead of an ORM getting in the way of my queries. That being said, SQLAlchemy is pretty awesome when used as a query builder — I think that's a good compromise: if you must use an ORM and it is SQLAlchemy, then I'd use it like that.
ORMs are the technical solution to a social problem. From a pure technical point of view, a "Database" (with capital D) that allows for user defined functions (see Postgres) with a canonical "Domain Schema" expressed in SQL (DML & DDL) is technically superior.
The social problems imo are:
- (contrary to the blog's assertion) most programmers are scared to death by SQL, and are far more comfortable with iterative programming than with declarative/functional programming in SQL.
- ORMs in part also addressed organizational issues. Databases used to be the domain of DBAs. ORMs to a large extent wrested control over the "Domain Model" from DBAs and handed it to programmers.
From a purely technical PoV, I believe that the facility of user-defined functions (c.f. Postgres) permits a complete and coherent structural and semantic description of a domain model in the database. However, this is nearly universally hated by programmers, as it requires programming in the database.
After reading "Functional architecture is Ports and Adapters" [1] and using ORMs (Hibernate) for some time now, I'm wondering whether a functional approach wouldn't be better in many cases. With such an approach you would have the following flow:
HTTP adapter receives request -> object is loaded from the database as an immutable object -> business operation creates a modified copy -> DB adapter persists the new object.
The last step could be improved if the DB adapter looked at what actually changed. Since everything is immutable, this is as easy and fast as comparing references (like React does, btw). This works even with Domain-Driven Design, which is usually used with mutable entity objects.
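A rough sketch of that flow in Python, using frozen dataclasses (all names are hypothetical; the change detection here compares fields rather than references):

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Invoice:
    id: int
    status: str

def mark_paid(invoice: Invoice) -> Invoice:
    # The business operation returns a modified copy; the object
    # loaded from the database is never mutated.
    return replace(invoice, status="paid")

def persist(old: Invoice, new: Invoice, conn) -> None:
    # The DB adapter writes only the fields that actually changed.
    if old.status != new.status:
        conn.execute("UPDATE invoices SET status = ? WHERE id = ?",
                     (new.status, new.id))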
They have their place. I particularly like Ruby's ActiveRecord for handling relational data. The extra caching and easy eager loading are really nice. It sucks for high-performance batch processing, but it works great for web requests. I wish it were a little more complete when it comes to bulk updating/inserting ActiveRecords.
I don't agree that ORMs are backwards. They often go both directions so how can it be backwards? You can usually generate schemas from your code objects as well as vice versa.
No, ORMs are usually solving the wrong problem entirely. Your code objects and your schema are solving different problems and should not be directly mapped.
As for other things they do...In my opinion, they usually ruin the declarative beauty of SQL by creating a lot of imperative methods, but I admit this is _mostly_ a preference thing.
That said, _I've_ never seen transactions or complex queries done properly in an ORM without simply giving up and using SQL.
But I still use a thin ORM. Who wants to write a result set parser? You just need to reason about, and be aware of, when you want to create an object for custom result columns that aren't mapped to a table vs when you want to do the default ORM thing and nest table-mapped objects.
Imho the "smell test" for good-enough ORM should be: can I, at the same time, (1) just switch the DB from underneath, replacing a relational MySQL with a document-oriented DB like Mongo or a hybrid one like Arango and have it work the same, 99% same business logic, (2) replace the underlying DB with a REST API of another app and use that as the DB, and (3) have the ability to drown down to "raw db" language, whether that is SQL, Mongo JSON's based queries, or a GraphQL API I use to "prasitise" another app and use it a the DB for mine?
None of the ORM/ODMs out there come even close to satisfying half of these requirements... so I'd rather implement an app-specific one that does 50% of this for the restricted 1% subset of functionality that my app needs...
...well, call it an OWM, Object-to-Whatever-Mapper, but that's what you'd want anyway; document-oriented DBs don't map directly to your business objects either :)
I think tackling the most general problem would lead to a way more interesting solution, but I have no time to work on this :P
I'm going to project my naivety here and ask how does one write raw SQL queries in code? Should said queries live in their own files and be referenced when needed or are the queries usually written where they are called?
I've worked on a few apps that relied heavily on SQL stored procedures as the data access layer. In a database like SQL Server or Sybase, a stored procedure is like a function with a body of SQL that is stored on the server, receives typed parameters and can return more than one result set, each result set having a different shape (different columns). So, a call to get an entire graph of data would only require one request and one response, saving many round trips. You could write a stored procedure that selects a customer, their recent orders, the products on each of the orders and so forth. You can also insert/update into multiple tables in one trip and you can wrap any parts of your SQL in transactions.
The SQL variant used by SQL Server and Sybase, called Transact SQL, also has elements of procedural programming such as variables, flow control with IF/THEN/ELSE/switches/etc, looping, calling other SQL procedures or user defined functions, exception handling and so on.
These features have been part of SQL Server since the 90s. It's extremely powerful and if used well, can be much more elegant and flexible than an ORM in my opinion. It's a shame that PostgreSQL doesn't support stored procedures quite as well or have an SQL variant that's as well crafted as Transact SQL because I'd love to be able to write apps in that style again. Somewhat recently I believe PG got something like stored procedures that can return multiple heterogeneous result sets and although I can't remember the specifics, I don't think it's quite as robust as what you can do with TSQL and it lacked client/driver side support maybe... (Also PG has some support for running TSQL itself, but not with all the same features.)
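Consuming that style of procedure from Python looks roughly like this; a sketch using pyodbc's nextset() with a hypothetical DSN and procedure name:

import pyodbc

conn = pyodbc.connect("DSN=appdb")  # hypothetical data source
cursor = conn.cursor()
cursor.execute("{CALL get_customer_graph (?)}", 42)

customer = cursor.fetchall()  # first result set: the customer row
cursor.nextset()
orders = cursor.fetchall()    # second result set: their recent orders
cursor.nextset()
items = cursor.fetchall()     # third result set: products on those orders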
I've personally seen and done both. At one job (my first actual dev job) it was a lot of complex business queries for analytics. I put these into their own Python module as string "constants" and imported them (I would not do this again). At my current job we put them in .sql files and have a 3-line function that opens and reads them into a string. I sometimes write them inline. I've come to only one real conclusion, though: unless it's super trivial, I put the SQL outside of where it's being invoked and name the .sql something that makes sense. It's just noise in my code and I want to see the logical steps.
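That 3-line reader really is about this small. A sketch assuming a sql/ directory next to the module and a hypothetical recent_orders.sql that uses named parameters:

import sqlite3
from pathlib import Path

SQL_DIR = Path(__file__).parent / "sql"

def load_sql(name: str) -> str:
    # The entire "framework": read a named .sql file into a string.
    return (SQL_DIR / f"{name}.sql").read_text()

conn = sqlite3.connect("app.db")
rows = conn.execute(load_sql("recent_orders"), {"uid": 42}).fetchall()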
I don't use an ORM, because they're not a good fit for my project (also, for the reasons in the article).
But I do have a pain point around migrations - I have to either painfully hand-write migrations for the deploy code to deal with, or hand-alter the production database to make it fit the new schema. Neither of which are good solutions.
Totally stoked to see the automigrate tool. It completely fits my pain point. Ideally I can have a line in my make file that compares the current schema with the last-known production schema and automagically generates a migration for it.
I'll have a play with this, and contribute if I can.
An ORM is what makes you able to move fast; otherwise find an alternative, or use SQL if you have to.
It can make what you built slow if you don't know how to use it; if it doesn't let you optimize, find an alternative or use SQL if you have to.
ps: I love ORMs and haven't found a better one than the Django ORM when working with Django; everything seems integrated, there is tons of flexibility, and database-specific features are included as well. Forms, Admin, Search, Sessions, Auth, REST Framework and many other things are well integrated with the ORM, so things work as smoothly as possible.
Another way of looking at the problem is that (in many naively-implemented backend codebases) almost every layer and component of the stack ends up depending on the ORM objects, which has the consequence of allowing any of those layers or components to do arbitrary database operations.
You might have some 'data access layer' class/module, but if you're using an ORM, it more than likely accepts as arguments and returns as results some ORM objects. The ORM objects returned from the data access layer invariably have methods like `save()` or `delete()` or `validate()`, or some magical property that, when accessed, ends up issuing another query (think: `customer.orders` or `order.customer`). The objects you're supposed to pass to your data access layer as arguments probably also have constructors that, under the hood, grab static references to singletons (often the connection factory).
In this way, the data access layer is leaking the opportunity to invoke (if not the responsibility of invoking) save/delete/validate functionality to any and every part of the code base, when it should really be contained in one module. Moreover, you cannot reason about whether a particular data object is persisted, could be persisted, or was persisted just from its type.
That is, your Customer object always exposes `save`/`delete`/`validate` even if you got it from a data access layer method called `getCustomersWithRecentOrders`. (Idem for the elements of `getCustomersWithRecentOrders()[0].orders`.) There's nothing stopping you from pulling out `getCustomersWithRecentOrders()[0]`, mutating it, then calling `save()` (or `validate`, or `delete`)—which never-ever makes sense.
Instead, `getCustomersWithRecentOrders` should return behavior-less structs, and your data access layer should provide an `updateCustomer` method. Moreover it should be impossible to construct the type that goes into `updateCustomer` with data that does not validate.
When you implement this kind of data access layer, maybe you use a low level DB driver, maybe you use an ORM for some niceties, but either way, you shouldn't leak types from these dependencies out to the consuming module. However I rarely see this kind of discipline in 'naive' ORM apps (rails/django/laravel/CoreData/Hibernate/etc).
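A minimal sketch of that discipline in Python, with invented names: the read model exposes no save/delete, the write model cannot be constructed in an invalid state, and only the repository touches the database:

import sqlite3
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class CustomerRow:  # behavior-less read model: no save()/delete()
    id: int
    email: str

@dataclass(frozen=True)
class CustomerUpdate:  # write model: an invalid instance cannot exist
    id: int
    email: str

    def __post_init__(self):
        if "@" not in self.email:
            raise ValueError("invalid email")

class CustomerRepository:
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def customers_with_recent_orders(self) -> List[CustomerRow]:
        rows = self._conn.execute("""
            SELECT DISTINCT c.id, c.email
            FROM customers c JOIN orders o ON o.customer_id = c.id
            WHERE o.created_at > date('now', '-30 days')
        """)
        return [CustomerRow(*row) for row in rows]

    def update_customer(self, update: CustomerUpdate) -> None:
        # The only code path in the application that writes customers.
        self._conn.execute("UPDATE customers SET email = ? WHERE id = ?",
                           (update.email, update.id))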
I've been working on a simple ORM alternative for TypeScript + Postgres which addresses the first two complaints (takes SQL as source of truth, and doesn't take over migration or connection management), and am really happy with it so far: https://github.com/jawj/mostly-ormless
I never really saw the reason for having an ORM, why not just do your data logic in the database? Write procedures/functions/packages/triggers and handle everything there and it will almost always perform better than trying to do the same thing in application level code. Then have your application do simple SELECTs, INSERTs, and/or UPDATEs without the need for an ORM.
Well, if you need to do all three of these statements for a lot of tables, there will be lots of code duplication. Therefore it would be good to create some metadata with things like column names and the corresponding types. As soon as you do that, you have a baby ORM.
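That baby ORM can start as nothing more than a dict of table metadata plus a couple of statement generators; a sketch with invented tables:

# Table metadata: name -> ordered column tuple. Generating the
# boilerplate statements from it is the entire "baby ORM". Names come
# from trusted code, never from user input.
TABLES = {
    "customers": ("id", "name", "email"),
    "orders": ("id", "customer_id", "total"),
}

def insert_sql(table: str) -> str:
    cols = TABLES[table]
    placeholders = ", ".join("?" for _ in cols)
    return f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"

def update_sql(table: str) -> str:
    cols = TABLES[table]
    assignments = ", ".join(f"{c} = ?" for c in cols if c != "id")
    return f"UPDATE {table} SET {assignments} WHERE id = ?"

print(insert_sql("customers"))
# INSERT INTO customers (id, name, email) VALUES (?, ?, ?)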
An application might be good as a display/interaction layer for an end user, but not for performing complex data manipulation in a database; that's what SQL, procedural database languages, and ETL tools are for.
SQL as source of truth... Generating models from those SQL create table instructions sounds pretty sick actually. Nice one! I would use it in upcoming projects if there were broader support across different languages and frameworks. I'd like to see some adoption by big players.
I'm sorry, but unless this dude has some credentials for writing ORMs (which, not to toot my own horn, I personally have done), I don't think he even knows what he is talking about. On almost every ORM I've worked on, we treated SQL as the source.
Why is it always assumed that you have to use one or the other in these types of articles?
Use the ORM where it makes sense. Use SQL when it makes sense. I know that Django's ORM lets you run SQL through it and prevents SQL injection.
There is another way: neither the ORM nor SQL is the source of truth. The database design (the physical diagram) is the source of truth.
ORM code and SQL migrations are generated from the design.
New concept of the day: Object-Relational Mapping, converting an object, with all its members of diverse types, to a format suitable for storage in a database, roughly a table of scalars.
Sometimes ORMs are like trying to tie a fishing lure with mittens on. Frustrating.
On the other hand I can not stand, no, really really really cannot stand clever 1000+ line SQL queries. Writing an entire program in one dense line doesn't show how smart you are, it shows you don't have a clue about the bigger picture or else lack respect for the time and mental health of others.
For most cases on a team and for maintainability I've finally decided the advantages of a good ORM and migration tool where you can commit migrations in a rational and standard pattern far outweigh the disadvantages. But this is not because I don't know or respect SQL.
The ‘somehow’ is usually a half-baked migration tool
Maybe don’t use a half baked ORM.
I’m experiencing schadenfreude that someone is having ORM problems in 2019 long after the NoSQL hype died.
I can remember on zero hands the number of times my not-half-baked ORM was the source of my problems such that I wanted to say "fuck it, let's just write SQL". ORM patterns are pretty well laid out. Hell, they haven't changed much in at least a decade or more, and they work great for the problem they solve. Which makes me think maybe a relational store is not right for whatever you're using it for.
tl;dr If you have ORM problems I feel bad for you son, I got 99 problems but SQL ain’t one.
I think that our goal as enterprise software developers is to automate business logic (business constraints) in a way that a domain expert can more or less read and understand. To create a precise language that facilitates fast and robust communication between experts and developers. A new hire to the team should be able to learn domain expertise from the codebase, because your codebase should contain an easy-to-read domain model.
Relational algebra and SQL are not a very expressive natural language. SQL limits your vocabulary to 4-5 verbs unless you start writing procedural code in procedures etc., but SQL is not a good procedural language. Relational databases are a solution to specific technical problems of execution speed, atomicity, consistency, isolation, and durability (ACID). They excel at that, not at communicating intention.
You should use ORMs (preferably Data Mapper) if your goal is to solve the first problem while using off the shelf solution for the second problem. Data Mapper allows you to isolate your domain model from technical aspects of data storage. This will work very well as long as you will actually be able to ignore technical aspects in your domain model.
You can do the data mapping, querying, migrations, and all this technical cruft manually with handwritten SQL if you want, but SQL certainly will not address very well the first goal of creating an expressive domain model that facilitates robust communication between developers and business experts.
And to address Active Record: it's a solution in the middle; you don't get full isolation, because database details creep into your model. The tight coupling of the domain model to the ORM also makes it difficult to swap it for a handwritten mapper. But it's a very good tool for fast iteration and prototyping.
And to address the only point from the article that is not moot given my explanation above:
"We can’t have nice things until we move our schemas out of our app languages and into declarative spec languages like SQL, proto, or even jsonschema / swagger (though I’m not a great fan of the last two)."
SQL, jsonschema / swagger can't express business constraints. I'm working for a university. How do you express that "a student internship in some company A cannot fall in time between two other internships if the other two are in some company B" in a declarative, non-domain-specific language and make it still readable? You can write a declarative domain-specific language, but I wish you luck on your adventure through parsers, interpreters, planners etc. to create that declarative domain-specific language in the first place.
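For contrast, that rule in plain application code reads close to the English sentence, which is the whole point about expressive domain models. A sketch with invented names:

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Internship:  # invented domain object
        company: str
        start: date
        end: date

    def violates_rule(candidate, existing):
        # an internship at company A may not fall between two at company B
        if candidate.company != "A":
            return False
        b = [i for i in existing if i.company == "B"]
        return (any(i.end <= candidate.start for i in b)
                and any(i.start >= candidate.end for i in b))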
What this article misses IMO is a clear distinction of the various abstractions layers an ORM can provide, most importantly these two:
- Heavy ORMs: Object-oriented mapping layers that map classes to tables. A record is represented as an object that not only carries data but also implements various behaviours for storage, retrieval, serialization and deserialization of its own data, sometimes it also implements business/domain logic. Examples: ActiveRecord, EntityFramework, Hibernate, Sequelize, ...
- Lightweight query builders: Lightweight, programmatic abstractions on top of SQL. Examples: knex.js, pypika, ...
There are issues with both approaches.
Heavy ORMs
Heavy ORMs make it seem like a developer doesn't need to know SQL to work with a database. They help to get a project off the ground quickly, but as a project grows and requirements become more complex, they often generate slow queries or generally lack the capabilities to express the right queries. Eventually the developer is forced to understand what's going on in the SQL layer underneath, spending a lot of time debugging and understanding the generated queries, often without the possibility to optimize them. The premise of "not knowing SQL" while productively working with a database long-term has been proven wrong in so many cases (I feel like everyone has been burnt by an ORM at least once in their dev career...).
Lightweight query builders
Lightweight query builders, on the other hand, don't try to abstract too much away from SQL. Their APIs typically use SQL terminology, and they're not trying to hide the fact that they're working with a database. From an application developer's perspective, there is still the drawback that you must have a solid understanding of SQL to work with them efficiently. Another issue is that these query builder APIs typically aren't type-safe: there's no static analysis to help identify typos or other issues in the DB queries you're building, and in terms of developer experience they lack features like autocompletion for DB queries. Personally, I still prefer query builders over heavy ORMs since I get more control.
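For example, with pypika the API deliberately mirrors SQL terminology, so there is very little to learn beyond SQL itself (table and column names invented):

    from pypika import Query, Table

    users = Table("users")
    q = (
        Query.from_(users)
        .select(users.id, users.name)
        .where(users.age > 18)
        .orderby(users.name)
    )
    # str(q) yields roughly:
    # SELECT "id","name" FROM "users" WHERE "age">18 ORDER BY "name"
    print(q)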
Time for a plug: I work at Prisma where our mission is to make it easier for application developers to work with databases by providing developer-friendly, efficient and performant abstractions and tools for working with databases! We're currently building the Prisma Framework [1] which consists of two main tools:
- Photon [2]: A lightweight, auto-generated and type-safe database client
- Lift [3]: A tool for declarative data modeling and database migrations
Both tools are based on the Prisma schema [4], a declarative abstraction over your database schema that serves as foundation for the generated Photon API. Lift migrations work by mapping the Prisma schema to your database schema. We also build a ton of other nice tools and libraries that simplify database workflows (such as Prisma Studio, a database GUI) or integrations with API layers, such as GraphQL Nexus [5] for building type-safe GraphQL APIs without boilerplate.
It makes it much easier to avoid errors and tab-complete relations across the object graph.
> Some ORMs offer to ‘mirror your live DB’ into a schema. That’s insane to me. It makes any type-checking or linting impossible. You have no idea what you’re getting.
Hm, I believe the author is talking about SQLAlchemy's sqlalchemy.ext.automap: https://docs.sqlalchemy.org/en/13/orm/extensions/automap.htm.... This is a relatively new feature in SQLAlchemy; no need to use it if you don't want to (but it is pretty cool).
I can't speak for everyone here, but I doubt most ORM users are using it that way.
> Your test suite will be useless because your local DB probably is getting spun up from a schema.
Most likely a test database is spun up (if it's not a SQLite in-memory DB). In the case of Django it'll automatically prefix test_{databasename}, run migration files, and go to town.
> If not, then you have another hard problem of doing regular prod dumps to your dev environment. Yuck.
Which leads me to another benefit of ORMs: adding fixture data to simulate very intricate states is much easier to do and maintain. Between tests, most likely, transactions are being rolled back.
Since you are using Python, you'll be able to use pytest fixtures: https://docs.pytest.org/en/latest/fixture.html. I really like pytest-django, but I'm not sure how good the SQLAlchemy equivalent is. Maybe that's the area you're having issues with, since when those have bugs, they're notoriously hard to figure out (is it how I'm bootstrapping SQLAlchemy? is it my models? is it my fixtures? is it a pytest bug? or the pytest extension?)
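For illustration, a pytest-django test might look like this (app and model names invented); each test runs inside a transaction that is rolled back afterwards, so fixture state stays isolated:

    import pytest
    from myapp.models import Article  # invented app and model

    @pytest.fixture
    def published_article(db):  # `db` is pytest-django's database fixture
        return Article.objects.create(title="Hello", published=True)

    def test_listing_shows_published(published_article):
        assert Article.objects.filter(published=True).count() == 1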
> There’s always some surprise about the connection or pool type. It’s never straightforward, it’s always badly documented, it has twice as many layers as necessary because it has to manage bespoke session tracking and table registries. Transactions are undocumented or mysterious, autocommit is implied or hidden.
> We can’t have nice things until we move our schemas out of our app languages and into declarative spec languages like SQL, proto, or even jsonschema / swagger (though I’m not a great fan of the last two).
You could have your Django server (or whatever is using the ORM) expose an API. What are you building exactly? GraphQL will get you a typed API, and other languages can consume it however you like (OAuth2?).
You're probably going to have a main programming language of some sort doing the CRUD/migrations. Why? There's normally business logic / layer stuff on top; that's another +1 for having other languages access it via an API, so there isn't duplication.
Are there cases where making an API wouldn't work? If so, you're probably well beyond ORM questions, and have probably learned to keep the ORM to the basics: simple ForeignKey relations, staying away from polymorphism and stuff that doesn't map well when other languages have to access the SQL directly.
Imperative code is complicated when dealing with data, so we abstract imperative code into SQL.
SQL is complicated so we abstract it from SQL back into imperative code via an ORM.
Of course it's backwards.
It's like one of the biggest legacies of web development, next to JavaScript, HTML and CSS. SQL is a very specific way to think about data and a very high-level, leaky abstraction. So high and leaky that you literally need to look at the query plan in order to optimize SQL queries. If database IO is the bottleneck for web development, the best abstraction for this area is most likely a zero-cost abstraction (Rust, for example). Instead we have SQL, and we create a generation of SQL hackers who memorize a bunch of technology-specific hacks to try to get things to work. Why is SELECT * bad? It's a hack you memorized, a leaky abstraction.
I'm not saying schema-less is better, hear me out. I'm saying SQL is a bad API. You could probably make an API language that's more explicit and imperative on top of something like Postgres. Change the API itself; don't write something on top of it.
Either way, an ORM on top of SQL is like what TypeScript is to JavaScript. You know that when they make an API on top of an API, it's to cover up something that doesn't work well. Unless it's a zero-cost compilation, such fixes only make performance worse.
If you insist on using some abstraction on top of SQL, the zero-cost way to do it is an API that is bijective to the syntax of SQL itself. TypeScript, I'm guessing, is relatively isomorphic to JavaScript, so it works out.
> I'm saying SQL is a bad API. You can probably make some api language that's more explicit and imperative on top of something like postgres. Change the API itself don't write something on top of it.
Something like VSAM or IMS?
The relational model and declarative query languages were invented because the previously existing imperative navigational model was far too inflexible and difficult to use. Access paths were baked into the data structures, you had to be your own optimizer, and if you optimized for the wrong usage patterns, you had to refactor everything or live with poor performance.
You can enforce patterns via types. You can also have default optimizations. The problem with SQL is that to choose an optimization you have to do it by poking at the syntax.
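Roughly something like this, say (a toy sketch; every name here is invented, no such library exists):

    # the plan is an explicit argument instead of something you coax
    # out of the optimizer by rephrasing the query
    result = db.select(
        "orders",
        where=eq("customer_id", 42),
        plan="auto",  # let the engine pick, exactly what SQL does today
        # plan=index_scan("orders_customer_id_idx"),  # or choose it yourself
    )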
Obviously the above example is trivial, wrong, and probably won't work. But I hope it paints a fuzzy picture of my point: that the optimization or 'plan' should be explicit. You choose auto, or you dive in and change it yourself. No language or syntax hacks.
No thanks. The benefit is not having to specify the plan. The optimizers themselves are fairly good. A well designed table with the right indexes should perform well.
Raw SQL is great at what it does.
ORMs are typically good at what they do.
Depending on the use case both have their pain points. Typically I see a mixture of ORM for most CRUD based stuff and raw SQL for cases where the ORM isn't suitable.
In my example code, do you notice the string "auto"?
It means you can literally have the API do exactly what SQL does: automatically pick the query plan. But you opt into that explicitly.
So when the auto planner screws up, in those cases you explicitly make it use another algorithm, rather than modifying the SQL to be slightly different and hoping it compiles into something more reasonable.
I'm not convinced on the UX of that. Developers are lazy. I think what would more than likely end up happening is they explicitly set it to "auto" and modify the query to get a more performant plan. After a certain query complexity it would become exceedingly hard to piece together plans by hand.
A bunch of performance problems aren't even solved by tweaking the query either. They're solved by changing the database structure. Adding the appropriate indexes and such. Making sure datatypes match between joined data and no implicit conversions are happening. Right-sizing columns.
No amount of specifying your own query plan will do anything to affect those kinds of issues.
In practice I find it mostly comes down to cardinality-estimate issues as a very common source of problems, where the database either over- or under-provisions memory for the query. If it estimates it's going to get back a lot more data than it actually does and grants too much memory, that reduces parallelism, because the memory can't be used by other queries. If it under-estimates, it doesn't grant enough memory, and when it gets back more rows than will fit into memory it has to write them temporarily to disk, taking a massive hit in I/O performance.
How does your scheme work with figuring out how much memory the database should grant to a query when specifying a plan by hand?
What's more is plans change over time as statistics change. Leaving it up to the query optimizer means it's adaptive. Having to specify yourself means you have to know the optimizer isn't giving you the best plan at design time. There are some cases where you know that. There are some cases where you don't.
You can already specify query hints etc. I think SQL just has this covered already. I have no qualms if the query changes slightly.
It seems that what you want is not an imperative API, but a more explicit one. I'm afraid the problem is that specifying the access methods or optimizations to use in the database is very product- or even version-specific, on the other hand you can already tell a database what to do using optimizer hints.
I want both. An imperative language makes it easier to stay isomorphic to a web app's API: you can have corresponding types and functions instead of something totally different.
Optimizer hints are a good idea. As far as I know this isn't a well-known feature (if it even is a feature) of PostgreSQL, which is the DB I have the most experience with.
Help this noob see the light: how do I handle super dynamic queries in an ORM?
E.g. I have nodes that can be listed and sorted in various ways. That might mean having to join and/or select on additional tables (taxonomies, tags, sources, authors). I have a lot of code to generate exactly the query I need for any given listing, but how do I do that without string concatenation? Make a dedicated query for each permutation of additional tables I need or don't need?
I'm not so much looking for an explanation in your own words, but just a point on what to search for, or maybe some articles that skip right to such "advanced" stuff.
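Not a full answer, but the thing to search for is queryset chaining / lazy querysets. A minimal sketch in the Django style (model and field names invented):

    from myapp.models import Node  # invented model

    def list_nodes(tag=None, author=None, sort=None):
        qs = Node.objects.all()
        if tag:
            qs = qs.filter(tags__slug=tag)        # join added only when needed
        if author:
            qs = qs.filter(author__name=author)
        if sort:
            qs = qs.order_by(sort)
        return qs  # SQL is generated once, when the queryset is evaluated

No string concatenation; the builder composes the joins for you.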
Most ORMs I have used are by definition a leaky abstraction: an inferior and insufficient representation of the underlying language they try to improve upon.
In addition, ORMs greatly increase the surface area for troubleshooting. Instead of just being able to run and profile a query, you now need someone who is experienced in the internals of the ORM and how it "plans" a certain query. Of course, because ORMs do not understand how the data is actually laid out, when they say "plan" they mean create the SQL string.
So in place of a query you will need to learn the conventions of the particular ORM, leaks and all, and how that maps to the SQL. Then you will still need to understand how that SQL actually interacts with the table schema.
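To take Django as one example (the model name is a stand-in), the closest you get to inspecting that "plan" is printing the generated SQL, which confirms it really is just a string:

    qs = Person.objects.filter(last_name__startswith="Sm")
    print(qs.query)  # the SELECT Django will send -- not the DB's query plan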
The end result is normally one backend person who is very territorial over their code, and who acts as a bottleneck for the rest of the team. If you're using an ORM I'm sure you already know who this guy is...