You'll have to pry ActiveRecord from my cold, dead hands. I'll take it any day over the half-baked quasi-ORM that your company has inevitably built up over the years because an early engineer was anti-ORM and it turned out that doing raw SQL all the time is tedious.
What’s nice about ActiveRecord is that while it obviously does hook into the Rails object-oriented style, it’s actually more relational and functional than OO in how it works underneath. It’s a fairly thin layer of abstraction over SQL. I think of it more as a way to build and compose SQL statements than a true ORM, at least when it’s used well.
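As a toy sketch of what I mean (Order and Customer are made-up models): each call below just composes a relation, nothing hits the database until you enumerate it, and .to_sql shows the single statement it all folds into.

    scope = Order.where(status: "paid")
                 .joins(:customer)
                 .where(customers: { region: "EU" })
                 .order(created_at: :desc)

    scope.to_sql
    # => SELECT "orders".* FROM "orders"
    #    INNER JOIN "customers" ON "customers"."id" = "orders"."customer_id"
    #    WHERE "orders"."status" = 'paid' AND "customers"."region" = 'EU'
    #    ORDER BY "orders"."created_at" DESC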
At my last position we simply did sprocs and Dapper. When you take it as given that the database is the source of truth, it only makes sense to put the queries in the database. The only exceptions are dynamic queries and ad hoc modifications.
How do you avoid leaking business logic into database stored procedures? It sounds like you would now have two sources of truth: application logic and database stored procedures.
I'd like to see someone have a decent go at reexamining the idea that business logic (BL) doesn't belong in the DB and tease apart exactly what that should mean, whether it's actually still true, or whether it's ever been true at all. Maybe something like that's been posted here before, but I haven't seen it...
I bought into the idea myself for a while, but when I interrogate that belief, it turns out to be just something I picked up at uni as part of the general 3-tier approach, and I'm not sure how much of it is really practically grounded. I see the same ideas held dogmatically by newer employees, and when I ask them about it in detail, I see the same fuzziness.
After all, if you want to really be pedantic, then you could claim anything beyond having a single table with two columns, "key" and "value", is pushing BL into the DB.
I.e., what does the separation practically gain you? You may want to swap out the DB in future? That almost never happens, and not without some significant other refactoring going on (e.g. breaking out into separate DBs because the product's grown to the point of needing to be distributed across services, etc.). If you really want to design with that in mind, you're almost certainly giving up a lot of functionality that the particular DB gives you. It's on the level of holding onto the possibility that you'll want to change the language you're writing in at some stage (and that event would favour pushing more into the DB anyway).
In reality, what I've found is that when there's a reliable, transactional, relational datastore at the bottom (i.e. we're not talking about a larger distributed system), then for your own sanity you want to be pushing as much as possible down the stack, to reduce the surface area through which someone could change data in the wrong way. Data consistency issues are some of the worst when it comes to eroding customer trust, and if your data access/mutation paths are many and varied, you can do your head in, or burn through a lot of dev morale, trying to diagnose them.
The strongest advocates for the opposite I've found are those in office cultures where there's little trust between devs and DBAs, and where getting things through DB review is a rigmarole that ends up driving devs towards doing as much in code as possible. I've always suspected that beyond Google envy, these sorts of dynamics are what drove a lot of the initial NoSQL movement...
Thanks, the Derek Sivers piece rings a bell now (and may well have fed into my thinking on this), and the other two will go into my notes to avoid having to repeat myself the next time this one inevitably comes up on the office Slack...
I worked on one system that did intentionally put business logic almost entirely in the DB. The VP was a former database guy and the primary justification was auditing. It worked well enough in practice.
There’s a quote from Gavin King somewhere exhorting developers that ORMs and raw SQL are meant to be used in tandem: the ORM for CRUD ops, SQL for most everything else.
The problem I have with ORMs is lack of paging on collections; there’s always that one user that pulls in half the database.
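To be fair, some ORMs at least give you the tools to avoid that, if you remember to reach for them. In ActiveRecord, for instance (recalculate_stats! is a made-up method):

    # Naive: loads every matching row into memory at once.
    User.all.each { |u| u.recalculate_stats! }

    # Batched: walks the table 1000 rows at a time by primary key.
    User.find_each(batch_size: 1000) { |u| u.recalculate_stats! }

The catch is that nothing forces the first form to fail review.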
In my experience, keeping BL out of the DB is a practical concern, not a dogmatic one.
Usually there are two things:
1. Performance. Business logic is usually horizontally scalable; databases usually are not. You want to pull as much BL out of the database as possible and put it into stateless microservices so they can be scaled.
2. Version control and deploying changes to business logic. Business logic changes frequently. Do you want to have to be making that frequent of changes to your database? Do you have practices around safe deployment and code review for stored procedures?
I've worked at a place where there was lots of business logic (millions of LOC) stored "in the database". In this case the database was MUMPS. It can be done, and it can be pleasant. The catch is that vertical scaling is the only option, and you have to spend years building your own tooling from scratch.
I think the line between database and code is destined to become even more blurred, but not by bringing the code down into the database, but by lifting more and more database concepts up into "application space" as distributed computing becomes more normalized.
Yeah the performance point is interesting, and I think it has merit in a well-defined scope. The idea of bringing DB concepts up is something I've seen done, but it wasn't a pleasant experience overall and I think everyone involved is regretful that we went down that path. It's a good story though. Ramble starts:
The system was an enterprise-focused offering with weak product management, so in the early growth-oriented days of the company we ended up saying "yes" to a lot of feature requests that we probably shouldn't have. Where this particularly hurt was the permissions system, which evolved over time from an ACL with odd warts to also include a bolted-on hierarchy of users and their objects, then a role-based system, and then all sorts of enterprise-level oddities like cross-links in the hierarchy, shared subhierarchies, roles that could appear at multiple hierarchy points, virtual ACLs that behaved slightly differently, etc. So: a potentially large number of different access paths between a given user and a resource.
Years ago, when this mess was starting to expand, it was decided it was too hard to model this in the DB (and really that should've been a giant red flag), so the new approach was to load each customer's dataset up into an application server that would handle the BL, crucially including the permission logic. It very much wasn't stateless, as you mention, but I'm not sure how it could've been, really, given how much you needed loaded up from the DB in order to make these permission decisions. It would've avoided a lot of headache if it had been...
The consequence though of using this write-through in-memory cache was it became the source of truth.
The chief problem this led to was that the shift into application land was a bell that couldn't be unrung. Everything others in this thread have complained about, seeing BL spread across all sorts of SPs in a DB, happened here, just in application code (which again hints, I guess, that the chief problem is architecture and lack of governance, not a wrong-layer problem). Nobody could properly describe the permissions system in response to support requests without a lot of code scouring, let alone hold it in their heads.
Even worse, as the application grew in size and needs, we found we still needed things we had left behind in the DB. A couple of devs working on the core service, because they were smart and trusted, convinced themselves that what we needed was an in-memory transaction system for managing this in-server cache (by now the core was being called the Object Cache, a name so generic that it also should've been a red flag). So a couple of years went into a technically impressive reimplementation of a transaction system.
Meanwhile the system as a whole was well past the point of needing to be split up into multiple services, so a grand goal was set of moving the Object Cache into a standalone service: the Object Service. Slap an OData API on it, and then leave that API open for all internal teams to hit. By this point the core team who owned this was becoming well aware they had fallen into a bad pit of reimplementing the Postgres DB everything still sat on: transactions, a generic query API, configurable in-memory indexes for each type of object, partition loading logic for the timeseries sets, user permissions, etc. Worse, the generic query API (and this is what's ultimately turned me off OData/GraphQL etc. for cross-team interfaces) ran into all the same problems as an SQL interface: people in other teams would always be coming up with brand new queries your indexes hadn't anticipated, forcing new work on the keepers of the cache to support them.
The way forward probably would've been to leave the permissions structure in place and pull as much as possible out of the service/cache into separate standalone services, i.e. leave the objects in the object cache as little more than IDs whose actual attributes you could look up elsewhere. We'll never really know, though: it was recently decided to put the whole thing into maintenance mode, partially because nobody has the political will to fix the thing and its complexity.
I've thought a lot about the lessons of this system and where it went wrong, and I trace it ultimately to forging ahead with a permissions system (i.e. core BL) that couldn't be modelled in the DB. I think that doomed us to what I'd ultimately call second-system syndrome (i.e. the whole stack above slowly evolved into a poor Postgres clone). Perhaps if we had been ruthless in going to a CQRS design or similar very early on (which would again demand stateless services above), we could've pulled it off.
I think a big takeaway for me was to not take for granted the good things an RDBMS is giving you, especially transactionality. These days I'd bias towards looking for ways to scale up the powers the DB gives you (even if it involves pushing external integrations into the DB, maybe via some funny FDW or transaction log integrations, or maybe using a tiered solution of DBs, one layer as the storage layer and another layer above acting as coherent transactors, a little bit similar to Datomic's internal structure), rather than rushing toward replicating core DB functionality in application code, which can be tempting initially as the easier-looking solution.
I think I might blog about this when the system's properly retired and those involved can look back and laugh rather than cringe...
As a rule, relational database systems have two strong areas: parallel processing and data consistency enforcement. I've always understood "don't do business logic in the database" as meaning not to use the database engine for sequential processing.
If you have a single-user database system (e.g. a webapp with a database backend), there is relatively little to gain from implementing data validity checking in the database, other than the declarative-statements-versus-imperative-rules discussion (which can already be a huge benefit, depending on the team).
But once you have a central database with multiple frontends (common in enterprise ERP solutions), enforcing data consistency in the backend becomes pretty much unavoidable -- otherwise a single bug or new feature in one frontend could disable entire production lines.
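To make the declarative flavour concrete (table and constraint invented, borrowing Rails migration syntax since that's the thread's lingua franca), an invariant like this holds for every frontend that writes the row, no matter which codebase the write comes from:

    class AddQuantityInvariant < ActiveRecord::Migration[7.0]
      def change
        # The DB rejects the bad write itself; no frontend can forget the rule.
        add_check_constraint :stock_levels, "quantity >= 0",
                             name: "stock_quantity_non_negative"
      end
    end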
when there's a reliable, transactional, relational datastore at the bottom [..] you want to be pushing as much as possible down the stack, to reduce the surface area that someone could [..] change data in the wrong way
I'm working on a project that is almost literally your example of reducing a relational database to a key-value store, despite client applications having complex problem domains to model.
The consequence has been catastrophic data consistency problems and the whole programme rapidly grinding to a halt.
I think the problem is the overly broad definition of 'business logic' that encompasses everything from data integrity to presentation logic.
I've seen "business logic" refer to both A) how a discount promotion works and B) how to arrange images of the product for a particular set of customers. Obviously it's fine to have A in the database...
Thanks for bringing that up. Everyone on a team needs to have the same working definitions, and yeah... "business logic" is a term that's used a lot, but I very often see people not really understanding that it's usually meant in opposition to "display logic", not as an interchangeable term for "any code".
Stored procedures are the ideal place for business logic.
The problem with it is technical: the tooling for migrating and managing schemas and stored procedures is hot garbage, and there aren't good ways to enforce consistency between the invariants of the applications and the database.
Really, it should be possible, on both application deploy and when stored procedures are updated, to import library functions that the stored procedures use from the database and run unit tests against them to ensure that all clients are compatible.
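As a sketch of what that could look like (connection setup omitted, and discounted_price is a hypothetical function the application depends on):

    require "minitest/autorun"
    require "active_record" # assumes a connection is already configured

    class StoredProcContractTest < Minitest::Test
      # Pin down the contract the application relies on, so a changed
      # proc definition fails at deploy time instead of in production.
      def test_discounted_price_contract
        result = ActiveRecord::Base.connection.select_value(
          "SELECT discounted_price(100.0, 0.2)"
        )
        assert_in_delta 80.0, result.to_f
      end
    end

Run it on application deploy and again whenever the procs change, and you catch drift from either direction.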
Have done stuff like this where the queries were quite complex and essentially BI-style. Wrote a system to store the procs in VCS and manage them through the ORM migration system (ORMs are quite à la carte).
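Roughly, each proc rides along in a normal migration (Postgres flavour, function body invented):

    class CreateActiveEmployeesFunction < ActiveRecord::Migration[7.0]
      def up
        # The proc's source now lives in VCS and deploys like any schema change.
        execute <<~SQL
          CREATE OR REPLACE FUNCTION active_employees()
          RETURNS SETOF employees AS $$
            SELECT * FROM employees WHERE terminated_at IS NULL;
          $$ LANGUAGE sql STABLE;
        SQL
      end

      def down
        execute "DROP FUNCTION IF EXISTS active_employees()"
      end
    end

One wrinkle, at least in Rails: you generally need the SQL schema format (structure.sql) rather than schema.rb, or the function definitions won't survive a schema load.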
You're either generating and submitting the queries through prepared statements or just storing them in the database. It's not "leaking" IMHO to do the latter, and in fact it can come with benefits: you can call the procs while connected to the database through various other clients, be it the command line or a BI system. So in effect you are encapsulating the logic in the place with the most potential for reuse.
With the MS stack it's really easy. You can design the schema and procs, views, etc. in Visual Studio, then have it compare against each environment to create the deployment script.
I like ActiveRecord. I like it so much I've implemented the Active Record pattern myself and compared my results with ActiveRecord.
That said, I've come around to the Repository pattern and quite like the Ecto way of doing things. It's a lot more flexible at the expense of some simplicity. It makes some really hard things easier than AR can make them.
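For anyone who hasn't seen the contrast, a bare-bones repository in Ruby looks something like this (all names invented): the domain object knows nothing about persistence, and every query goes through one seam you can swap out, e.g. for an in-memory fake in tests.

    Invoice = Struct.new(:id, :total, :paid, keyword_init: true)

    class InvoiceRepository
      def initialize(connection)
        @db = connection # e.g. ActiveRecord::Base.connection
      end

      def find(id)
        row = @db.select_one("SELECT * FROM invoices WHERE id = #{Integer(id)}")
        row && to_invoice(row)
      end

      def unpaid
        @db.select_all("SELECT * FROM invoices WHERE paid = FALSE")
           .map { |r| to_invoice(r) }
      end

      private

      def to_invoice(row)
        Invoice.new(id: row["id"], total: row["total"], paid: row["paid"])
      end
    end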
This but Hibernate + Spring Repositories. Writing SQL is fine but the 95% CRUD cases are much better served with a decently managed ORM. We have an older project here using iBATIS, MSSQL and lots of stored procedures/SQL-first approach. See this tutorial for a preview of the horror to come: https://www.tutorialspoint.com/ibatis/ibatis_create_operatio...
'Employee.xml'
Now imagine this but with stored procedures and inserts with tens of parameters.
I've worked on a system where almost all of the business logic was handled in stored procedures (apart from things that belonged at the UI level, of course). It worked fine, was easy to debug, and performed great. Sure, you had to know how to actually write good SQL, which threw off kids who think programming equals using the JavaScript framework du jour, but I don't see that as a shortcoming. No XML anywhere, either.
At some point they rewrote it as a web app with NHibernate and all that. It took them literally many hundreds of man-years, and it still runs like molasses. And profiling whatever crap NHibernate emits is a rather joyless enterprise.
As a fellow AR lover the only things that consistently bug me are when it does things like:
- .count on a relation issues a SQL query every time, but there is also a .count method on arrays, which means you need to remember to use .length if you just want to count the objects already loaded into memory
- ‘build’ methods for an association usually only initialize a record in memory, but with has_one associations it deletes any record that already exists
So basically when it’s not clear about the fact that it’s doing database things.
There's a .size that works for pretty much everything: if the relation isn't loaded it will do a SELECT COUNT(*), but for arrays and loaded associations it returns the length of the array.
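Concretely (Post and its comments association are made up):

    post = Post.first
    post.comments.count   # always fires SELECT COUNT(*) ...
    post.comments.length  # loads the whole association, then counts in Ruby
    post.comments.size    # COUNT(*) if unloaded; array length if loaded

    post.comments.load    # force the association into memory
    post.comments.size    # no query now, just the array length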