
Is it just me, or is schema design and development in general stuck in the stone age?

I mainly know the dotnet world, which does have migrations in EF (I note the point about GitLab not using this kind of thing because of database compatibility). EF can flag common data-loss scenarios while generating them.

However, running migrations is still always quite scary, especially bigger ones that refactor something. Throw jsonb columns into the mix and it feels really easy to screw things up and suffer serious data loss.

For example, renaming a column (at least in EF) results in a column drop and a column create in the autogenerated migration. Why can't I easily give the compiler/migration tool more context about this?
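For context, the underlying SQL does support a non-destructive rename; the pain is only in getting the migration tool to emit it. A minimal sketch using Python's stdlib sqlite3 (table and column names are made up; `RENAME COLUMN` needs SQLite >= 3.25):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada')")

# A real rename keeps the data; a drop-and-create would silently lose it.
conn.execute("ALTER TABLE users RENAME COLUMN fullname TO display_name")

row = conn.execute("SELECT display_name FROM users").fetchone()
print(row[0])  # -> Ada
```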

Also the point about external IDs and internal IDs - why can't the database/ORM do this more automatically?

I feel there really hasn't been much progress on this since migration tooling came around 10+ years ago. I know ORMs are leaky abstractions, but I feel everyone reinvents this stuff themselves and every project does these common things a different way.

Are there any tools people use for this?




One thing I like about hand designing schema is it makes you sit down and make very clear choices about what your data is, how it interrelates, and how you’ll use it. You understand your own goals more clearly.


So many people I encounter seem to think it's the code that's important when building the back end of an application. You see this when people discussing database schemas start comparing, say, Rails to Hibernate. But ORMs emphasise code instead of data, which in my experience is a big mistake.

In my experience, getting the data structures right is 99% of the battle. If you get that right, the code that follows is simple and obvious.

For database applications, this means getting the schema right. To this end, I always start with the underlying table structures, and only start coding once I understand how the various tables are going to interact.
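As a sketch of what "starting with the table structures" can look like in practice (hypothetical invoicing tables; Python's stdlib sqlite3 standing in for Postgres):

```python
import sqlite3

# The relationships and constraints are decided here, before any app code.
schema = """
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE invoice (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    issued_at   TEXT NOT NULL,
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # -> ['customer', 'invoice']
```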

Sadly, too many people think of the database as the annoying hoops we jump through in order to store the results of our code. In my world, the code I write is the minimum required to safely manipulate the database; it’s the data that counts.

Some people seem to think I’m weird for starting with the database (and for using plpgsql), but I think it’s actually a superpower.


This is true for in-memory data as well. Object-oriented programming is great for some problems, but it also limits the way we think about data by putting it close to the code operating on it. ORMs do the same to databases by pretending that rows are objects, when that's only one way of modeling your problem.


It’s part of the dirty little secret of why document databases and other NoSQL systems became popular.

Required even less up front thinking about how to model your data. Throw some blobs of JSON into Mongo or whatever, and worry about the rest later.


It wasn’t always code first - you mention Hibernate, but 15-20 years ago it was entirely feasible to inherit a database schema or design one up front, and then create performant metadata mappings to a usable object model. That sort of practice was tainted by the Bad Enterprise brushes of verbosity and XML in general, and so everyone moved to some flavour of active record. This allowed programmers to write less code and fit it into neater boxes, at a time when there was enormous demand for basic CRUD web apps, but a lot of power and expressiveness was lost.

Somewhat ironically, many modern enterprises have peeled all the way back to SQL for a huge amount of logic anyway, so I don’t think we’re done caring about database schemas quite yet.


You are absolutely correct; my previous business ended up ejecting almost all database logic from Java ORMs, and moved almost everything to straight SQL manipulated by stored procedures. Doing so resulted in a significant performance and productivity increase, relative to all the layers of nonsense we used to jump through.

One of the difficulties of doing this was that the tooling isn’t great. We had to write our own tools to make it work, but the benefits of going back to SQL were more than worth it.

(Many years later I made an open source version of that tooling, https://github.com/pgpkg/pgpkg)


A simple function call as a wrapper for a well-optimized SQL query just can’t be beat for performance. I have never understood why anybody would use an ORM; it’s usually as much work to learn and maintain one as it is to just write the SQL.


Yes, agreed. I spent a few years away from the technical side of things and when I jumped back in I was horrified at the amount of complexity these things added.

In Java it appeared that JPA was trying to completely abstract the database into the JVM, but it just meant a million annoying cache problems, terrible performance, a serious cognitive load, and poor interoperability with other database users.

The guy who promoted this internally kind of wrote the problems off as being a "leaky abstraction" but it wasn't even that. JPA entirely failed to resolve the object-relational "impedance mismatch" by failing to acknowledge that the boundary between the database and the application is actually a legitimate surface, like any other API surface.

What a nightmare it was.


Yup - and you can’t code your way to real scale either. At real scale the game is all about data structures. Code just gets them from A to B.

Or as they say at Google, the job of SWE is “moving protos”.


Exactly that. Sitting down and thinking about your data structures and APIs before you start writing code seems to be a fading skill.


It absolutely shows in the final product, too.

I wish more companies evaluated the suitability of software based on reviewing the back-end data storage schema. A lot of sins can be hidden in the application layer but many become glaringly obvious when you look at how the data is represented and stored.


There's no right abstraction for it because everyone's data is different. From my experience, what most developers don't realize is that data is more complex than code. Code is merely the stuff that sits on top of the data, shuffling it around... but designing and handling the data in an efficient way is the real engineering problem.

Any abstraction you could come up with wouldn't fit 90% of the other cases.


EF Core doesn’t do a drop/create for columns in DB providers that support renaming columns. It only does that for providers that don’t, like MySQL or SQLite.


Not a silver bullet for every project but the Django ORM largely solves this with its migrations. You define your table classes and it just generates the migrations.

Throw in a type checker and you're in pretty good shape.

Rust also has sqlx which will type check your code against the DB.


I'm assuming this is why you say it's not a silver bullet, but to make it more explicit: the Django ORM will happily generate migrations that lock crucial tables for long periods and bring down your production application in the process.

You still need to know what SQL the migration will run (take a look at `manage.py sqlmigrate`) and most importantly how your database will apply it.


Dealing with a bunch of automigrate headaches in the Prisma ORM convinced me to just drop the layer entirely and write plain old SQL everywhere. It’s forced me to learn a bunch of new stuff, but the app runs faster now that I can optimize every query and migrations are much simpler with a single idempotent SQL setup script I can run to provision whatever deployment of the DB I need. I’m sure some problem spaces might benefit from all the additional complexity and abstraction, but the average app certainly can make do without for a long time.
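An idempotent setup script of the kind described might look like this (a minimal sketch using Python's stdlib sqlite3; the real thing would presumably target whatever database the deployment uses):

```python
import sqlite3

# IF NOT EXISTS makes every statement safe to re-run, so one script
# can provision a fresh database or no-op against an existing one.
SETUP_SQL = """
CREATE TABLE IF NOT EXISTS app_user (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE INDEX IF NOT EXISTS idx_app_user_email ON app_user (email);
"""

def provision(conn: sqlite3.Connection) -> None:
    conn.executescript(SETUP_SQL)

conn = sqlite3.connect(":memory:")
provision(conn)
provision(conn)  # running it a second time is a no-op, not an error
```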


It's a tradeoff! I think using the ORM to start and then move off it later is valid, depending on how much time you have to get an MVP out.


I also switched to plain SQL migrations and queries because I find it much simpler. I hear this a lot, that ORMs are easier/quicker to use, but I've found that writing plain SQL has almost no friction for me. I mean, learning at least basic SQL is part of the first year in most CS degrees.


There are some things that are really annoying in raw SQL and much easier with ORMs. Dynamic queries are one of them: if you don't want to concatenate SQL strings yourself, you need at least a query builder at that point. The other part is updating/creating entities with many relations, which gets very tedious quickly and is something ORMs can handle for you.

It depends a lot on the specific application, for me those things are so common that I prefer to use an ORM even though I could write the SQL myself. ORMs are easier to use, but you still need to learn what they do under the hood and understand them or you will run into issues along the way.
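For what it's worth, a dynamic filter can be assembled without concatenating values into the SQL: only the skeleton is built dynamically, and every value is bound as a parameter. An illustrative helper (not from any particular library; table and columns are made up), using Python's stdlib sqlite3:

```python
import sqlite3

def build_filter(filters: dict):
    """Build a WHERE clause and parameter list from optional filters."""
    clauses, params = [], []
    for column, value in filters.items():
        clauses.append(f"{column} = ?")  # column names come from code, not users
        params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return f"SELECT id, name FROM product{where}", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT,"
             " color TEXT, in_stock INTEGER)")
conn.executemany(
    "INSERT INTO product (name, color, in_stock) VALUES (?, ?, ?)",
    [("chair", "red", 1), ("table", "red", 0), ("lamp", "blue", 1)])

sql, params = build_filter({"color": "red", "in_stock": 1})
rows = conn.execute(sql, params).fetchall()
print(rows)  # -> [(1, 'chair')]
```

This is roughly what a query builder automates; the trade-off the comment describes is whether maintaining such helpers yourself beats learning the ORM's equivalent.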


I’m not suggesting anyone implement the raw binary protocol themselves! The postgres connection library you’re using should be able to handle dynamic queries perfectly fine.


An ORM is NEVER the solution, ever ever ever. Repeat after me: ORMs are not the solution to this problem. They work in your little toy apps with 4 customers but they are nothing but pain on real enterprise grade software.


Also, every company I've worked at has used ORMs in some capacity. Sometimes, not always.

Also, I don't really work on any apps with only four customers. Either they have almost a million or zero :P Try again. :)


I don't think insults are really necessary here.

Also ORMs can be very useful, just don't do dumb stuff, like with any technology.

I use them when appropriate.


If you use MySQL, PlanetScale’s branching is really amazing. I’m not using them, but wish I could for that. It gives you a complete diff of what you’re doing, and you can also pre-plan migrations and only apply them when you need to, with their gating.


> Also the point about external IDs and internal IDs - why can't the database/ORM do this more automatically?

It has pretty big implications for how your application code interacts with the database. Queries that involve IDs will need to perform joins in order to check the external ID. Inserts or updates that need to set a foreign key have to perform an extra lookup to map the external ID to the correct FK value (whether it's literally a separate query or a CTE/subquery). Those things are way outside the realm of what EF can handle automatically, at least as it exists today.
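Concretely, with a surrogate internal PK and a separate external ID, an insert that arrives holding only the external ID needs that extra hop. A sketch with Python's stdlib sqlite3 (UUIDs stand in for whatever external ID scheme is used):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id          INTEGER PRIMARY KEY,  -- internal, never leaves the server
    external_id TEXT NOT NULL UNIQUE  -- what clients see
);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id)
);
""")

ext = str(uuid.uuid4())
conn.execute("INSERT INTO users (external_id) VALUES (?)", (ext,))

# The FK must be resolved via a subquery (or a prior SELECT) -
# the extra lookup that an ORM can't currently hide for you.
conn.execute(
    "INSERT INTO orders (user_id) SELECT id FROM users WHERE external_id = ?",
    (ext,))

n = conn.execute("SELECT count(*) FROM orders").fetchone()[0]
print(n)  # -> 1
```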


I think that stuff works about as well as it possibly could. If you think that's painful think about something like DynamoDB where if you didn't really think through the access patterns up front you're in for a world of pain.



