> Relational databases are very rigid so you need to really make sure you understand your data upfront.
This is NOT true:

  CREATE TABLE/VIEW ...
  ALTER TABLE ...
  SELECT (whatever) ...

Changing an RDBMS at runtime is fully supported, in MUCH better ways than in almost all "NoSQL" databases.
> Changing the database is a big effort
It is much less effort than with NoSQL, and with stronger guarantees that the change is CORRECT, rather than leaving you with half your documents in one version and the other half in another.
> These databases are hard to distribute and scale so you tend to end up with single instances that are very large and require replication for backup. The cloud providers have relational databases today that are supposed to scale, but I have no experience with them.
It is very easy to be "web scale" if you don't care about data integrity, and very few NoSQL databases are even half good at that. Scaling an RDBMS is very similar to running a less brittle NoSQL product (like Mongo): it comes down to how you model the data.
I was on a team behind one of the largest deployments on Google Cloud's datastore. The code written to cover for the store's lack of integrity was a big part of the codebase. Despite the supposedly massive amount of data, I don't see how having an RDBMS at the back could have been worse (when you must put Redis and other caches in front for high ingestion anyway).
I think that for the majority of startup deployments a non-RDBMS rarely makes much sense, especially if the team has no idea what a good RDBMS can truly do, or how to hand-write correct transaction code...
I'd argue relational DBs get way more credit than they deserve. And it is indeed tricky to make them work across different data models -- that's by now an established fact across big SV companies. Restricting direct access to rigid RDBMSs and making data modeling easier via graphs is why Facebook Tao, Dropbox Edgestore, Airbnb Knowledge Graph and so on exist. See my post [1].
I'll give you a simple example. Consider movies and directors. If you started with the fact that each movie has exactly one director, you'd design exactly one table, with a director column.
Later, if you realize that each movie can have multiple directors, you can't just alter that table. You need two new tables -- one for directors, and another one to connect movies to directors. That's not simple, that's a big change which requires a bunch of work.
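For concreteness, a minimal sketch of that relational change (Postgres flavor; the table and column names are hypothetical):

  -- Before: movies(id, title, director), one director per movie.
  -- After: a directors table plus a junction table, backfilled
  -- from the old column, which is then dropped.
  CREATE TABLE directors (
    id   SERIAL PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
  );
  CREATE TABLE movie_directors (
    movie_id    INT REFERENCES movies(id),
    director_id INT REFERENCES directors(id),
    PRIMARY KEY (movie_id, director_id)
  );
  INSERT INTO directors (name)
    SELECT DISTINCT director FROM movies WHERE director IS NOT NULL;
  INSERT INTO movie_directors (movie_id, director_id)
    SELECT m.id, d.id FROM movies m JOIN directors d ON d.name = m.director;
  ALTER TABLE movies DROP COLUMN director;

Mechanical, but every query that used to read movies.director has to change with it.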
Doing this in graphs is simple. Adding a bunch more outward relationships is simple. Data modeling, in general, is way simpler in graphs than in RDBMSs.
Mongo and such came out of the Bigtable era. Cockroach and Dgraph are coming out of the Spanner era -- where it's not only about horizontal scalability, but also about strong data integrity and distributed ACID transactions.
> Later, if you realize that each movie can have multiple directors
This still requires changes across the codebase, of which the DB migration is by far the simplest step.
But:
It is similar to "static VS dynamic" typing, where the schema change in an RDBMS comes BEFORE the new data, and in most NoSQL it comes AFTER.
If the data arrives in a "progressive, make changes as you go" fashion, then clearly, given the way most DB schemas are, BEFORE is the harder mode.
So, yeah, there are places where a specialized store makes more sense. But just saying "it is harder" invites this critique, and worse, in the minds of inexperienced developers (the ones who deploy blind) it could lead to "get off the RDBMS, they are old and bad!", when in fact it is much better most of the time.
ALSO:
> Restricting direct access to rigid RDBMSs and making data modeling easier via graphs is why....
The relational model could handle that. BUT the implementation inside RDBMSs is just a subset of its full potential. One example: you can't embed tables in tables.
And relational/SQL engines have weak support for hierarchies and graphs, despite the fact that you see SQL layered on top of other non-relational engines.
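For reference, the closest standard SQL gets to a hierarchy traversal is a recursive CTE -- a minimal sketch, assuming a hypothetical employees(id, manager_id, name) table:

  -- Walk the management hierarchy downward from one root employee.
  WITH RECURSIVE reports AS (
    SELECT id, manager_id, name
    FROM employees
    WHERE id = 1                      -- root of the traversal
    UNION ALL
    SELECT e.id, e.manager_id, e.name
    FROM employees e
    JOIN reports r ON e.manager_id = r.id
  )
  SELECT * FROM reports;

It works, but it is verbose next to a native graph traversal, which is the weak support being described.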
I'm building a relational language in my spare time, and without the constraint of fitting the (traditional) RDBMS, it is clear the DB guys need to up their game!
> That's not simple, that's a big change which requires a bunch of work.
I feel like many of these criticisms of RDBMS come from the perspective of not having worked with a great database migration system.
I've made changes like this (one-to-many switched for many-to-many) using Django migrations many times over, and I find it easy. But that's because I've built up the experience with the tooling over time.
Learning migration tooling in this way feels like a much smaller lift to me than entirely switching to a graph database.
(I'm a big fan of Dgraph, but I've not yet found the right project to apply it. I'm not interested in it as a replacement for RDBMS, I want to use it to solve graph problems that are genuinely painful with a relational database. Easier schema changes aren't that for me.)
Totally see your perspective. You're already familiar with data migration, but not familiar with graph DBs. The former is just easier to do than to invest in the learning curve involved with the latter. Valid point.
My take is based more on root causes and principles. If we were to think of a different way to build a DB, then avoiding a data migration altogether is a better approach. Avoiding the downtime, extra work, extra code, and so on is just better for the user. Moreover, simpler data models, avoiding data duplication, and such become very convincing features.
It's also not so interesting to me when graph DBs are only applied to specific graph problems, like PageRank and such. We didn't build Dgraph just to solve PageRank. The reason we built Dgraph was that we saw the complex data modeling problem Google Knowledge Graph threw at us. A graph system allowed that problem to be solved in a really clean and simple way. And it's the same problem we see other companies dealing with as well.
From a user perspective, what we find is that graph dynamism matters more for Self-Serve Analytics -- a fast-moving data science, analyst, or dev team -- than for DBAs.
So a diff perspective:
DBA SCHEMA CHANGE
- Relational: Admins alter the schema and devs maybe run a migration
- Graph: Projects start untyped, but almost all the projects Graphistry has helped with end up effectively typed & locked, so it works out the same as relational. (Ex: ProjectDomino.org took maybe 1 month before the neo4j DBA locked the schema to avoid inadvertent value pollution.)
DYNAMIC QUERIES
Once a bunch of entity types and relationships get involved, you build up a zoo of tables. This is painful for adding data ("new dataset X needs schema Y, or subtypes into existing table Z via dynamic type field A + generic data stuffed into field metadata"), and much worse, given some data, for figuring out how it indexes across 20+ tables. Three examples: an Infosec Jupyterthon talk was about how to make sense of OSQuery's ~100 different tables of OS logs, a genetics friend got onboarded to a DB with 30 tables & views they knew nothing about, and in Project Domino, you can do all sorts of things with URLs and Topics across our datasets & analyses.
Relational is great for a rigid CRUD app. But once you hit analytics, you get swamped with data integration tasks (discovery, linking, fusion, ...), and tables become a PITA. Formally, graph queries (like "everything 1 step out") give you genericity in quantification, where SQL forces you to know the exact table name & field name ahead of time.
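To sketch that quantification point, "everything 1 step out" from one node in SQL means enumerating every edge table by hand (all table names here are hypothetical):

  -- Everything 1 step out from movie 42, edge table by edge table.
  SELECT 'directed_by' AS rel, director_id AS neighbor
    FROM movie_directors WHERE movie_id = 42
  UNION ALL
  SELECT 'acted_in_by', actor_id
    FROM movie_actors WHERE movie_id = 42
  UNION ALL
  SELECT 'produced_by', producer_id
    FROM movie_producers WHERE movie_id = 42;
  -- ...and so on for each remaining edge table; a graph traversal
  -- needs no such enumeration.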
In your movie+director example, changing a relational database is relatively easy. Yes, you create two new tables, but that is not hard. And you relocate the director data from the movie table into the new director table, which is also not hard.
The hard part is updating all the application code that was written assuming that a movie has one director. I'm not super familiar with graph DBs, but I don't see how they could possibly help with that.
Add some producers, cinematographers, actors, oscar awards and so on. Complexity adds up.
OTOH, in Dgraph GraphQL, you'd just do a simple schema edit from:
  type Movie {
    ...
    director: Director
  }

to

  type Movie {
    ...
    directors: [Director]
  }
You're right about the application code stuff. But that's a common cost across both spaces -- though I'd argue that JSON responses back from GraphQL (/ Dgraph) make it easier.
I'm a massive proponent of the graph model, but the RDBMS fans have it right here: the schema change is trivial in both cases; the hard part is managing the rollout of those changes to prevent an outage in the application you are actually selling.
> I'll give you a simple example. Consider, movies and directors. If you started with the fact that each movie has exactly one director, you'd design exactly one table, with a director column.
With respect, if you have that little domain knowledge, you shouldn't be designing--er, coding--anything other than a throwaway exploratory scrap.
And in preference to doing that, you should be getting out and talking to the people who plan to use your database, and learning how they think, and what about.
Systems analysis is another '80s development "fad" that is way underrated.
GraphQL seems to sit between the strict relational model and the more free-form approach of NoSQL databases (from what I can see, but correct me if I am wrong).
I have not used a GraphQL database, but one of the things I fear about designing a relational database is mentioned in the article:
>"Relational databases are very rigid so you need to really make sure you understand your data upfront. Changing the database is a big effort..." (emphasis is mine)
Given how often requirements change in a business, it seems impossible to anticipate all future requirements at the outset of designing a database. Is it true that changing a (relational) database schema is a big effort? Is this a strong case for GraphQL?
GraphQL is not a database technology. It's a Query Language that can sit in front of REST APIs, databases of all kinds, or in-memory data. As such it typically has no impact on your database schema or any difficulties associated with it.
It can be true that changing a relational DB schema is a big effort, and it can be true that graph (not GraphQL) DBs are more flexible. Even for neo4j and JanusGraph, which don't offer GraphQL by default -- they're NoSQL databases that store graphs instead of documents or k/v pairs. It just happens that Dgraph decided to integrate it.
It doesn't have to be, but tooling and initial design matter a lot. I've done a lot of significant migrations this year (changing schemas of hundreds of millions of rows in Postgres) in a downtime-free, non-breaking way, but there were two big pieces:
- separate internal schemas from external ones, so you can change the internals while maintaining your interface contract. Your database has an API whether you realize it or not, so treat it like one. I literally have schemas like "api" and "api_v2" (see the sketch after this list).
- use a great migration tool. I've been using sqitch, which I love because it encourages me to think about how to _verify_ that each migration does the correct thing. Tie that into CI/CD in a nice way and DB migrations are no longer scary.
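A minimal sketch of that schema-as-API idea (schema, table, and column names are hypothetical):

  -- Internals live in one schema; consumers only ever see versioned views.
  CREATE SCHEMA internal;
  CREATE TABLE internal.users (
    id        BIGINT PRIMARY KEY,
    full_name TEXT NOT NULL
  );

  CREATE SCHEMA api_v2;
  CREATE VIEW api_v2.users AS
    SELECT id, full_name AS name
    FROM internal.users;
  -- internal.users can now be restructured freely, as long as this
  -- view keeps returning the same shape to callers.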
That probably downplays the effort; as someone who's never done DBA work before, I've had to learn a lot about PL/pgSQL, transaction isolation levels, role management, etc. for the first time. I think I'm better for it, though, and I've quite enjoyed going beyond query-writing.
As always, it depends. When requirements change, there are a few things that change altogether - and only some of them should affect the database:
1. The data at rest (in your relational database) - database migrations are the solution to this problem, in one way or another. Migrations are well understood, and are not hard - just tricky to get right depending on the complexity of the change.
2. Rules about how to turn that persisted data into something you can reason about at runtime (aka hydration logic)
3. Rules about how to change data from one form or another due to business rules (aka business logic)
If your database is an implementation detail and not the heart of how you reason about your system, then the difficulty in making the migration from one form of data at rest to another should be easier. If you rely on your database for storage and business logic, then you may have a more difficult time.
There are a bunch of changes you can make in Dgraph that are very convenient. For example, going from a 1:1 relationship to a 1:many relationship (see my comment about movies and directors), or having sparse data models.
Then, being able to change from int to float, or to date -- all of these constitute the flexible schema that something like Dgraph is built around and a relational DB is not -- which makes it much easier to iterate on data models than what relational tables can give.