Since everyone is sharing their opinion and experience with MongoDB, I think I'll share mine.
As an appeal to authority, I'll mention that I have relevant vocational qualifications on the subject (geared more towards scalability and operations). I don't believe it really matters, but it may to those who would otherwise assume I don't understand best practice.
MongoDB itself is not /really/ a valid choice in many of the scenarios it was painted as solving. Its only real fault is overzealous marketing: it has (in my opinion) very clear pain points that should be avoided, but those pain points are antithetical to why many people adopted it in the first place.
Most people pick up Mongo because it's painted as "beginner developer friendly". I don't mean friendly to new developers; I mean that picking it up and running with it, without understanding it, was made incredibly easy. But MongoDB itself needs you to understand your data patterns before you start adding shards, so the technology depends on you actually sitting down and designing an architecture with that understanding. These goals are at odds with each other.
In MongoDB (as it was when I was using it in full production 6+ years ago) you -needed- to understand how your data would grow and how it would be queried long before you ever created an index; you could not grow it after creation. Using it as a plain document store, with no searching and heavy sharding on the document ID, is the best way to go, and in that scenario it is much better than most competitors.
In nearly every /other/ scenario it's a less favourable choice than some other technology.
I would argue the data-loss point, but if that's not a solved issue yet, it will be, and I'm fairly certain you can configure it to be slower but correct (my memory is hazy).
I am not a MongoDB advocate, nor do I hate the technology outright. I strongly dislike how it was marketed as being a panacea.
And for the same reason I avoid PHP, I will attempt to avoid MongoDB.
(As in; it can be done well but the majority of cases will be poorly implemented)
I’m a big fan of Mongo for the use case you described - searching by ID and all information in one document.
But people don't seem to understand that there are plenty of scenarios where you either don't know the schemas in advance and/or the "schema" is defined by an external source.
I worked for a company that sold software that allowed users to create forms that could be filled out either on the web or via a mobile app.
The user created the form, and the schema and the indexes were created on the fly, one collection per type of form. What would an RDBMS have bought us?
And if you create a table with a single ID column and a single JSON column you’ve essentially re-invented a NoSQL database. But I guess you can pretend it isn’t.
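The "single ID column plus a single JSON column" shape the parent describes can be sketched in a few lines. A minimal illustration using Python's bundled SQLite (assuming a build with the JSON1 functions, which modern builds include); the table and field names are invented for the example:

```python
import json
import sqlite3

# A single-ID-column, single-JSON-column table: structurally the same
# shape as a collection in a document store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

doc = {"name": "Ada", "role": "admin", "tags": ["ops", "dev"]}
conn.execute("INSERT INTO docs (body) VALUES (?)", (json.dumps(doc),))

# SQLite's json_extract lets you filter on a field inside the document,
# much like matching on a field in a document database.
row = conn.execute(
    "SELECT id, body FROM docs WHERE json_extract(body, '$.role') = 'admin'"
).fetchone()
print(json.loads(row[1])["name"])  # -> Ada
```

Whether that counts as "re-inventing NoSQL" or just using the relational engine's JSON support is, of course, the whole argument.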
I've been using this in production for around 1 year - it's an absolute dream to use!
For context, I've previous experience with NHibernate, EF, EF Core, Dapper and some others from yesteryear - Marten is probably the best dev experience I've had from an ORM.
And then what happens when they add a field to the form and the table already has a million rows? What happens when they decide that the numeric field should have strings?
It would probably work using a forms table, fields table, submissions table, and values table.
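The four-table layout suggested above is the classic entity–attribute–value shape. A rough sketch in Python's bundled SQLite, with all table and column names invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forms       (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fields      (id INTEGER PRIMARY KEY, form_id INTEGER, name TEXT);
CREATE TABLE submissions (id INTEGER PRIMARY KEY, form_id INTEGER, submitted_at TEXT);
CREATE TABLE "values"    (submission_id INTEGER, field_id INTEGER, value TEXT);
""")

# One user-defined form with two fields and one submission.
conn.execute("INSERT INTO forms VALUES (1, 'Site survey')")
conn.executemany("INSERT INTO fields VALUES (?, 1, ?)", [(1, "location"), (2, "rating")])
conn.execute("INSERT INTO submissions VALUES (1, 1, '2024-01-01')")
conn.executemany('INSERT INTO "values" VALUES (1, ?, ?)', [(1, "Berlin"), (2, "5")])

# Reassembling one submission means joining the value rows back to
# their field names and pivoting rows into a record.
rows = conn.execute("""
    SELECT f.name, v.value
    FROM "values" v JOIN fields f ON f.id = v.field_id
    WHERE v.submission_id = 1
""").fetchall()
print(dict(rows))  # -> {'location': 'Berlin', 'rating': '5'}
```

Note that every value lands in a TEXT column and every read is a pivot, which is exactly the trade-off the thread is arguing about.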
I didn't ask "would it have worked", I asked "what would it have bought us".
Alter table is generally no big deal for any of the use cases that MongoDB is also able to handle.
On any good RDBMS, adding a nullable column to an existing table is an O(1) operation. This is the only option that's comparable to what's available in MongoDB, and it has the same performance characteristics.
On the great ones, adding a non-nullable column with a default value to an existing table is also an O(1) operation; on the good-but-not-great ones, it's O(N). (As always, you get what you pay for.) For MongoDB, wanting to do this would be unusual, but you would have the option of back-filling every record, which would be an O(N) operation too. So, for this case, the characteristics of the RDBMS are no worse, and possibly better.
Adding a non-nullable column with no default is always O(N), but the fact that you're suggesting a document store as an alternative implies even more strongly that this is not the use case you're trying to cover. That said, if you did do it, it would also be O(N).
Converting a numeric column to a string column is always going to be O(N), yes. Whether or not that's the better option is something that's got to be decided in context. Basically, do you want to pay the cost of datatype conversion in one lump sum and then be done with it forevermore, or do you want to pay a small fee for datatype coalescing every time you access that field? There are good reasons to choose both options. However, all too often, the 2nd option is chosen for a very bad reason: Simply assuming that it's zero cost.
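The "adding a nullable column doesn't rewrite the table" claim is easy to observe. A small sketch using Python's bundled SQLite, where `ADD COLUMN` is a metadata-only change and existing rows simply read back NULL for the new column (table and column names invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Grace",)])

# In SQLite, ADD COLUMN only updates the schema; the existing rows are
# not rewritten, and the new column reads back as NULL for them.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
print(conn.execute("SELECT name, email FROM users").fetchall())
# -> [('Ada', None), ('Grace', None)]
```

The O(1)-vs-O(N) distinction in the comment above is about whether the engine must touch every stored row, and a nullable add touches none of them.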
What happens when you need to do something like "Select browser user agent from all users who filled forms for a particular set of clients after a given date." ?
This would fit in a single SQL query that can be expected to perform reasonably well; with an unstructured database, optimizing this query could take months of work.
Something like this? I'm not sure why this couldn't be an optimized query in Mongo, but I'm also not sure why a query like this one needs to be optimized. It would run fast enough without indexes, and really fast with an index on a couple of fields. Is a query like this run so frequently that it needs to be extremely optimized?
How so? In our case, metadata like the user ID, browser agent, date entered, etc. was always added to the object before it was stored, and those fields were indexed. They are just name–value pairs.
The point is that the query I mentioned requires joins.
You can of course get the same information from key value pairs, it will just require a number of scans over all your data, which doesn't scale if you need the queries to be fast.
On the RDBMS side, there has been more than three decades of research on optimizing patterns like this. You don't want to try and reinvent that.
If you can know for sure from the start that you'll never need queries like this, then of course something like Mongo will be awesome. But requirements change, hence this article.
You saw the part where I said that all the forms had different schemas and were in different collections? The RDBMS equivalent would be that every type of form gets its own table and each user their own database. You would still have the same issue: you would have to query the database's metadata to get all of the tables and programmatically join the data.
At another company where I worked, where we used Postgres, we had a multi-tenant setup in which each of our (large) customers had their own database. The issue would have been the same.
You would no more "scan over all of your data" with Mongo with indexed fields than you would with an RDBMS with indexes.
Yes, Mongo supports joins. But I wouldn't use them. Application servers scale much more easily than database servers, and you're not getting any efficiency gains from doing server-side joins over just reading documents from the left side and doing an "in" query with the IDs from the right side, assuming you are doing the equivalent of a left outer join.
In fact, if you are using C#, you could use the same LINQ syntax either way.
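The client-side join pattern described above can be sketched in a few lines: read the left side, fetch the right side with a single "in" query on the collected IDs, and stitch the results together in application code. The `find_*` helpers below are invented stand-ins for real database calls:

```python
# Left side of the join: one query against the "orders" collection.
def find_orders():
    return [{"_id": 1, "user_id": 10}, {"_id": 2, "user_id": 99}]

# Right side: one "$in"-style query against the "users" collection.
def find_users_in(ids):
    users = {10: {"_id": 10, "name": "Ada"}}
    return [users[i] for i in ids if i in users]

orders = find_orders()
users_by_id = {u["_id"]: u for u in find_users_in({o["user_id"] for o in orders})}

# Equivalent of a left outer join: every order is kept, the user side
# may be missing (None).
joined = [(o, users_by_id.get(o["user_id"])) for o in orders]
print([(o["_id"], u["name"] if u else None) for o, u in joined])
# -> [(1, 'Ada'), (2, None)]
```

Two round trips instead of one, but both are index-friendly lookups, and the merge work moves to the application tier, which is the scaling argument being made.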
I work a lot with documents in my current role which includes a lot of JSON structures as well.
MongoDB has been immensely useful for a team with limited scope (and requisitional abilities within the organization) to get up and running and store backups of documents that have been processed and JSON API responses.
I definitely wouldn’t apply it as a panacea, either.
Like any tool, it has its place in the belt for me. It's no universal hammer, though.