Genuine question: I appreciate the comments about MongoDB being much better than it was 10 years ago; but Postgres is also much better today than it was then. In what situations is Mongo better than Postgres? Why choose Mongo in 2025?
Don’t choose Mongo. It does everything and nothing well. It’s a weird bastard of a database—easily adopted, yet hard to get rid of. One day, you look in the mirror and ask yourself: why am I forking over hundreds of thousands of dollars for tens of thousands' worth of compute and storage to a company with a great business operation but a terrible engineering operation, continually weighed down by the unachievable business requirement of being everything to everyone?
I have experience using both MongoDB and PostgreSQL. While pretty much everything said here is true, there is one more aspect: scalability. When a fast-moving team builds its service, it tends not to care about scalability. And PostgreSQL has many more features that prevent future scalability. It's so easy to use them when your DB cluster is young and small. It's so easy to wire them into the service's DNA.
In MongoDB the situation is different. You have to deal with the bare minimum of a database. But in return, your data design is far more likely to survive horizontal scaling.
In the initial phase of your startup, choose MongoDB. It's easier to start with and evolve in the earlier stages. And later on, if you feel the need and have the resources to run PostgreSQL at scale, move your data there.
They obviously didn't use vanilla Postgres, but built some custom sharding on top, which is a nontrivial task (both implementation and maintenance: resharding, failover, replication, etc.).
a) MongoDB has built-in, supported, proven scalability and high-availability features. PostgreSQL does not. If it weren't for cloud offerings like AWS Aurora providing them, no company would even bother with PostgreSQL at all. It's 2025; these features are non-negotiable for most use cases.
b) MongoDB does one thing well: JSON documents. If your domain model is built around them, nothing is faster. Seriously, nothing. You can do tuple updates on complex structures at speeds that would cripple PostgreSQL in seconds (rough sketch after this list).
c) Nobody who is architecting systems ever thinks this way. It is never MongoDB or PostgreSQL. They specialise in different things and have different strengths. It is far more common to see both deployed.
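To make point b) concrete, here's a minimal sketch (assuming pymongo; the "orders" collection and its fields are made up) of the kind of single-document, multi-field atomic update Mongo is built for:

    # Minimal sketch, assuming pymongo; collection and fields are hypothetical.
    from pymongo import MongoClient

    orders = MongoClient("mongodb://localhost:27017").shop.orders

    # One atomic, in-place write touching several parts of a nested document:
    # bump a counter, set a status, and append a line item in a single op.
    orders.update_one(
        {"_id": "order-123"},
        {
            "$inc": {"totals.item_count": 1},
            "$set": {"status": "updated"},
            "$push": {"items": {"sku": "ABC", "qty": 2}},
        },
    )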
> It's 2025; these features are non-negotiable for most use cases.
Excuse me? I do enterprise apps, along with most of the developers I know. We run like 100 transactions per second and can easily survive hours of planned downtime.
It's 2025, computers are really fast. I barely need a database, but ACID makes transaction processing so much easier.
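For what it's worth, that convenience is hard to overstate. A minimal sketch (assuming psycopg 3 and a hypothetical "accounts" table): the transfer either commits as a whole or rolls back as a whole, and that's the entire error-handling story.

    # Minimal sketch, assuming psycopg 3 and a hypothetical "accounts" table.
    import psycopg

    with psycopg.connect("dbname=app") as conn:
        # Both statements run in one transaction: leaving the block commits,
        # an exception anywhere inside rolls everything back.
        conn.execute(
            "UPDATE accounts SET balance = balance - %s WHERE id = %s",
            (100, 1),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + %s WHERE id = %s",
            (100, 2),
        )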
They failed every single Jepsen test, including the last one [0]
Granted, the failures were pretty minor, especially compared to previous reports (like the first one [1]; that was a fun read), but they still had bad defaults back then (and maybe still do).
I would not trust anything MongoDB says without independent confirmation
Reputation matters. If someone comes to market with a shoddy product or missing features/slideware, then it's a self-created problem that people don't check the product release logs every week for the next few years waiting for them to rectify it. And even once there is an announcement, people are perfectly entitled to be sceptical that it's anything more than a smoke-and-mirrors feature, and to not spend hours doing their own due diligence. Again, a self-created problem.
100? I had a customer with 10k upserts (including merge logic for the upserts) while serving 100k concurrent reads. Good luck doing that with a SQL database trying to check constraints across 10 tables. This is what NoSQL databases are optimized for...
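That pattern maps straight onto Mongo's server-side upserts. A minimal sketch (pymongo; the collection and merge rules are made up) of merge-on-write without a read-modify-write round trip:

    # Minimal sketch, assuming pymongo; collection and merge rules are made up.
    from pymongo import MongoClient

    events = MongoClient("mongodb://localhost:27017").metrics.events

    # Upsert with merge logic in one server-side operation: create the doc
    # if it's missing, otherwise fold the new values into the existing one.
    events.update_one(
        {"_id": "user-42:2025-01-01"},
        {
            "$inc": {"clicks": 1},                    # accumulate
            "$max": {"last_seen": 1735689600},        # keep the latest value
            "$setOnInsert": {"created": 1735689600},  # only on first insert
        },
        upsert=True,
    )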
There are some stand-out examples of companies scaling even MySQL to ridiculous sizes. But generally speaking, relational databases don't do a great job at synchronous/transactional replication and scalability. That's the trade-off you make for having schema checks and the like in place.
I guess I didn't make myself clear. The number was supposed to be trivially low. The point was that "high performance" is like the least important factor when deciding on technology in my context.
What's wild is you misrepresenting what I said which was:
"built-in, supported, proven scalability and high availability"
PostgreSQL does not have any of this. It's only good for a single server instance, which isn't really enough in a cloud world where instances are largely ephemeral.
> scalability [...] no company would even bother with PostgreSQL at all
In my experience, you can get pretty far with PostgreSQL on a beefy server, and when combined with monitoring, pg_stat_statements, and application-level caching (e.g. caching the user for the given request instead of fetching that data on every layer of the request handling), it's certainly enough for most businesses/organisations out there.
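For the monitoring part, pg_stat_statements alone gets you a long way. A sketch (assuming psycopg 3 and that the extension is enabled) of pulling the top queries by total time:

    # Minimal sketch, assuming psycopg 3 and pg_stat_statements enabled.
    import psycopg

    with psycopg.connect("dbname=app") as conn:
        rows = conn.execute(
            """
            SELECT query, calls, mean_exec_time, total_exec_time
            FROM pg_stat_statements
            ORDER BY total_exec_time DESC
            LIMIT 10
            """
        ).fetchall()
        for query, calls, mean_ms, total_ms in rows:
            # exec-time columns are in milliseconds (PostgreSQL 13+ names)
            print(f"{total_ms:12.1f} ms  {calls:8d} calls  {query[:60]}")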
Mongo is a real distributed and scalable DB, while Postgres is a single-server DB, so the main consideration could be whether you need to scale beyond a single server.
I've been playing with CloudNativePG recently, and adding replicas is as easy as can be; they automatically sync up and join the cluster without you thinking about it.
Way nicer than the bare-vm ansible setup I used at my last company.
I think there is no distributed DB on the market with feature parity to PgSQL. Distributed systems are hard, and sacrifices need to be made.
2. Do you know of any distributed DB which doesn't have Jepsen issues?
3. It is configurable behavior in MongoDB: it can lose data and be fast, or be slower and not lose data. There are no issues of unintentional data loss in the most recent (5-year-old) Jepsen report for MongoDB.
Distributed databases are not easy. You can't simplify everything down to "has issues". Yes, I did read most Jepsen reports in detail, and struggled to understand everything.
Your second point seems to imply that everything has issues, so using MongoDB is fine. But there are various kinds of problems. Take a look at the report for RethinkDB, for example, and compare the issues found there to the MongoDB problems.
PgSQL's only defect was an anomaly in reads which caused transaction results to appear a tiny bit later, and they even mentioned that this is allowed by the standard. No data loss of any kind.
MongoDB's defects were, let's say, somewhat more severe:
[2.4.3] "In this post, we’ll see MongoDB drop a phenomenal amount of data."
[2.6.7] "Mongo’s consistency model is broken by design: not only can “strictly consistent” reads see stale versions of documents, but they can also return garbage data from writes that never should have occurred. [...] almost all write concern levels allow data loss."
[3.6.4] "with MongoDB’s default consistency levels, CC sessions fail to provide the claimed invariants"
[4.2.6] "even at the strongest levels of read and write concern, it failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations"
Let's not pretend that Mongo is a reliable database, please. Fast? Likely. But if you value your data, don't use it.
No, the discussion started with the question "Why choose Mongo in 2025?" So the old Jepsen reports are irrelevant, and the most recent one, from 2020, is only somewhat relevant.
High availability is more important than scalability for most.
On average, an AWS availability zone suffers at least one failure a year. Some are disclosed. Many are not. And so that database you are running on a single instance will die.
The question is: do you want to do something about it, or just suffer the outage?
It's sad that this was downvoted. It's literally true. MongoDB vs. vanilla Postgres is not in Postgres' favor with respect to horizontal scaling. It's the same situation with Postgres vs. MySQL.
That being said, there are plenty of ways to shard Postgres that are free, e.g. Citus (sketch below). It's also questionable whether many even need sharding. You can go a long way with simply a replica.
Postgres also has plenty of its own strengths. For one, you can get a managed solution without being locked into MongoDB the company.
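To give an idea of what the Citus route looks like, a sketch (assuming psycopg 3 against a Postgres with the Citus extension installed; the table is hypothetical):

    # Minimal sketch, assuming psycopg 3 and a Postgres with Citus installed.
    import psycopg

    with psycopg.connect("dbname=app") as conn:
        conn.execute("CREATE EXTENSION IF NOT EXISTS citus")
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS events (
                tenant_id bigint NOT NULL,
                id        bigserial,
                payload   jsonb,
                PRIMARY KEY (tenant_id, id)
            )
            """
        )
        # Citus shards the table across worker nodes by tenant_id; queries
        # that filter on tenant_id are routed to a single shard.
        conn.execute("SELECT create_distributed_table('events', 'tenant_id')")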
Yes, but updating nested fields is last-write-wins, whereas with Mongo you can update two fields separately and have both writes succeed; it's not equivalent.
When you write to a Postgres jsonb field, it rewrites the entire jsonb value, because that's how Postgres's engine works. Mongo allows you to $set two fields on the same document at the same time, for example, and have both writes win, which is very useful and removes the need for distributed locks, etc. This is just like updating specific table columns in Postgres, but Postgres doesn't allow that within columns; you'd have to lock the row for update to do this safely, which is a PITA.
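Concretely (pymongo sketch; the document and fields are invented), two writers can each $set a different field of the same document and neither write clobbers the other:

    # Minimal sketch, assuming pymongo; document and fields are invented.
    from pymongo import MongoClient

    docs = MongoClient("mongodb://localhost:27017").app.docs

    # Writer A touches only the title...
    docs.update_one({"_id": "doc-1"}, {"$set": {"title": "New title"}})
    # ...writer B touches only a nested field; both field-level writes win,
    # with no row lock or read-modify-write cycle.
    docs.update_one({"_id": "doc-1"}, {"$set": {"meta.reviewed": True}})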
Because you get the convenience of having a document store with a schema defined outside of the DB if you want it, along with the strong guarantees and semantics of SQL.
For example: let's say you had a CRM. You want to use foreign keys, transactions, all the classic SQL stuff to manage who can edit a post, when it was made, and other important metadata. But the hierarchical stuff representing the actual post is stored in JSON and interpreted by the backend.
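A sketch of that hybrid layout (assuming psycopg 3; the schema is purely illustrative):

    # Minimal sketch, assuming psycopg 3; the CRM schema is illustrative.
    # Relational columns carry the integrity-critical metadata; the jsonb
    # column holds the hierarchical post body the backend interprets.
    import psycopg

    with psycopg.connect("dbname=crm") as conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS posts (
                id         bigserial PRIMARY KEY,
                author_id  bigint NOT NULL REFERENCES users(id),
                created_at timestamptz NOT NULL DEFAULT now(),
                body       jsonb NOT NULL  -- the hierarchical post content
            )
            """
        )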