The problem with MongoDB is their shadiness. They shipped with unacknowledged writes until not too long ago. In other words, you would write to it and there wouldn't be an ok or fail response; you'd just sort of hope it would go in.
They fixed that problem but it was too late. In my eyes they proved they are not to be trusted with data.
Had they called themselves MangoCache or MongoProbabilisticStorage, fine: silently drop writes all you like, I don't care, it's not a database. But telling people they are a "database", tweaking their defaults to look good in stupid little benchmarks, and telling people they are webscale sealed the deal for me. Never looking at that product again.
And it's not even good or recommended as a cache under any kind of load. So I guess their most valuable niche is low-traffic/prototype sites with poor architecture discipline or genuinely unrelated data sets. There are so many better tools for caching (memcached/redis), durable persistent storage (postgres), session storage (memcached/redis/browser hybrids) and document storage (postgres). Mongo is just one of those brands that quickly solved a problem most of the better technologies missed: the user interface.
I think flexible data storage with indexing is where they are better than most other options; there's something to be said for some of what they do offer. I was able to replace a site's SQL-based search system, which includes geolocation, with MongoDB, and it works fairly well. I had considered ElasticSearch or something similar, but Mongo was a better fit.
Today, I would be inclined to use PostgreSQL with JSON support, and some triggers to update an aggregate search table, or look more seriously towards RethinkDB.
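The trigger-maintained aggregate search table idea can be sketched with Python's stdlib sqlite3 standing in for PostgreSQL (table and column names here are made up for illustration):

```python
import sqlite3

# Sketch: a trigger keeps a denormalized search table in sync on every
# insert, so search queries hit one flat table. sqlite3 stands in for
# PostgreSQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (id INTEGER PRIMARY KEY, title TEXT, city TEXT);
CREATE TABLE search_index (listing_id INTEGER, blob TEXT);

-- Update the aggregate search table whenever a listing is added.
CREATE TRIGGER listings_ai AFTER INSERT ON listings BEGIN
    INSERT INTO search_index (listing_id, blob)
    VALUES (NEW.id, NEW.title || ' ' || NEW.city);
END;
""")
conn.execute("INSERT INTO listings (title, city) VALUES ('Loft', 'Berlin')")
row = conn.execute("SELECT blob FROM search_index").fetchone()
print(row[0])  # -> Loft Berlin
```

In Postgres you would do the same with a `PL/pgSQL` trigger function, and the indexed column could be `tsvector` or JSON.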
With any NoSQL system you give up something; you just need to be aware of what you are giving up, why, and for what gains.
I understand some of the reasons people didn't like Mongo, but this always vexed me. The default write level was very clearly documented and you could always change it as necessary. Surely it would be necessary to read the documentation of a database before rolling it out to production?
> Surely it would be necessary to read the documentation of a database before rolling it out to production?
You buy a car. It comes with the brakes disabled because, for whatever reason, that also lets it reach a higher top speed. You are expected to read your car owner's manual, and on page 54 you find that you have to hold the "enable brakes" button under the console for 10 seconds to turn on your brakes. Would it vex you that people might be slightly critical of that car? Clearly they are silly for not reading their car manual until page 54.
That "feature" is not something that should be discovered by reading docs, or after you get a crash, load a backup from another week, still get a crash, and then start hitting your head on your desk.
Anything calling itself a "database" should not have shipped with those default settings _ever_. If they did they might have gotten away with it in my book by having a big flashing red warning on the front or download page. I don't remember one.
I would also read the manual of a car I just bought before driving it. I guess that's just my style.
Don't get me wrong, I'm not saying your assumption is unreasonable. But in the end, it's on you as a conscientious developer to read the documentation. I'm not even suggesting cover to cover - in this case though they are very up front about write concerns. There is no real excuse to find this out any other way, it's just negligence.
What is impractical is that brakes not working is not something you want to find in the manual on page 54. You don't want the car to start without that feature unless you enter a special code and confirm you know what you are doing.
A database whose default configuration ends up corrupting users' data silently is like buying a car with the brakes disabled.
Well, except that in the car case disabled brakes won't make the car go faster, but in the case of MongoDB I remember fans strutting write benchmarks around, comparing it to Postgres, Couch and other databases and telling everyone how webscale it is. The reason that design decision was made is shady. That was my initial point.
Your approach is wholly impractical on its face, actually.
So, let me get this straight: You laid down tens of thousands of dollars on a vehicle that you only post-purchase read the manual of, and you're raising this as some sort of standard people should follow?
Honestly, asking the right questions (and test-driving) upfront should be what lands the purchase, and not discovering the folly of purchasing a car with such ass-backwards issues you only discover after the fact when you bother to dig out the manual.
You drove it off the lot after you bought it, right? Or did you read the manual in the lot right after signing the papers locking you into the purchase?
Turns out you can read the manual of a vehicle ahead of time. Turns out you can also test drive and do everything else you said, and we don't need to pretend that it's all mutually exclusive. Stop being a pedant -- me listing every bit of due diligence about my car isn't relevant, so let's stay on topic.
Honestly, how can people on HN actually be this against reading? Especially things that are really important? Sure, don't read the contest rules for your McDonald's monopoly. But if the data for your livelihood depends on something, there's no excuse for not reading the documentation.
If understanding your production database is a bad use of your time, then I really don't understand your priorities, but I'm glad you're not on my team.
You don't need to read every word of everything, but some things are worth it. Do you sign contracts without reading them too since it's a "terrible use of" your time?
> You don't need to read every word of everything, but some things are worth it.
Yes - I am saying that in this particular car example, the benefit derived from reading the entire manual before purchasing/driving a car is not worth the cost unless your time is worth very little. As others have pointed out, no one is flipping through the manual to check whether the brake pedal actually applies the brakes.
But it is a good engineering decision to thoroughly read the docs before jumping into a new datastore like Mongo, I agree. Learning that there are things like gigantic global locks and unsafe writes is normally enough to make you say, "hey, I probably shouldn't use this to store production data I actually care about".
That's fairly unusual; modern cars have a largely standardized user experience, such that there aren't many ways for a car to do something surprising and dangerous that's covered in the manual.
I did experience one once though: I discovered that a vehicle had traction control when the system activated during a skid. The computer and I disagreed about the best way to respond, and the surprise did make the situation more dangerous than it could have been.
In both vehicles and databases, the situations in which the product might do something unexpected and dangerous should be clearly documented in their own section of the manual. Databases should say "here are the things that could lead to data corruption or loss". Vehicles should say "here are the situations where the vehicle might disregard or override the driver's control inputs".
I'm curious about your desired response during the skid and what the traction control system did differently. Would you mind expanding on that if you remember it clearly?
Sure. The vehicle involved was rear wheel drive and, as loaded probably had a rear-biased mass distribution. It began to oversteer in a corner on a patch of ice. Standard protocol for these situations is to apply a moderate amount of throttle to shift weight to the rear and increase traction there while reducing or reversing steering input. The traction control system counteracted my attempt to increase power, requiring vastly more reverse steering input and interfering with my ability to position the vehicle on the road.
I suspect some people will believe the results wouldn't have been what I expected without the traction control. I can't prove they would have been, but I did grow up and learn to drive in Alaska. Based on my experience, I think I would have done better than the computer did.
Yes, whenever I buy a car, I read the manual. If it's second-hand, I download the manual.
Heck, even when I have a rental car for a single day, I will read the manual. Maybe not every page, but I will skim it for gotchas (and if I have time, the whole thing).
Maybe it's just the engineer in me, but it's what I do.
Come on guys, as computer engineers/programmers/developers/whatever, surely professional pride alone would mean we at least read the README and/or the manual before putting something into production?
This is a pretty silly argument. The people whom it affects are not people buying cars, it's more like launching a shuttle mission, at which point I would assume you have read the f*ing manual.
Given how much time we spend talking about MVPs, Lean Startup, etc., I think people on this site are trying to avoid launching a shuttle mission. [Often] they're looking at building startups and are looking for both time-tested and new-but-advantage-providing technologies and techniques. At first glance, MongoDB appears to be advantage-providing, so people adopted it quickly. They didn't read the manual; they put it in production on a small site and got surprised by the lack of durability.
I assure you, once you are spending 75% of your administration time dealing with MongoDB problems, taking you away from other critical tasks, you'll see it's not really a great problem to have.
I like your analogy, very fitting. In their defense, however, they are far from the only ones to do that... I lost about 2h worth of production data with HBase in just the same way. Fortunately I didn't want to trust it completely anyway and had my own logs of all transactions on the filesystem, but it definitely shattered my trust in that DB (not to mention it was a pain to set up and had no secondary indexes).
I use MongoDB in production now and I am happy with it. It's not a huge dataset by any means, so MongoDB fits the bill perfectly. It has a few idiosyncrasies (it doesn't release disk space after deleting records - what?!?) and you definitely want to read the manual on settings. But it is incredibly easy to use (documents instead of relational data) and allows me to focus on app development instead of my storage backend.
HBase never needed nor used append. "Append" refers to the ability to reopen an existing file from a new client and append data to it.
HBase writes only immutable files, which means they are written once then only read. Even the WAL is no exception; it is written once, and then only opened for read when needed for recovery or replication.
HBase needs hflush to make sure that the WAL edits are resident on at least 3 (the default) HDFS data node machines.
Not sure how exactly the grandparent lost data. Each edit is first written to the WAL, then committed to the in-memory store. The in-memory store is flushed to disk into a new file at a certain size.
If a server crashes with unflushed data in its memory store, that part of the data is replayed from the WAL on another server.
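The write path described above can be reduced to a toy model in Python (pure illustration of the WAL-then-memstore pattern, not HBase code; all names are made up):

```python
import os
import tempfile

# Toy model: every edit goes to a write-ahead log first, then to the
# in-memory store. After a "crash" the memstore is rebuilt by replaying
# the log, which is why unflushed edits survive.
class ToyStore:
    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.memstore = {}
        self.wal = open(wal_path, "a")

    def put(self, key, value):
        self.wal.write(f"{key}\t{value}\n")   # 1. append to the WAL
        self.wal.flush()
        os.fsync(self.wal.fileno())           #    (hflush-like durability)
        self.memstore[key] = value            # 2. then the memstore

    def recover(self):
        # Replay the WAL, as a region server would after a crash.
        self.memstore = {}
        with open(self.wal_path) as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t")
                self.memstore[key] = value

wal_file = os.path.join(tempfile.mkdtemp(), "wal.log")
store = ToyStore(wal_file)
store.put("row1", "hello")
store.memstore.clear()        # simulate losing the in-memory store
store.recover()
print(store.memstore["row1"])  # -> hello
```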
HDFS didn't have append at the time, not sure how it is now. It did have some filesystem journalling though (if I remember correctly), we just didn't know we should turn it on.
Other databases are forgiving; they are configured "safe", even at the expense of speed. The intention is that you can deploy a small system immediately; if/as you grow you will see that the database is going too slowly. You can _then_ look at the performance/safety dials you can tune and choose appropriate trade-offs.
These are systems designed for the real world, where people don't read the manual until they have to.
When people assume MongoDB was similarly designed with their best interests in mind, that's when things go wrong.
>When people assume MongoDB was similarly designed with their best interests in mind, that's when things go wrong.
No, I just assume that a database has a similar set of features as other databases have had for decades. Mongo does not; it is clearly the exception - and for possibly nefarious reasons, as well.
I'm not sure how anyone else could know what my best interests are. There are a lot of real world applications where small amounts of data loss don't matter but latency matters a lot.
Any time I deploy something as critical as a database, I carefully read about what it does and how it works. Not doing so is like signing a contract without reading it.
> a lot of real world applications where small amounts of data loss don't matter but latency matters a lot.
I don't understand this reasoning. We are talking about defaults. Defaults are used by people who did not tweak the settings yet. If I am just starting building a thing, I will have bugs and squeaks and I want to make sure I am not fooled by some unreliable data store. I am not likely to need 100GiB/s throughput, but I am very likely to have to hunt bugs, like "I did click on this <like> button but it did not add to the total likes". And I would really really hate it if after half a day of bug hunting I would realize that my data store just didn't store the thing...
> I understand some of the reasons people didn't like Mongo, but this always vexed me. The default write level ... Surely it would be necessary to read the documentation
I don't have much sympathy for people who can't RTFM but storing data is kind of a thing for databases.
Mongo was sort of great at first; it's a not-bad solution for low-traffic sites, and it's also great for prototyping. But it falls apart when anything grows up into high availability, high traffic or anything "high".
It's kind of funny, and interesting, that it took this long for much of the industry to start knocking down the house of cards built around the database. However, many of the databases coming out these days have a lot of the same cultural issues of "hide the issues" and "talk about the cool parts".
I'm expecting any minute now for the anti-schema-less pro-schema movement to rise up...
IMHO, it's all about what you're doing at the time and making the right decision... although it helps if the people behind a database aren't glossing over the issues as insignificant. :-/
btw - "MongoProbabilisticStorage" is a great name for a product!
Note that even with the changed default to 'acknowledged', data is not guaranteed to have been written to the journal. So, there is still no full durability in writes (by default) and there is a chance data might be lost (e.g. a mongod instance crashes).
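To get journaled durability, the write concern has to be raised explicitly. A rough sketch in mongo-shell syntax; this is an assumption about the option names, which have varied across MongoDB versions, so verify against the docs for yours:

```
// Default ("acknowledged"): the server confirms it received the write,
// but the data may still be only in memory, not yet in the journal.
db.orders.insert({sku: "abc"});

// Journaled write concern: don't return until the edit has reached the
// on-disk journal. Slower, but survives a mongod crash.
db.orders.insert({sku: "abc"}, {writeConcern: {w: 1, j: true}});
```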
I posted this further down the thread, but I thought I'd share my thoughts on why I like mongo.
Most people don't like mongo because 10gen gives the impression that mongo is better than it actually is, many people feel that mongo is not reliable enough for at-scale applications. They're right; it's not. But that's ok, because:
Mongo's really great for rapid prototyping. You don't need to worry about updating the schema at the db level, it can store any type of document in any collection without complaining, it's really easy to install and configure, the query language is simple and only takes a couple of minutes to learn, it's pretty fast in most use cases, it's pretty safe in most use cases, and it's easy to create a replica set once your prototype gets usage and starts scaling.
Mongo does everything well up until you reach the level where you need heavy-hitting, at-scale, mission-critical performance and reliability. Most projects out there (99 in 100?) will never reach the level of scale that requires better tools than mongo. And since the rest of it is so easy to use, that makes mongo a great starting point for most projects. You can always switch databases later, but mongo gives you the flexibility to concentrate on more important things in the early stages of a project.
Application design for me almost always begins with data and data structures. Whether my database has an explicit schema or not, I always have one in mind, documented or otherwise reified in the data structures in my code. I just don't get why people would want a schema-free database that is in almost every way inferior to the rock-solid power beast that is Postgres. Just use a library with proper migration support so you can propagate changes to your schema rapidly during development. You'll thank us later when you learn a little bit of SQL and start analyzing your data, running circles around the no-sql guys.
Cassandra et. al. are completely different, in that you don't use them because they are more fun to use. You use them despite their awkward, low-level interfaces because you're going to dump billions of data cells into your database from day one with no end in sight and want all the easy scaling/availability features provided.
Do you always start with the perfect data structure? I find myself adding, removing, and restructuring schema often. Just as you think it's silly to use an "inferior" db during prototyping, I think it's silly to have to jump through hoops -- even minor ones -- while I'm just trying to experiment with a new technology or play with a concept, product design, or pet project. 99 times out of 100, I don't care if my project survives the weekend. Let me use that database I want to use!
> You'll thank us later when you learn a little bit of SQL and start analyzing your data, running circles around the no-sql guys.
That's a little condescending... do you know a single mongo user who doesn't have experience with SQL? Plus, I love the fact that I can literally run javascript against my database. Good for production? Certainly not. But that doesn't mean it's not fun or useful.
Not every project requires such rigor. If that's how you enjoy development, that's great! Very few of my projects put the db layer to the test, and so I'm happy with the balance that mongo gives me. I use it in about 4/5 of my experiments and side projects.
That is easiest [cough, imho] solved by adding a version number to stored records. Since the data is not in much of a normal form and there won't be that many joins, it is generally easy to handle in code.
Sometimes you have to do update of records with a certain version number.
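The lazy-upgrade idea can be sketched in Python; the field names and version numbers are made up for illustration:

```python
# Each stored document carries a _schema_version; reads upgrade old
# records step by step until they match the current schema.
def v1_to_v2(doc):
    doc = dict(doc)
    doc["name"] = doc.pop("username")   # v2 renamed the field
    doc["_schema_version"] = 2
    return doc

def v2_to_v3(doc):
    doc = dict(doc)
    doc.setdefault("likes", 0)          # v3 added a counter
    doc["_schema_version"] = 3
    return doc

MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3

def upgrade(doc):
    while doc.get("_schema_version", 1) < CURRENT_VERSION:
        doc = MIGRATIONS[doc.get("_schema_version", 1)](doc)
    return doc

old = {"_schema_version": 1, "username": "alice"}
new = upgrade(old)
print(new)  # -> {'_schema_version': 3, 'name': 'alice', 'likes': 0}
```

The same `upgrade` can also be run as a batch job to convert records of a certain version in place, as suggested above.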
My opinions, for the record: MongoDB is a tool with some use cases. I'm more of an SQL+Memcache guy, if possible, but not religiously if a good argument is presented (that don't sound like "let's use .*, I want another keyword on my cv").
If you make 12 schema changes in month 1 and then no schema changes for the next year, does it really make sense to keep a month's worth of data in 12 different formats and maintain code to support all of the different versions? Why not just do a simple schema change and/or data migration each time and be done with it?
And since this is supposed to aid in rapid prototyping, how does it do so? It seems to me that it does just the opposite by introducing a significant and totally unnecessary burden.
Generally I'd only have at most 2 formats at once while you converted the older records to the new format. You're right that there's no sense in keeping around a dozen versions but there are a lot of business cases for having two versions of a schema active at once. For example, if you can't bring down your application to convert everything mid-day and instead want to do an incremental conversion.
As functional_test said. Also note that this e.g. depends on how long lived your data is.
(An update routine can be run at any point with low use like Xmas, etc. This is potentially neat, depending on use statistics.)
I'm not saying this is a common thing, but the lack of joins makes the data a bit more flexible -- this can't be too much, if nothing else because then the Javascript will begin to break.
(I do think there are much more use cases for nosql than as a Memcached with more features. Where an old job used MongoDB wasn't one.)
Couldn't you argue that e.g. Postgres and ActiveRecord give you the same rapid prototyping ability but with an easier (and more established) path towards scalability? It is easy to change your schema with migrations at the beginning of a project - just go edit the original ones and nuke your database. And I don't have to worry about properly configuring write-locks, replica sets, or writing map reduce javascript.
Of course you could argue that. But so what? Having an easier path towards scalability is nice, but irrelevant for the vast majority of projects; not every project is going to turn into a startup or a real product or even something you work on for more than a few weekends!
The last time you hacked together a blogging engine in Node.js one weekend, were you worried about future scalability, or just playing with new technologies because it's fun?
And while it's pretty easy to do schema migrations, it's not easier than _not_ doing them. And what is it you really want to worry about? Making sure your DB is production ready, or tinkering with Express.js and Backbone?
So because of that, many people use mongo as their de facto database. It's just what I use when I need a persistence layer for anything I build, because I already have a mongo db running for like 30 different defunct projects on my dev server.
And then, by happy accident, one of your side projects turns into a real product, and then mongo serves you really well for the first year or so, right up to the point of having to hire a real devops engineer, at which point you swap out your ORM layer and switch to postgres.
> while it's pretty easy to do schema migrations, it's not easier than _not_ doing them.
Regardless of whether you are dealing with a strict schema or flexible schema, you still have to make changes to how you structure your data as you are prototyping or otherwise iterating on it. MongoDB provides no tangible benefit in this case. If you want to rename a field, then you still need to run an update.
How are ad hoc, manual, historically opaque tweaks to data in any way better than an easily generated and version controlled series of scripts representing a replayable history of changes to the data?
If anything, manual untracked tweaks make "rapid prototyping" more difficult since lots of partially or completely undocumented changes to the structure of the data are harder to revert, replay, reason about, or share with others. It's also more work to do it manually since you need to run the commands in multiple environments, rather than just entering the same command or, more frequently, a shortcut command into a generated file.
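A minimal sketch of such a replayable migration history, using Python's stdlib sqlite3 (table and column names are hypothetical; the rename step needs SQLite 3.25+):

```python
import sqlite3

# Migrations are numbered, applied in order, and the database records
# which ones have already run, so replaying is idempotent.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)",
    "ALTER TABLE users ADD COLUMN email TEXT",
    "ALTER TABLE users RENAME COLUMN username TO name",  # SQLite 3.25+
]

def migrate(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (v INTEGER)")
    applied = conn.execute("SELECT MAX(v) FROM schema_version").fetchone()[0] or 0
    for i, sql in enumerate(MIGRATIONS[applied:], start=applied + 1):
        conn.execute(sql)
        conn.execute("INSERT INTO schema_version VALUES (?)", (i,))

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # second run is a no-op: already up to date
cols = [r[1] for r in conn.execute("PRAGMA table_info(users)")]
print(cols)  # -> ['id', 'name', 'email']
```

The whole history lives in version control, and every environment ends up with the same schema by running the same script.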
> while it's pretty easy to do schema migrations, it's not easier than _not_ doing them.
I just don't buy this argument, writing and executing migrations is braindead simple and usually takes what, 20 seconds start to finish? Writing the line of code you need for mongo must be about 5 seconds.
edit: I did actually give mongodb a good crack (I used it on a side-project for 6 months last year) but I found that I actually spent a huge proportion of my time working around things that were missing compared to ActiveRecord. It was a huge net loss for me in terms of productivity.
Actually postgres can be a bit of a PITA, but so can mongo. At the risk of sounding reckless, unless the app needs to support high CUD throughput I sometimes opt for sqlite. It doesn't get much easier than that, and its read performance is impressive from what I've seen.
Even then you can sometimes get away with staying on sqlite for your admin side CRUD and redis for the heavy / concurrent writing from the public facing side (obviously situational).
I've been using sqlite more and more as well. Super lightweight, but since it's SQL most ORMs can handle switching to MySQL/postgres really easily if you ever need to make the switch.
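The zero-setup quality being praised here shows in how little it takes to get going with Python's stdlib sqlite3:

```python
import sqlite3

# A full SQL database in one file (or in memory): no server process,
# no user accounts, nothing to configure.
conn = sqlite3.connect(":memory:")  # or a path such as "app.db"
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
conn.commit()
print(conn.execute("SELECT body FROM notes").fetchone()[0])  # -> hello
```

Because it's plain SQL, an ORM pointed at this schema can later be repointed at MySQL or Postgres with little ceremony.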
Now this is an important point. Once you make your product, what do you need to do to retool a major part of the application? This seems like an excellent approach.
Many people in the Java world use something very simple like hsqldb, then shift to a new database when out of development.
Configuring write-locks? I guess you have never actually used MongoDB before, which explains why absolutely none of what you said makes any sense. MongoDB is far easier to use, manage the schema with and scale than PostgreSQL.
That's why you use a language or technology that's politically unfeasible for your rapid prototypes, like Clojure or Haskell, or...for that matter...MongoDB. ;-)
I think they are upset with their marketing, like "web scale". Even the name "Mongo" is derived from "Humongous" -- but that's exactly the scale at which you'd switch away from Mongo.
Riak also has massive problems. Realistically, figure out your data that you want to stick in a database, why, and how you're going to query it, and then work from there.
Your entire database's keyspace must fit in memory if you're using Bitcask. If you're using LevelDB, you have compaction overheads. Also, sibling resolution can get very messy, and complicated if your app works in a way that can potentially result in sibling explosions.
See, you're just perpetuating The NoSQL Problem. :) Riak is well-suited to some tasks, but it is no more a magical fits-every-problem thing than MongoDB is.
I sometimes throw out these things as a quick way to get reactions and interesting feedback as to why something is good/bad. For instance right now I know mostly good things about Riak, that's why I posted this ending to my comment.
Having recently rolled out riak into a production environment I can offer some off-the-cuff bullet points:
- Mostly easy to work with. Mapreduces can be a big pain to troubleshoot because you can't console.log() in your JS. Didn't try it in erlang.
- Being masterless, it has a very good replication story for servers _in the same data center_. It really bit us that there was no good riak solution for syncing data across multiple data centers. There is an enterprise solution for that, but it's quite expensive, which makes riak less appealing if you don't have much budget on your project.
- Errors in general are next to useless. Get comfortable waiting for answers in IRC when you get opaque error messages after running queries. You can definitely work past this, but it wasted a lot of my time.
- Not sure if pro or con, but as the cluster reached load capacity, from a combination of data size and read requests, map functions would begin to slowly fail. After a while, we could tell which completely useless error message (preflist_exhausted, my old friend) could be fixed by a cluster restart, and which would simply begin to happen with greater frequency as more data was added. This was exacerbated by my company refusing to pay for anything more than a three node cluster. You might say I should have fought harder for more, but I had to fight to make them not host all three nodes on a single server. There are places that will hire you that simply do not intend to do anything sane, but I digress. The takeaway: riak is not a super cheap way to scale.
- Bulk inserts? What are bulk inserts?
- Key filtering is just a shim over listing all keys in a bucket. Further, listing all keys in a bucket, or all buckets in a cluster, can be very expensive, and basically you'd never do it unless you had a very small bucket. The bag of tricks you can apply to speed up slow queries is basically "Do you have secondary indexes? Ok, good."
Those points do read a little negative, but actually I would use riak again. To me it works best as a temporary event store living in one data center. If you've got a bunch of items shuffling around your backend in real time, being processed to and fro, you could definitely do worse than sticking in it riak and adding more nodes as needed.
A bit off-topic, but if you've got a moment I would love to pick your brain a bit more about your Riak usage. Shoot me an email if you're up for it - mark@basho.com
Have you considered RethinkDB? For most of the advantages of MongoDB that don't specifically come from mmap and overwrite-in-place, it's equal or better.
Most developers who use Windows will just go with something which doesn't require a virtual machine. For example, MongoDB, CouchDB, OrientDB, Cassandra, and ArangoDB work fine everywhere.
Mongo is also brilliant for internal tools that need to change rapidly but you absolutely know will never require significant scale. I've been badly burnt trying to use mongo on a large dataset but it's genuinely great for getting things done quickly.
I also used it to write a service that had to go from nothing to working in a couple of days. I then spent the next two days swapping it back out again. It was surprisingly painless to go from an object store to a relational model.
I just use the filesystem for that sort of thing. Everyone justifies using MongoDB because it's easy and general, but compared to the tooling and compatibility ecosystem around files it's awkward and primitive.
The case for using it as a prototyping database is the best use-case I've seen for Mongo, however I'm not sure it's always a good idea.
For a hack-weekend sort of project, fine, but if you are in any way attempting to make a product, it strikes me as the sort of thing that would be really difficult to change later down the line, and so worth investing the very little extra effort it takes to include your schema in the database, and use something like Postgres/MySQL/etc.
If this was a proprietary database we'd call that vendor lock-in and advocate an open source solution. 10gen is a company that earns its revenue from selling support. They are highly incentivized to lure you in and trap you in a situation that requires a lot of consulting.
Or they could just be interested in adding useful features.
PostgreSQL has HSTORE which is a useful but proprietary feature. Cassandra has the ability to have Lists/Maps as data types. Again useful but proprietary.
If you are that concerned about database independence then do what everyone else does. Use an ORM, minimise coupling in your domain model and do as much as possible in the application layer.
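A minimal sketch of that decoupling in Python (all names hypothetical): the domain code depends only on a small repository interface, so the backing store can be swapped later without touching business logic.

```python
# The domain layer sees only this interface; a Mongo-, Postgres-, or
# in-memory-backed class just has to implement the same two methods.
class UserRepository:
    def save(self, user):
        raise NotImplementedError
    def find(self, user_id):
        raise NotImplementedError

class InMemoryUserRepository(UserRepository):
    def __init__(self):
        self._rows = {}
    def save(self, user):
        self._rows[user["id"]] = user
    def find(self, user_id):
        return self._rows.get(user_id)

def register(repo, user_id, name):
    # Domain logic: knows nothing about the storage engine.
    repo.save({"id": user_id, "name": name})
    return repo.find(user_id)

repo = InMemoryUserRepository()
print(register(repo, 1, "alice"))  # -> {'id': 1, 'name': 'alice'}
```

Swapping databases then means writing one new repository class, not rewriting the domain model.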
Cassandra support for lists/maps is open-source. Cassandra is Apache Software Foundation project. Where is it proprietary? Or do you mean something else under "proprietary"?
I feel the opposite way. If your application's data layer is sensibly designed, it shouldn't be too bad to switch to postgres when you need to. It may be tedious if you have a large codebase, but it won't be difficult.
I think the up-front benefits of using mongo (especially as a sole developer/devops/sysadmin person) outweigh the difficulty of the changes you'll need to make later on, which will only happen as you hit scale and have more resources to nurture the devops side of the tech stack.
This prototyping story sounds to me like admittedly shooting yourself in the foot if your prototype turns out to be worth a damn. Basically, you're saying mongo is an excellent choice only when your storage backend is a moot point.
I can't think of any codebases I've seen where intentionally choosing the storage backend you know you don't want to use (if the project is successful) would be a reasonable thing to do. Understate it if you must, but having to change your backend from mongo to postgres is not a desirable situation. Besides, if you're going to use postgres to scale, use its features and write well-optimized queries for it. The difference between a bad massively complex query and a well-optimized one can be several orders of magnitude, and that optimization can indeed be difficult. It goes without saying that you wouldn't leave that to an ORM.
The advantage of shooting yourself in the foot if your prototype turns out to be worth a damn is that it forces you to rewrite it with more rigorous development practices ASAP. Usually, choice of a storage engine isn't the only problem with a throwaway MVP - you've probably written it in a language that won't scale, and skimped on error handling, and are using really inefficient algorithms, and didn't bother documenting anything.
That said, I would use Postgres for my MVPs, using it as a key-value store initially until it's clearer what the schema should be. That is, if I still bothered using code for prototypes; of late I've been more fond of napkins and Adobe Fireworks.
I've never understood people who worry so much about the schema. It's like they've missed the last decade of computing. Everybody these days uses an ORM. Which means that (a) data migration between databases is a relatively simple task and (b) schemas often just get in your way.
The problem with that argument is that prototyping rarely needs a database. Just store stuff in memory. Who the fk in the real world is seriously writing data-layer code for a prototype?
None of the things you listed are any harder with Postgres and SQLAlchemy. Learning to use MongoDB isn't exactly trivial anyway, so why choose the thing that is known to be broken at all, when it's neither easier nor faster?
Previous versions of my startup's enterprise product used to be based on relational DBs (mostly Oracle, MySQL also). This year we switched to Mongo and dropped RDBMS support.
RDBMS performance was fine most of the time as we're not doing big data really. Our problem was developing and maintaining a schema that holds lots of metadata many levels deep. Our app allows for unlimited user-defined forms and fields, some of which may hold grids inside, which hold some more fields... Our app also handles lots of logs and large file dumps, which slowly made data, cache and fulltext search management mission impossible. Even though we had considerable previous experience with Mongo, it took us a long time to switch because we were utterly scared. It's nice to sell a product that is Oracle-based, as that sent out a message about our "high level of industry standardization and corporate commitment" bullshit that (we thought) was quite positive for a startup competing against the likes of IBM, HP, etc.
To our surprise, our customers (some Fortune 500 and the like) were VERY receptive to switching to a NoSQL, open-source database. Surprising, especially given it would be supported by us instead of their dreadfully expensive and mostly useless DBA departments. It even came to a point where it changed their perception of our product and our company as next generation, and surprisingly set us apart from our competition even further.
In short, as many people here know, not all MongoDB users are cool kids in startups that need to fend off HN front page peak traffic day in day out. Having a schemaless, easy to manage database is a step forward for sooo many use cases, from little intranet apps to log storage to some crazy homebrew queue-like thing. 10gen's superb, although criticized, "marketing effort" also helps a lot when you need to convince a customer's upper management this is something they should trust and even invest in. I can't express my gratitude and appreciation for 10gen's simultaneous interest in community building, flirting with corporate wigs and getting the word out to developers for every other language. Mongo is definitely a flawed product, but why should I care about the clownshoeness of its mmapped files when it has given us so much for so long?
Well written post. Even as a detractor of mongo I'll agree that it works for your use case. But the key is "Our app allows for unlimited user defined forms and fields, some of which may hold grids". That really isn't a very common case. SQL is not great at representing large groups of documents without any common structure.
The vast majority of apps just don't deal with that problem. If MongoDB were really only used by people it's a good fit for (like yourself), it'd really be a niche product. They're marketing it as a general-purpose product, which is why they've earned scorn from so many.
Bingo. They should call themselves mongodocs, not mongodb. The way I see it, mongodb sees widespread misunderstanding about its use cases, and instead of making those use cases clearer, they seem to take an interest in seeing mongodb used unnecessarily.
> Having a schemaless, easy to manage database is a step forward for sooo many use cases
Can you explain why you can't do schemaless with an RDBMS?
From what I understand MongoDB is schemaless by storing all fields as one single JSON document. So what stops you from doing the same in an RDBMS - have a catch-all field "JSON" and store all your data there?
That gets you halfway there, but you still don't have the ability to query your datastore by structure, unless you've installed PostgreSQL 9.3 and are using its JSON field type, which does have that capability, thus entirely demolishing the NoSQL USP as far as I can determine.
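The catch-all-column idea can be sketched with any SQL engine that has JSON functions. Here's a minimal, hypothetical illustration using Python's stdlib sqlite3 (whose JSON functions are built into most modern SQLite builds); Postgres would use its native `json` type and the `->>` / `#>>` operators instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One catch-all TEXT column holding a JSON document per row.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO docs (body) VALUES (?)",
             ('{"name": "Ada", "tags": ["math"]}',))
conn.execute("INSERT INTO docs (body) VALUES (?)",
             ('{"name": "Grace", "tags": ["navy"]}',))

# The key point: with JSON functions you can query BY STRUCTURE,
# not just fetch the blob back and parse it in the application.
rows = conn.execute(
    "SELECT json_extract(body, '$.name') FROM docs "
    "WHERE json_extract(body, '$.tags[0]') = 'math'"
).fetchall()
print(rows)  # [('Ada',)]
```

Postgres 9.3's JSON operators go further (and can be indexed via expression indexes), but the shape of the idea is the same.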
Also of note is that stored procedures are supported in a variety of languages, including Javascript, so it's quite easy to handle cases where the surprisingly broad range of core JSON functions and operators [1] doesn't include what you need.
PostgreSQL has also recently added a key-value store type [2] with semantics reminiscent of Redis. The impression I get is that they're gunning for the NoSQL kids in general, and this pleases me; while I grant it is sometimes possible and necessary to obtain new insight in a field by ignoring all that's gone before, I very much doubt this is one of those times, and I am therefore delighted to see a properly engineered database engine gain more or less the entirety of the features which draw interest to the NoSQL crowd in the first place.
That is extremely interesting. So it looks like you can store a JSON type as well as a KV datatype in Postgres! And it looks like it is relatively easy to convert between the two.
This leaves only performance. I think I'm still confused around this area - why do people say that non-relational technologies like MongoDB are faster than relational databases?
Thanks for sharing your experience on this. We make a lot of custom enterprise intranet applications, and we've been considering adding MongoDB to our toolchest. My concern has been what you say- there will be customer resistance. That they'll have fear for the future of their DB, since it's not SQL. Based on your post, it seems like it may be my fear and not necessarily the clients'. I was very interested in NoSQL at the beginning and what that did is make me realize I need to up my SQL game. I want to make sure I don't move into NoSQL just because it's hip and relational DBs annoy me.
We are evaluating Mongo to be the persistence for single page web applications, which is how we'd like to start making the majority of our intranet/private enterprise jobs. Are you using it in this context and has it been helpful? Your example of nested fields being easier made me warm and fuzzy- we had a project last year that had growing, fluid, user-defined data structures. We made it work (well) with Postgres but there were several kludges that really bothered me. One of them was handling delete dependencies gracefully on user-defined, nested structures. Did you encounter this issue pre-Mongo as well and if so, did it help with it?
Perhaps your customers were receptive because you were supporting it, and they were happy with the support they got from you and unhappy with their internal DBA department? It's possible that they didn't care about the technology and just realized that the support was going to be better.
I'm confused by your comment. The beginning acknowledges the fact that MongoDB has a weak storage engine, but your conclusion is that, even with a strong storage engine like ours, there is still a problem. What other problems do you see? Are they something we could work on?
This is going to come off as a bit negative but I kinda feel it has to be said. I would first like to say I do love the Fractal tree indexing, very cool, and it could have a lot more interesting use cases outside of databases (I'm thinking logical volume/block storage etc... I'm always thinking in kernel land).
The problem is that Mongo advertised itself as a database and wasn't one. Once you do that, the reputation of the product is dead forever.
TokuMX is a real database as far as I can see, MVCC, great indexing story etc.
By association TokuMX is probably not regarded as highly as it should be. Which is a shame but it's a people problem, not a technical one. People can very easily lose trust in a technology at which point it's effectively dead, it might take a long time to die due to lock-in but it's dead.
For instance, I have recently started playing with RethinkDB over TokuMX almost purely because of the Mongo association.
Now technically that might not sound like good reasoning, but when you think about the kind of person who writes a database that doesn't fsync your writes by default and relies on the page cache instead of doing direct I/O... it doesn't really inspire confidence in the network stack, the query planner, or, well, anything.
If anything it makes you insistent on not having ANYTHING to do with that sort of codebase.
Just replacing the storage engine might actually be good enough, but at this point restoring my trust in the rest of the codebase is almost a lost cause.
I've seen a lot of the rest of their code, and most of it is getting better over time; as they grow they're forced to adopt better habits in order to scale their engineering team. I think you're misunderstanding the type of programmers they are. They didn't use mmap because they are sloppy everywhere; they used mmap because their critical innovation was not in storage. What they really thought was valuable, what they wanted to work on, was the query language and cluster management tools, so they did the simplest thing for storage and moved on (personally I don't understand why they didn't just use BDB, maybe they were afraid of transactions, but I suppose everyone has a little NIH syndrome in their database).

Now they're a bit locked into that code, because after bolting on journaling (that architecture is a brilliant but incredibly dirty hack), the code is a mess and I'm sure nobody wants to touch it. In fact most of the other subsystems have been getting cleaner rewrites, except for the storage layer. I think the only way out is a complete replacement, which is what we did, so I feel pretty good about that.

So I don't know if I'll convince you, but I've read a lot of their code (especially in the last few weeks, I've been backporting things from 2.4), and that's the feeling I get about their history and vision. Hope it gives you some insight.
I don't think the problem is that I misunderstand them; I just disagree with them.
I disagree with them on what is the minimum viable product for a database. I come from a storage and service provider background, where failures are treated very harshly (usually the death of a company for a singular mistake), so I take releasing a product that stores customer data very seriously.
To be honest this is the biggest attraction of RethinkDB for me. They waited a sufficiently long time, with a commercially backed team of very competent engineers who obviously have the required background, to sit down and DESIGN a database. The query language generates a non-Turing-complete language with a clean AST that has all the right deterministic characteristics to implement a powerful planner/optimizer. Their on-disk format has been a bit in flux, but the core design is excellent and you can see that it has been optimized for very fast range queries. Even the API protocol and serialization were designed with care, not to mention the excellent ReQL language and the attention to detail when integrating drivers into the host language.
Which is the other thing I tend to dislike about Mongo: it reeks of a lack of design. The journalling effort, for instance, as you pointed out, is very ad hoc, and this goes for GridFS and a lot of the other features they have integrated into the codebase.
These are smells that I can't ignore when looking at a product that I need to trust with my data.
The counter-argument is to not trust it with your data. But I have yet to find a case where that makes sense and another datastore wouldn't be a better choice.
As I recall it, all the early noise they generated was their excitement about their hot benchmarks and how good mmap was...
They can try and rewrite the web and remove all the silly benchmarks, but they were the loudest "web scale" cowboys back in the beginning and we remember them for it.
There are also people using MongoDB and finding it meets their needs well, and don't feel the need to keep writing about how everything sucks or is wonderful. (I'm one of them.)
None of how MongoDB works is a secret. And just like everything else it has sweet spots and problem areas. And like many others, development continues and it gets better.
The database does not get the job done - it is a tool to help get the job done.
Maybe not now, but this hasn't always been the case. The fact that they had (have?) a global write lock was completely buried on the doc site for ages. Benchmarks were waved in front of developers' faces to distract them from the "drivers don't actually write data, they just blast it out in every direction and hope it lands somewhere good" BS.
I don't use Mongo anymore, and I think a lot of it has not to do with the database itself, but with the way 10gen used their marketing machine in a dishonest way. They incurred a lot of trust-debt, and now have a serious amount of work to do to pay it back.
I worked for 10gen (now MongoDB) for over 2 years (I left in December).
Never once while I was there did they publish a benchmark: There was a [publicly] stated company policy to not publish or comment on benchmarks.
If you have evidence otherwise (i.e. benchmarks published by the folks working on MongoDB) fine, but I take this as a deliberately inflammatory (and false) statement.
EDIT: The global write lock was removed ~last August; there is now a database level lock. Future releases will likely make that more fine grained. Additionally, the drivers no longer do "unsafe" writes, but check w/ server.. as of the same release.
It is not a sin of commission Mongo is accused of, but a sin of omission. MongoDB out of the box is configured to be fast-but-unsafe. Postgres and other databases out of the box are configured to be safe-even-if-slower. A benchmark which doesn't spend the time to configure both systems equivalently (i.e. most community benchmarks) will therefore show MongoDB as the faster system. The policy of not publishing/commenting on benchmarks simply allows misleading benchmarks to be created and to stand. It's a self-serving policy.
Mongo has repeatedly chosen defaults for their database which make naive benchmarks look better, at the expense of production safety. You seem to be willing to attribute that to Mongo's incompetence. Proverbs are on your side, but it sure ties in nicely with the "leave the benchmarks to the community" policy.
Sorry, but the global write lock has been public knowledge since the very early days. Mongo's done nothing to hide that. Also (i worked for 10gen at the time) the "marketing department" that you refer to, was a single person organizing events to give developers what they wanted: knowledge and a community around mongodb.
I think it's really less that 10gen itself was trying to mislead people, and more that the community itself was building up a strange mythos with little relation to reality. This, unfortunately, tends to happen rather often (node.js is magic! Java is really slow! etc.)
That's what made me drop research into NoSQL a couple of years ago- the overly optimistic and magical thinking seemed to be really pervasive. I don't want to get burned from joining in a group delusion. (I'm not trying to say that's what's going on specifically in Mongo or anything else, just acknowledging that the mythos phenomenon mentioned above can repel me.) Since you've coined the term, are there any NoSQL projects you've seen that don't suffer from the mythos issue?
It varies, a fair bit. MongoDB is perhaps the worst, possibly because it's easy to get up and running with, and behaves a little magically (you don't need to know how it works to use it, or at least so you might think initially). It also has a company pushing it, of course.
I'd say that this is less of a problem for the Dynamo paper databases (Riak, Voldemort, Cassandra), because really to use them at all you have to have some idea of what's going on.
I'm a bit biased here, though, while I've found Voldemort, Redis and Cassandra useful and can see Riak and a couple of others being useful, I could never really figure out a good reason that anyone would use MongoDB besides naiveté. That said, I've never really tried, as I don't have a problem that fits it (part of my issue is that I don't know what a problem that fits it would look like).
If you are looking for a sober and no bullshit NOSQL database, consider RethinkDB. The DB is quite young but it doesn't have some obvious flaws out of the gate.
That's all true, but they were giving 90% of their users exactly what they wanted:
"We value feature-set and expressiveness much more than scalability at our data size, but we want to feel like we're big data too so say some of that stuff"
And that's their brilliance, they listened to what people said they wanted and then gave them what they really wanted.
No, what they thought they wanted. They promised a world without DBAs, but the "DBA" is really whoever gets called at 3am when the site goes down. If you don't know who the DBA is, it's you. They promised a world without schemas, but you always have a schema; the only question is whether you know what it is and whether the tools help you validate it. They promised massive scalability, but really just pushed scalability problems out into other layers of the stack. I could go on and on...
Global write lock. Abysmal performance when your data set doesn't fit in ram. Clustering with durability. Clustering in a way that isn't horribly complex and fragile.
Mongos (the routing process for MongoDB when clustering) has a bunch of drawbacks that make MongoDB worse. One of them is dropping all connections when a master switches. Have you dealt with those problems yet? My perception is that MongoDB is generally fine at small scale.
The new database level lock in 2.2 is also annoying (and arbitrary) but it is better than the global lock.
I'm curious, how do other DBMSs handle a master switch/other cluster updates? I'm familiar enough with mongos to know how it works but not what e.g. redis or postgres or mysql does.
Oracle restarts your query, where it was interrupted, on another node. The feature is called TAF, transparent application failover. A client might notice a brief pause, but probably not even that.
The best is not to ever need it by using an architecture with no SPOF (even temporary). Master switch is a huge pain - there are simply too many nasty ways it can fail miserably. I'd stay away from databases needing it, if high availability is the primary concern.
OK, but that's a serious performance and functionality tradeoff.
Most modern architectures make the choice between having nodes serve as master for a subset of the data, and the increased cross-link bandwidth needs and reduced flexibility of a master-less system.
Most master-slave databases don't do auto-promotion themselves; it's a bit of a minefield. (In particular, in cases of network partition, where some applications servers may have a different view to others on whether the master is dead or not).
OK, but that's a different concern. If a database admin manually switches a master (called a primary in Mongo), then someone is responsible for deciding what to do with queries that are in flight. In Mongo, the driver drops them, which is less than optimal. It certainly is a hard problem for writes, but not so for reads.
To your point, I think deployments on modern architectures generally want the ability to scale out and tolerate network partitions, which makes the ability to drop and re-elect masters, reconcile a node that rejoins after a partition, manage shards, avoid hosing remaining nodes, etc. critical. Inability to do so really hurts on a platform like AWS (or in any multi-datacenter deployment, really).
I think dropping reads is just as unsettling as dropping writes in production.
Also, MongoDB does let you elect a primary via setting a priority. Really, it should be a requirement because sometimes mongo nodes will switch due to a dropped packet (or this is all I can assume at least) and the arbiter just randomly picks a node when there aren't priorities.
Most people here seem to complain that their champion lost a benchmark, or that Mongodb does not follow their ideology.
I've used Mongodb for the last 18 months, never lost any data, and it made it obvious that "enough durability" is sometimes... enough.
When I switched to MongoDB from MySQL, performance rose 10+ times and I switched from "don't hit the db too much" to "give it more work, 'cause it's idle".
I know it's an ideology problem: otherwise, I can't see why people would complain about a tool they don't use.
I am so happy Postgres is adding support for JSON. This is a big change. The sole benefit of mongo to me is that you can be flexible with your schema at the beginning. But the consequences are
* you have to learn to do indexing right later (if you have to scale)
* failures and misses start to occur (as you scale)
* more code to write to manage legacy schema and optional fields
The last is painful and ugly, whereas if you start out with a good schema that last point is in good hands. With SQL, when "xyz" attributes start repeating you can just make a new relation, whereas with mongo you'd stuff 20 fields into a single collection. The refactoring is harder.
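The repeating-attributes point is the classic normalization move. A hypothetical sketch with Python's stdlib sqlite3 (table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Instead of stuffing tag1..tag20 into one wide row (the document-store
# temptation), the repeating attribute gets its own relation:
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post_tags (post_id INTEGER REFERENCES posts(id), tag TEXT);
""")
conn.execute("INSERT INTO posts VALUES (1, 'On Databases')")
conn.executemany("INSERT INTO post_tags VALUES (1, ?)",
                 [("sql",), ("mongo",), ("rant",)])

# Any number of tags per post, no schema change and no optional fields:
tags = [t for (t,) in conn.execute(
    "SELECT tag FROM post_tags WHERE post_id = 1 ORDER BY tag")]
print(tags)  # ['mongo', 'rant', 'sql']
```

The trade-off: a join at read time, in exchange for never having to write "legacy schema and optional field" handling code in the application.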
I will begin to migrate back to SQL for new projects.
Also, the ecosystem is richer in SQL. I have not seen a good ORM for Mongo. MongoEngine is fine, but the implementation plus the db have a lot of issues that make that ORM a bit unusable from time to time. SQLAlchemy is good.
PS: For quick PoC and Hackathon projects sure prototyping with mongo is fine.
He's right that MongoDB could use improvements like string interning so you don't need to worry about field names. But overall, I think this article is very misleading.
If you use MongoDB in production, you should definitely take the time to learn about the durability options on the database side AND in your driver. By using them appropriately, you can have as little or as much durability as you like. Data sets larger than 100GB are no problem either -- right now I'm running an instance with a 1.6TB database.
As always, use the right tool for the right job. If you need joins/etc. and don't need unstructured data, Mongo probably isn't a great choice (even with the aggregation framework).
I use it to store a lot of historical time series data that doesn't change once written (at least, not often). I can easily achieve the write performance necessary to record the data streams live. Since it's all append-only, I don't need to worry about fragmentation. With replication, it's possible to access the data with very high throughput which is useful when the data is being accessed by a cluster, for example.
I also use it as a metadata "scratch space" for highly available applications (things where failures are not acceptable and must run for days at a time). Again, with replication and automatic fail overs, I've been able to maintain 100% uptime outside of maintenance windows. Obviously that can't last, but so far it's been >2 years with no major problems.
EDIT: I should point out that although the size of the metadata objects can be highly variable, since I usually had a small number of them relative to the time series, fragmentation was still not an issue.
You have a smallish number of documents where some particular field of fixed size gets overwritten a lot, the old values are uninteresting, and it wouldn't really be a tragedy if your data got trashed. For example, it's the player's score.
You want a fixed-size, rolling backlog of time series data such as logs.
Postgres update performance is pretty bad. When running a big data migration, it's generally faster to copy the old table to a new temporary table and rename the temp table to the old table than it is to run an update.
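The copy-and-swap trick described above can be sketched with stdlib sqlite3 (the pattern is the same in Postgres, where you'd additionally need to recreate indexes and constraints on the new table; names here are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "A@X.COM"), (2, "B@X.COM")])

# Instead of `UPDATE users SET email = lower(email)` touching every row
# in place, write the migrated data into a fresh table and swap names:
conn.execute("CREATE TABLE users_new AS "
             "SELECT id, lower(email) AS email FROM users")
conn.execute("DROP TABLE users")
conn.execute("ALTER TABLE users_new RENAME TO users")

result = conn.execute("SELECT email FROM users ORDER BY id").fetchall()
print(result)  # [('a@x.com',), ('b@x.com',)]
```

The copy is a sequential scan plus sequential writes, which is why it can beat a row-by-row update that has to maintain indexes and old row versions as it goes.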
RDBMSes need to check for primary key violations, hence read before write. Random access is slow. The fastest you could do is "no read-before-write, append-only writes, compact later" (Cassandra way).
Most RDBMSs will, if you rewrite a field, write a fresh row and tombstone the old one, and clear it down in the next compaction. This is what MVCC means in practice: the old version doesn't disappear while the new is being written.
MongoDB by contrast will simply mmap that block of file, overwrite the contents, and fsync. Yes, this has obvious downsides.
I won't dispute the empirical finding that MongoDB and friends are faster at this than RDBMSs.
However, I'm not sure why this should be the case. You mention the complexity of updating a row in MVCC; sure, but all the database has to do before reporting success to the user is write its intent to make this change to the transaction log (WAL in PostgreSQL, redo log in Oracle). The actual changes to the data files can be written back later on. The transaction log is a single stream being continuously written to disk, so that should be very fast.
MongoDB, on the other hand, is making scattered writes across its mmapped data files, which should be much slower. Except that of course it's probably doing this on a journalled filesystem, which is using exactly the same mechanism as the RDBMSs to provide fast, safe updates.
I'd be really interested to see how a simple update to a single field translates into actual writes to disk for PostgreSQL and MongoDB. If only I knew how to use strace!
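The transaction-log idea is simple enough to sketch: acknowledge a write only after the intent record hits an append-only log, and apply the change to the data files lazily. A toy illustration in Python (file layout and record format are made up, and the in-memory dict stands in for the data files):

```python
import json
import os
import tempfile

logpath = os.path.join(tempfile.mkdtemp(), "wal.log")
data = {}  # stands in for the scattered data files

def update(key, value):
    # 1. Append the intent to the sequential log and fsync it; only
    #    now may we tell the client the write succeeded. This is the
    #    fast part: one sequential append, not a random write.
    with open(logpath, "a") as log:
        log.write(json.dumps({"k": key, "v": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())
    # 2. The random-access write to the data files can happen whenever;
    #    if we crash before it does, recovery replays the log.
    data[key] = value

def recover():
    # Replay the log from the top; the last record for a key wins.
    recovered = {}
    with open(logpath) as log:
        for line in log:
            rec = json.loads(line)
            recovered[rec["k"]] = rec["v"]
    return recovered

update("score", 10)
update("score", 42)
print(recover())  # {'score': 42}
```

A journalled filesystem under MongoDB's mmapped files is doing roughly this same dance one layer down, which is why it's not obvious the mmap approach should win.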
It's "a" right tool in any case where distributed storage of unstructured data in JSON format is wanted, where database-level locks won't be an issue of concern, and availability is the primary, overriding concern.
What does mongo offer over (possibly sharded) postgres for this use case? Postgres won't hit you with db-level locks and gives you master/slave replication for availability. You can also get great performance if you put the WAL on a ramdisk, which I think is roughly equivalent to how mongodb handles writes.
I'm really not trying to be argumentative here, I'm just trying to understand what mongodb is for.
IIRC, until fairly recently (well after Mongo had launched), "master/slave replication for availability" in Postgres was a bitch to set up, requiring 3rd-party tools + manual failover if the master died. It was a lot easier to get going with Mongo, which is really what matters if you're a 2 person startup just trying to validate an idea.
Strongly disagree here. MongoDB (10gen, at the time) had absolutely insane, irresponsible defaults set in all its drivers until, like, 1.8 (very recent). This is anathema to "we're just trying to validate an idea" especially when "our idea took off and now 6 months later we actually do need to scale."
They've fixed it, like I said, but that whole "we're just using it to validate an idea" thing is a total con. "Nothing is so permanent as a temporary solution."
> What does mongo offer over (possibly sharded) postgres for this use case?
Doesn't really matter for the point I'm making. It's a solution for a given set of constraints. Not the solution, or the very best tippy-top solution in all the kingdom, just a solution.
The point being, I can't think of a use case where this is true; but if you read the article, the author does include what he says is the only reasonable use case for MongoDB.
I'd submit that database-level locks make any claims of availability or distributed storage a little overblown. If a single query can blow you out of the water, you're really not highly available. Although I don't have a lot of experience doing big mongodb personally, so maybe I'm missing something.
That's exactly right imo. Running MongoDB in production, you end up concerned over the performance of each query (as you should be). MongoDB's profiler makes this easier to investigate.
If you hit a db level lock limit, you're probably running a sub-optimal or unindexed query.
That doesn't have anything to do with availability, at least not in the CAP theorem sense as I understand it. What I think you're talking about (being "blown out of the water" is pretty vague, though) is partition tolerance: high-latency requests that are practically indistinguishable from network partition events.
I'm not sure what MongoDB returns (or how its clients react) when there are no available connections because of a lock whose duration exceeds the configured timeout. I'm pretty confident, though, that this sort of thing is covered by basic driver config.
From what I've seen, the data model has a lot of utility as long as you don't need super high concurrent performance. Basically, the same area as where rails is the right tool - we want easy features and rapid development, will worry about scaling later.
Mongo's really great for rapid prototyping. You don't need to worry about updating the schema at the db level, it can store any type of document in any collection without complaining, it's really easy to install and configure, the query language is simple and only takes a couple of minutes to learn, it's pretty fast in most use cases, it's pretty safe in most use cases, and it's easy to create a replica set once your prototype gets usage and starts scaling.
Mongo does everything well up until you reach the level where you need heavy-hitting, at-scale, mission-critical performance and reliability. Most projects out there (99 in 100?) will never reach the level of scale that requires better tools than mongo. And since the rest of it is so easy to use, it makes mongo a great starting point. You can always switch databases later, but mongo gives you the flexibility to concentrate on more important things in the early stages of a project.
> You don't need to worry about updating the schema at the db level
What's your magic non-db level, supposedly-easier-than-updating-a-schema approach to renaming a field common to all existing documents in a collection, eg, rename an "author" field to "writer"?
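For what it's worth, neither side makes this free: in SQL it's one DDL statement (`ALTER TABLE posts RENAME COLUMN author TO writer`), while a Mongo-side rename (`db.coll.updateMany({}, {"$rename": {"author": "writer"}})`) has to touch every document. What that document-by-document rename amounts to, simulated over plain Python dicts (collection contents are made up):

```python
# A "schemaless" collection: the field may simply be absent in some docs.
docs = [
    {"author": "Ada", "title": "Notes"},
    {"author": "Grace", "title": "Logs"},
    {"title": "Anonymous"},
]

# The rename is a full-collection rewrite, one document at a time:
for doc in docs:
    if "author" in doc:
        doc["writer"] = doc.pop("author")

print([d.get("writer") for d in docs])  # ['Ada', 'Grace', None]
```

And if old application code still reads `author`, nothing in the database will tell you; the schema check you skipped at the db level reappears as defensive code in the application.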
I've barely used it, but the json document thing, with a lot of random convenience functions in the query language, seems to lend itself well to rapid development.
For postgres you'd be mapping to a relational schema, and for redis you'd be storing the json yourself as a blob, without any server-side manipulation capabilities (or using redis maps/sets/etc, which are awesome, but aren't as general as json).
I haven't been doing very much web dev the last few years though so it's possible that my first impressions are wrong. I'm just repeating what I've been told, basically.
Postgres has had a native json type and the latest version improves upon the functions given to manipulate json data. So data that may not map well to a relational schema can just be put in the json object type and handled accordingly. Also, because it's attached to an SQL engine, you can use things like views on your json data if it makes sense for the type of data you're querying.
There has been a considerable amount of work put into postgres over the past few years to get it to handle your data regardless of what it looks like. The developers seem to have a very good grasp of the fact that not all data is alike, and providing tools that work well with, and across, all your data leads to a lot fewer headaches in the long run.
Really like this article. I try not to dump on MongoDB too much because frankly I have never taken the time to understand its internals. I constrain my criticisms to particular unnecessary failures/inadequacies that I personally have experienced (or any "I'll just use mongodb so I don't have to worry about my data" sentiment).
I like this too. There's very little to criticize, since it basically tells it exactly like it is without too much embellishment, and makes intelligent, honest conclusions.
Article is spot on about MongoDB being ideal for online games. We use it as the main datastore for our latest game, and it has worked out very well for us. My main gripes with it have been keys taking up too much space and how difficult it is to shard. I think RethinkDB will be even better once it matures.
I think the genius referred to is in its simplicity. This simplicity let MongoDB get a product to market very quickly, as well as the inherent goodness of making simple things.
I think in MongoDB's case, the getting-to-market part pushed a little too hard on the make-it-simple part. Simple is good, but a thing should be as simple as possible, and no simpler.
Its apparent simplicity to _developers_ was certainly a good marketing tool, but it did come with tradeoffs; some of its limitations imply a considerable amount of developer and operational complexity if you actually want to use it.
>But in that case, it also wouldn’t be crazy to pull a Viaweb and store it on the file system
I've done this before when I was doing work for a client using an existing simple web host with no built-in options for databases. It works well, and the nice part is that there's a simple, obvious way to do any query. The bad part is that anything other than a primary key lookup is slow unless you add a lot of complexity.
I've used MongoDB for various projects and found it nice to use. Lately though, I've found MySQL to be pretty enjoyable too, so honestly, what's all the fuss? It's a database.
Nobody writes about the filesystem like they do the database, and yet they do the same job - store and retrieve data.
The fuss is that a "proper" database does so much more than store and retrieve data.
If you've only ever used Mongo, a filesystem, and/or MySQL, then you've never really used a database. Postgres (and MSSQL, Oracle, etc.) are so much richer; they are so much more than storage systems. I'm not saying no one should ever use Mongo or MySQL, I'm just saying that they are generally far inferior choices for problems bigger than mere storage.
It's high time we started thinking about storage systems and the higher-level functionality of databases separately. We can make different, more informed, and generally better tradeoffs than we're currently making by viewing this broad category of software through such a foggy lens. For instance: take a look at how Datomic utilizes pluggable storage to provide a sensible information model, with raw index access and powerful, pluggable querying.
I would venture that you probably have low standards when it comes to databases. Have you used anything other than MongoDB and MySQL? These are basically two of the worst NoSQL and SQL implementations in existence.
Recommendation: Stop following all the hype and what all the other blind sheep are doing.
Regarding filesystems... yes they do. Tons of information out there about filesystems you just have to look for it. Read up about ZFS, ReFS and that should get you started.
I think more often it's easy to poke fun at _how_ it's used.
When any tool or tech is used globally, before knowing its limitations, problems are likely. Attempting to use MongoDB in all storage or persistence scenarios is no more sensible than using MySQL in all cases.
Yes, there is marketing around this product that must be looked at critically - after taking into account that many newly developed technologies won't solve all the problems older tech has spent decades solving.
> Attempting to use MongoDB in all storage or persistence scenarios is no more sensible than using MySQL in all cases.
Substantially less sensible in many cases. MySQL has its issues (it has a lot of issues), but people have been able to get it to work surprisingly well in roles that it wasn't designed for (albeit sometimes by just building a database on top of it, as with Twitter's thing).
No check constraints. Spotty transaction isolation. Silent data corruption if you happen to make certain kinds of updates while using statement-based replication. No on-line schema updates (is that still true?). Complete inability to execute joins of any size in reasonable time due to the lack of merge or hash join strategies. Corresponding inability to handle subqueries of any complexity. Readers block writers (at table level with MyISAM - and still at row level with InnoDB?).
As well as things like that which are actually ridiculous, there is also the substantial gap in features as compared to real databases. Things like recursive queries, user-defined types, partial indices, etc, are commonplace in the more sophisticated databases. You probably won't need them for a simple web application (or even a complex one!), but they can be very useful when trying to do more complex things, or manage a complex system efficiently.
I believe that InnoDB is an MVCC implementation, so readers blocking writers shouldn't happen. Another thing to add to your list is missing window functions.
I'm not a big MySQL fan at all, but it's still leaps and bounds ahead of mongo technologically.
Having read through it, I rather suspect that that's not a matter of writers blocking readers or the other way round, but instead a case of writers blocking writers - he's writing a lot of data to the table, and it's highly likely that InnoDB has escalated the lock to a table lock - which effectively prevents concurrent writes.
> InnoDB does locking on the row level and runs queries as nonlocking consistent reads by default, in the style of Oracle. The lock information in InnoDB is stored so space-efficiently that lock escalation is not needed: Typically, several users are permitted to lock every row in InnoDB tables, or any random subset of the rows, without causing InnoDB memory exhaustion.
My apologies - you're right. In the general case readers don't block writes, but InnoDB does use share locks on some foreign key interactions and when using the SERIALIZABLE isolation level.
People make fun of MySQL all the time, especially PostgreSQL people :)
Anecdotally, in the more than 10 years I've used MySQL I've never had any issues, whereas with PostgreSQL I've had a few major downtime incidents. It can be very stubborn and arcane. But at least I didn't lose any data.
So, what's a good NoSQL database for e.g. node.js use? The only alternative I know of is CouchDB. (Yes, I should give more parameters about the intended use, but I really don't know any alternatives).
Node.js isn't really a use case that dictates what datastore you should use; it's just a certain way of writing a totally different piece of your app.
In most node.js apps, the best answer is probably a SQL database. Sorry.
If you're working with timelines or other cases where Redis's data models can help you, consider it, though beware that if your data is large, things will get more expensive fast since you're keeping everything in RAM.
HBase, Cassandra, and Riak are all reasonable in similar cases and have their own tradeoffs.
And yes, Couch fits a similar niche as Mongo. You might even be able to use something simpler like BerkeleyDB (quite mature) if you think you want a document store.
RethinkDB may be a nice choice too. It's fairly young but looks like it's going good places.
But your choice should be mostly dependent on what kind of data you're storing and what kind of guarantees and access models you need.
It should not be based on someone on HN telling you "Riak is the best NoSQL database for Node.js" because their idea of what most Node apps need may not be what yours needs.
What's wrong with the Viaweb/Arc/HackerNews/Mailinator approach of just using in-memory datastructures (hashtables, linked lists) and then journaling out changes to the filesystem as records that are read in on startup? It's incredibly simple and blindingly fast as long as you stay on one server, and you can get several thousand QPS of capacity on that one server (vs. like 10 with a Django/Rails + SQL database solution).
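That journaling approach can be sketched in a few lines. This is a minimal illustration with invented file names and record formats, not anyone's actual implementation: state lives in a dict, every mutation is appended to a log, and startup replays the log.

```python
import json
import os
import tempfile

# In-memory state + append-only journal, replayed on startup.
JOURNAL = os.path.join(tempfile.gettempdir(), "demo_journal.log")
if os.path.exists(JOURNAL):
    os.remove(JOURNAL)  # start the demo from a clean slate

def replay(path):
    """Rebuild the in-memory state by replaying the journal."""
    state = {}
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                rec = json.loads(line)
                if rec["op"] == "set":
                    state[rec["key"]] = rec["value"]
                elif rec["op"] == "del":
                    state.pop(rec["key"], None)
    return state

def journal_set(state, path, key, value):
    # Append to disk before mutating memory, so a crash loses at
    # most the in-flight operation.
    with open(path, "a") as f:
        f.write(json.dumps({"op": "set", "key": key, "value": value}) + "\n")
    state[key] = value

state = replay(JOURNAL)
journal_set(state, JOURNAL, "user:1", {"name": "alice"})
os.remove(JOURNAL)  # clean up the demo file
```

Everything after this is refinement: fsync policy, periodic snapshots so replay doesn't grow unbounded, and log compaction.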
Another highly underrated solution is using MySQL/Postgres as a key-value store. Just create one table for each entity type, with the primary key as the key and a JSON or protobuf blob as the value. You're using completely battle-tested solutions, you've got bindings in basically every language, you're doing basically the same work (at the same speed) as your NoSQL solutions, but you have a lot more flexibility to add additional indices and can rely more on pre-existing functionality than with a MongoDB or CouchDB solution.
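The table-per-entity key-value pattern is tiny in practice. A sketch using SQLite as a stand-in for MySQL/Postgres (schema and names invented; the real engines would use their own upsert syntax):

```python
import json
import sqlite3

# One table per entity type: primary key as the key, JSON blob as the value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, body TEXT NOT NULL)")

def put_user(user_id, doc):
    # SQLite upsert; MySQL would use ON DUPLICATE KEY UPDATE,
    # Postgres ON CONFLICT ... DO UPDATE.
    conn.execute("INSERT OR REPLACE INTO users (id, body) VALUES (?, ?)",
                 (user_id, json.dumps(doc)))

def get_user(user_id):
    row = conn.execute("SELECT body FROM users WHERE id = ?",
                       (user_id,)).fetchone()
    return json.loads(row[0]) if row else None

put_user("u42", {"name": "alice", "plan": "pro"})
```

When you later need a secondary index, you add a column and backfill it, instead of migrating to a different datastore.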
That's caused by using closures to create dynamically generated "callbacks" on the server, not by keeping data structures in RAM. If you ask for some old item not in memory, it just gets lazily loaded.
I suspect he doesn't have to rely on closures to do pagination: they're a programming convenience that means you don't have to do things like think about what state persists between pages.
Anything you can do with SQL you can do with in-memory data structures. If you're interested, I'll be happy to take any SQL query and convert it to some Python list comprehensions on arrays of dicts.
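Taking that claim at face value, here is one such translation (table and column names invented): a filter-and-sort and a join, each as a comprehension over dicts.

```python
# SELECT name FROM users WHERE age > 30 ORDER BY name;
users = [
    {"name": "carol", "age": 41},
    {"name": "bob", "age": 25},
    {"name": "alice", "age": 35},
]
result = sorted(u["name"] for u in users if u["age"] > 30)

# SELECT u.name, o.total FROM users u JOIN orders o ON o.user = u.name;
orders = [{"user": "alice", "total": 10}, {"user": "carol", "total": 7}]
joined = [(u["name"], o["total"])
          for u in users
          for o in orders
          if o["user"] == u["name"]]
```

The catch, of course, is that these are all full scans; the database gives you indexes and a planner for free.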
Pretty rarely, at least in Python. I don't miss MultiSet, because Python has that (collections.Counter). Ditto LinkedHashMap (collections.OrderedDict). Those are the two "extended" collections that I most often use. I do miss the absence of balanced binary trees occasionally, since sometimes it's useful to have an associative container with a defined iteration order, but sorted(dict) is usually good enough where performance is critical. And Python's heapq module is a bit harder to use than Java's PriorityQueues, but all the functionality is there.
I think I'd miss these a bit more in Go, because the built-in datatypes are privileged in some of the language statements, but I haven't written enough Go code to really feel their absence.
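The stdlib stand-ins mentioned above, in miniature:

```python
import heapq
from collections import Counter, OrderedDict

# Multiset / MultiSet equivalent
counts = Counter("mississippi")

# Insertion-ordered map / LinkedHashMap equivalent
ordered = OrderedDict([("a", 1), ("b", 2)])

# Priority queue via the heapq module
heap = [5, 1, 4]
heapq.heapify(heap)
smallest = heapq.heappop(heap)

# sorted(dict) as the stand-in for a balanced tree's ordered iteration
keys_in_order = sorted({"b": 2, "a": 1})
```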
Because it's much cleaner and more powerful if the thing that generates the next page is a closure rather than just an index. Among other things it lets you show each user a different set of items (depending on whether they have showdead turned on for example).
> It's incredibly simple and blindingly fast as long as you stay on one server and you can get several thousand QPS of capacity on that one server (vs. like 10 with a Django/Rails + SQL database solution).
Wait, what? Even if vertical scaling was a good idea, scaling is far from the only reason you should have more than one server for anything serious.
Isn't this thread about non-serious use? Pretty much everything I see here is about how MongoDB is only suitable for prototypes, how it doesn't even guarantee writes, how they just want something quick & dirty to build a MVP with. The parent poster asked for something to replace MongoDB with - if the use-case is prototypes and "web scale" startups that don't have users or a product yet, I think a single server with in-memory data structures is a perfectly adequate starting point.
If you do get to the point where you need some redundancy (and don't yet need to scale horizontally), you can proxy all writes to a second server running the same codebase, have it update its in-memory data structures in the background, and hot-swap it over if the master dies.
I would say that a MVP should be written in such a way that you don't have to waste time rewriting from scratch once the concept is validated. If you're writing a MVP to be disposable you aren't necessarily launching it on a real server with persistent storage anyway, more likely Heroku or at the very least AWS, but in either case you're well equipped to do the right thing from the outset rather than being forced to totally rewrite your app to enable a real architecture later on.
You will have to rewrite anyway. Multiple times. If you pick a RDBMS you will have to rewrite it to scale, if you pick MongoDB you will have to rewrite it for reliability, if you pick Heroku or AppEngine you will have to rewrite it to avoid paying them a good chunk of your profits.
That's probably the biggest surprise I learned from working in a fast-growing, well-functioning engineering organization. The half-life of code in a market that's actively growing and changing is roughly 1 year, i.e. 50% of the code you write now will have been removed within a year from now. And attempts to optimize for problems you're going to have in a year, rather than the ones you have now, actively make things worse because you inevitably have a different product direction in a year, and baking in last year's speculative assumptions just means there's more code you have to work around.
If it takes you a year from persisting serialized data on the hard drive of your one server to using a real data store, you're fucked either way. As low as that half-life might be, that's no reason to make deliberately short-sighted engineering decisions to make it even worse, especially when all the quick and easy ways of shipping an MVP effectively preclude that strategy. You're gonna go through all the effort of shipping your MVP to a real server but you're not going to go through the effort of setting up a database? Are you kidding me? Setting up Heroku with shared Postgres is not only much quicker to ship to, but it gives you a software and data architecture that you can much more easily improve in the future.
You understand that Hacker News uses precisely this persistence strategy (in-memory data structures with persistent state written to the filesystem on the hard disk of the server), and has been going on 6 years now?
You also understand that most of the advice easily accessible on the Internet comes from people trying to sell you something, and so they have a vested interest in you adding many layers into your software stack that you don't need?
If you work in an actual engineering organization that has a clue what they're doing, mmap() is your best friend, and the more layers you can cut out of the stack, the better off you are.
If anything it's this fantasy of vertical scaling that's perpetuated by "people trying to sell you something". If you're going to go with "Hacker News does it, therefore it's okay", I guess that means it's sensible for any web app to use a Lisp dialect of their own invention implemented on top of Scheme, for URL's to be generated pseudorandomly and time out, and so forth.
> What's wrong with the Viaweb/Arc/HackerNews/Mailinator approach of just using in-memory datastructures (hashtables, linked lists) and then journaling out changes to the filesystem as records that are read in on startup?
That works for some things. However, it's no more a foolproof magical solution than MySQL or MongoDB or Cassandra or Oracle or... It just has different tradeoffs (non-primary key queries will tend to be a problem, you'll have to make your own replication, sharding will be a problem, etc etc).
Well, that's of course true. All engineering systems face trade-offs.
The nice thing about doing the dead simple solutions first is that they give you time to focus on the things all startups have to do (getting users, building product) and to defer the things that very few startups have the luxury of needing to deal with (scaling, fault tolerance, reporting, alternative views of data).
Throughout the lifetime of my first startup, I was obsessed with the question of "What are we going to do when we need to scale?" It failed because it had a daily userbase measured in the dozens. Then I went to Google to learn how to scale things. And it turned out the biggest lesson I learned at Google was not how to scale things (though I did learn that too), but that you shouldn't scale things, not until you need to. Because the process of designing for scale slows you down significantly, and makes it much harder to develop a system that's usable and performs well under small workloads. Google products take forever to launch, because they have to scale to millions of users from day 1. As a result, their product decisions are very often questionable in early versions. Most startups don't have the luxury of Google's brand name and billions in cash to tide them over that learning process, and need to hit the ground running.
Focus on the problems you have, not the problems you hope to have in the future.
Leaving aside the issue of scalability (generally, by the time you find out that you need to scale up, it's already almost too late if you haven't been keeping the ability to scale up in the back of your mind all along), there are other reasons that you don't necessarily want to commit to a solution that makes it difficult to use more than one machine; availability is the obvious one.
I think most people who have never worked for a very fast-growing company grossly underestimate the number of rewrites that it will require anyway. You are not committing to a solution that makes it difficult to use more than one machine; you are trying to get to the point where you need to replace that architecture. Pretty much all your other architectural choices will be bad ones at that point anyways.
Mongo (and other document stores) let you do pretty complex querying within JSON objects. Postgres unfortunately doesn't let you do so; that's really the only thing keeping me with Mongo at the moment.
> So, what's a good NoSQL database for e.g. node.js use?
Well, this is the thing; 'NoSQL' is really a pretty unhelpful term. It tends to just mean "not relational", and covers a vast number of things.
So, for instance, you might be okay with having to have your data set fit in RAM (with MongoDB you'll suffer if it doesn't, anyway), and not care too much about availability. In that case, Redis might be good. Or maybe you care deeply about availability; in that case, one of the Dynamo paper databases might be good, if you're willing to put in the work dealing with the consistency issues. Or...
I could go on for a bit. 'NoSQL' is verging on a meaningless term.
Well, the obvious answer to your question is: CouchDB! It's a brilliant, underrated database, and hey, it backs NPM!
On the other hand, Redis, Cassandra, Riak, and many more are also excellent NoSQL databases. But none of them, including CouchDB, are excellent at everything. What are you planning on making? You can write a lot of different things in node.js. If you're writing, say, a blogging engine you probably should look into flat files, or maybe Postgres, and forget the NoSQL kool-aid. :)
I'm a huge fan of NoSQL in general, and document databases in particular. However, I think this is nuts. :) NoSQL is about making tradeoffs; you give up some of the strengths of a traditional RDBMS but in return you get some unique advantages. The problem is, you aren't actually using any of those advantages with a blog...are you?
I really don't see how MongoDB beats Postgres for running a basic blog. And while it doesn't prove anything, I note that Ghost (which has been getting a lot of press as a new, shiny, node.js based blogging platform) is backed by SQLite of all things. Why is it obvious that they should have used a document database instead? What advantages do you think that would have given them? Because of the top of my head I can't think of one.
1. If you are building a multilingual site, you can store content for multiple languages in a single document instead of futzing around with {lang, content} tables
2. If you want to do custom forms/content, it is trivial in a document database instead of relying on key/attribute tables.
3. Just store your theme in a single document, which can include various html templates, css, etc. To export or import a theme is also easy - just stuff the whole document into the db.
4. If you want to add plugins to enhance the capability of your blog/cms, they can have their own nested document inside their target document. Everything is contained.
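A minimal sketch of point 1, with invented field names: all translations nest inside one document, so fetching a post in any language is a single lookup rather than a join against a {lang, content} table.

```python
# One document holds every translation of the post.
post = {
    "_id": "welcome",
    "title": {"en": "Welcome", "de": "Willkommen", "fr": "Bienvenue"},
    "body": {"en": "Hello!", "de": "Hallo!", "fr": "Bonjour !"},
}

def render(doc, lang, fallback="en"):
    # Fall back to a default language when a translation is missing.
    title = doc["title"].get(lang) or doc["title"][fallback]
    body = doc["body"].get(lang) or doc["body"][fallback]
    return title, body
```

The tradeoff is the usual document-store one: queries like "list every post missing a German translation" get clumsier than they would be against a normalized table.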
Couch is a great choice. So is Redis. I would go with Redis if you can afford the cost of RAM and you know you'll always be looking stuff up by key. Otherwise I'd go with Couch, since the queries are just map/reduce functions and the API is just REST.
I'm unsure why you say there are no other alternatives. Surely the choice of database should preference your data and what you want to do with it over the language you'll use? There are node libraries for any database I've ever wanted to use (SQL and NoSQL alike)
I meant to say 'no NoSQL alternatives'. And I noticed that while there are node libraries for most databases, the ease of use differs for each one.
You know, if you like it with node, ignore the haters and use it anyway. It's not some atrocious mess. I love Mongoose on top of it and use it all the time.
A lot of these downsides are fixed by TokuMX: real transactions, document-level locking, compression, and disk-optimized indexes. I suggest everyone take a look at it.
I agree. TokuMX has filled many holes in Mongo and (at least in my experience so far) performs very well. It's got great documentation and is backed by a brilliant team. As a drop-in replacement for the mongo binaries, it's really easy to install, and it offers professional/enterprise support if you need it.
I feel like the discussion of MongoDB is a bit like the discussion around the Affordable Care Act (aka "Obamacare"). The conversation always shifts between whether the very idea of a noSQL db is a good one, to the question of Mdb's implementation faults and (I guess?) its strengths.
Whether 10gen are vapid spin-meisters or not, even whether they have developed a usable product, seems orthogonal to the question as to whether a schemaless persistent storage layer might be a better fit for some projects than a relational database.
One thing I always do is try to scope a database to the problem. This site[1] has been a valuable resource for me and a colleague who were evaluating the best way to store/access our data depending on what we need back.
Can someone comment on how mature rethinkdb is at the moment?
I'm considering moving away from MongoDB before I have to implement what seems to be an incredibly complicated architecture to get it to scale to the level of tens/hundreds of millions of documents.
RethinkDB is not yet "ready for production use" but reportedly will be soon.
If you're currently on MongoDB but need more performance, concurrency, or compression, please try TokuMX: http://www.tokutek.com/products/tokumx-for-mongodb It's a drop-in replacement server that uses a better storage engine but speaks the same protocol and query language.