
Latency is a complex topic.

To be able to run so much on MongoDB, we almost never run single queries against the database. If I fetch or insert trade data, I probably run a query for 10 thousand trades at once.

So what happens is: as data comes in from multiple directions, it gets batched (for example, 1-10 thousand items at a time), split into groups that can be processed in roughly the same way, and then travels the pipeline as a single batch. That is super important, because it lets you amortize some of the costs.

Also, the processing pipeline has many, many steps. A lot of them have buffers in between so that steps don't get starved for data.
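A minimal sketch of that idea: two stages decoupled by a bounded buffer, each on its own thread. The stage functions and buffer size here are illustrative, not the actual implementation being described.

```python
import queue
import threading

SENTINEL = object()  # marks end of the stream

def run_pipeline(items, stage1, stage2, buffer_size=1000):
    """Feed items through stage1 -> bounded buffer -> stage2; return stage2 outputs."""
    buf = queue.Queue(maxsize=buffer_size)  # buffer keeps stage2 from starving
    results = []

    def producer():
        for item in items:
            buf.put(stage1(item))
        buf.put(SENTINEL)

    def consumer():
        while True:
            item = buf.get()
            if item is SENTINEL:
                break
            results.append(stage2(item))

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```

With more stages you simply chain more queues; the bounded size is what provides backpressure between them.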

All this adds latency. I try to keep it subsecond, but it is a tradeoff between throughput and latency.

It could have been implemented with lower latency, but that implementation would be complex and inflexible. I think a clear, readable, flexible implementation is worth a little bit of a tradeoff in latency.

As to storage being the source of most woes, I fully agree. In our case it is dealing with data bloat caused by the business wanting to add this or that. All this data makes database caches less effective, requires more network throughput and more CPU for parsing/serializing, needs to be replicated, etc. So half the effort is constantly figuring out why they want to add this or that, and whether it is really necessary or can be avoided somehow.




I thought you were doing millions of QPS with a 3-node MongoDB cluster, going by the top-level comment. That would be impressive.

By batching 1-10 thousand records at a time, your use case is very different from Discord's, which needs to deliver individual messages as fast as possible.


Data doesn't arrive or leave batched. That is just an internal mechanism.

Think in terms of Discord: their database probably already queues and batches writes. Or they could decide to fetch details of multiple users with a single query by noticing there are 10k concurrent requests for user details. Why have 10k queries when you could have 10 queries for 1k user objects each?
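A sketch of that coalescing idea: collapse many pending lookups into a few bulk queries. The chunk size and the `fetch_many` callback are illustrative stand-ins (in MongoDB, `fetch_many` would be a single `$in` query).

```python
def chunked(ids, size):
    """Yield consecutive slices of `ids` of at most `size` elements."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def fetch_users_bulk(pending_ids, fetch_many, chunk_size=1000):
    """Turn N one-off lookups into ceil(N/chunk_size) bulk queries."""
    users = {}
    for chunk in chunked(sorted(set(pending_ids)), chunk_size):
        # fetch_many(chunk) stands in for one query with an IN / $in clause
        users.update(fetch_many(chunk))
    return users
```

Deduplicating first also means 10k concurrent asks for the same hot user cost one lookup, not 10k.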

If you complain that my process is different because I refuse to run it inefficiently when I can spot an opportunity to optimize, then yes, it is different.


Of course, Cassandra/MongoDB/etc. can perform their own batching when writing to the commit log, and can also benefit from write combining by not flushing dirty data immediately. That's beside the point.

Your use case allows you to perform batching for writes at the *application layer*, while Discord's use case doesn't.


Why couldn't others with lots of traffic use a similar approach? I assume they do. Batching like that seems like a pretty clever idea; especially when QPS is very high, batching (maybe waiting a few ms to fill a batch) makes a lot of sense.
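The "wait a few ms to fill a batch" idea can be sketched as a size-or-time batcher: emit when the batch is full or when the deadline passes, whichever comes first. The function name and parameters here are illustrative.

```python
import queue
import time

def collect_batch(q, max_size=1000, max_wait=0.005):
    """Block for the first item, then gather more until full or timed out."""
    batch = [q.get()]                          # wait for at least one item
    deadline = time.monotonic() + max_wait
    while len(batch) < max_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break                              # deadline hit with a partial batch
    return batch
```

Under high QPS the batch fills almost instantly, so the extra latency is near zero; under low QPS you pay at most `max_wait`.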


I don't see why Discord's case can't use the same tricks. If they have a lot happening at the same time, and their application is relatively simple (in terms of the number of different types of operations it performs), then at any point in time it is bound to have many instances of the same operation in flight.

Then it is just a case of structuring your application properly.

Most applications are immediately broken, by design, by dedicating a thread to each request/response pair. It then becomes difficult to take processing from different threads and select it to be handled together, so you can't amortize costs.

The alternative I am using is funneling all requests into a single pipeline and splitting that pipeline into stages distributed over CPU cores. So a request comes in (by way of Kafka, a REST call, etc.), it is queued, it goes to CPU core #1 and gets some processing there, then moves to CPU core #2 for some other processing, then gets published to CPU core #3, and so on.

Now, each of these components can work on a huge number of tasks at the same time. For example, when the step is to enrich the data, it might be necessary to send a message to another REST service and wait for the response. During that time the component picks up other items and does the same.

As you can see, this architecture practically begs you to batch and amortize costs.


What you're describing sounds like vanilla async concurrency. I seriously doubt 'most applications' use the one-thread-per-request model at this point in time, most major frameworks are async now. And it's not a silver bullet either, plenty of articles on how single-thread is sometimes a better fit for extremely high-performance apps.

After reading all of your responses, I still don't see how you think your learnings apply to Discord. They would not be able to fit the indexes in memory on MongoDB. They can't batch reads or writes at the application-server level (the latency cost for messaging is not acceptable). Millions of queries happen every second, not one-off analytical workloads. It seems these two systems are far enough apart that there is really no meaningful comparison to be made here.


I'm not sure why you're being downvoted when you're a domain expert talking about your craft. People have weird hangups on Hacker News, it seems.


Something about the tone of the messages rubs me entirely the wrong way.


It reads like justified opinions from experience. Not seeing much emotional tone in there.


Well, on one hand you've got engineers at a billion-dollar company explaining how they've solved a problem. On the other hand you've got some random commenter on HN over-simplifying a complex engineering solution.


Sounds to me like it's some "random commenter" who has solved a similar problem at a similar scale with a solution that's much simpler.


I dunno, sounds very ‘I am very smart’ to me. They may be right or they may not, but both solutions sound workable to me.

Don’t let perfect be the enemy of good, and all that. There’s enough utter garbage around to shit on.


I think you're reading into it. They are stating that the solution in the post was overengineered, and describing an alternative solution that doesn't require as much abstraction or as many resources, but is still manageable for data with a much higher-dimensional structure.

The fact that you read that as "I am very smart", and took that as a reason to downvote the post, says more about you than about the person you're supposedly describing.


This just gives me Factorio flashbacks.



