Twitter's analytics are either lossy or eventually consistent [1]. I'm sure they're resource intensive, but they're taking shortcuts that make them very amenable to saving resources (unless it's just very buggy).
In terms of the broadcast problem, it's trivially handled by splitting large follower lists into trees and introducing message reflectors. Twitter's message count is high for a public IM system, but it's not that high compared to the overall messaging volume of private/internal message flows. More importantly, despite the issue of large follower counts, if you break large accounts into trees of reflectors it decomposes neatly, and federating large message flows like this is a well understood problem.
I've half-jokingly said in the past that you could replace a lot of Twitter's core transmission of tweets with mail servers and off-the-shelf mailing-list reflectors, plus some code to create mailboxes for accounts and reflectors to break up large follower lists (no, it wouldn't be efficient, but the point is that distributing message transfers, including reflecting messages to large lists, is a well understood problem). Based on the mail volumes I've handled with off-the-shelf servers, I'll confidently say that hundreds of millions of messages a day handled that way is not all that hard with relatively modest server counts.
Fast delivery of tweets via reflectors for the most extreme accounts would be the one thing that could drive the server count up, but on the other hand there are also plenty of far more efficient ways of handling it (e.g. extensive caching plus pulling rather than pushing for the most extreme accounts).
Note, I'm not saying Twitter doesn't have a legitimate need for the servers they use - their web app does a lot of expensive history/timeline generation on top of the core message exchange, for example. And the number of servers does not say much about their chosen tradeoffs in terms of server size/cost vs. number of servers. But the core message exchange should not be where the complexity is, unless they're doing something very weird.
[1] Taking snapshots of their analytics and the API follower/following counts shows they don't agree, and the analytics numbers change after the fact on a regular basis.
It simply proves the point that it's not such a terribly large problem that it takes 10,000 times the equipment because of [search | many recipients | tags | etc.].
It reminds me of the Flickr architecture from back in the day: hopelessly complicated, with tiers and shards and tiers and caching and tiers and tiers... to serve some images. But tagging!
Do people feel more important if they make a complicated solution? Where is Alan Kay?
Could you elaborate a bit on the message reflectors and using follower trees instead of lists with regard to messaging like Twitter? I am genuinely interested in improving messaging patterns in Twitter-like scenarios (i.e. large fan-outs).
Let me start at the beginning: I have used mail servers as messaging middleware. Back around 2000 I ran an e-mail provider, and we jokingly started talking about taking our heavily customized qmail install and turning it into a queuing system for various backend services we were building. Then we decided to try it, and it worked great (we ended up using it in a reference registrar platform we built when we built the .name registry; I've used a similar solution elsewhere since).
The point is that e-mail provides the federation, has a rich ecosystem of applications, and handles things that are easy to mess up, like reliable queueing and retries, as well as providing a rich system of aliasing and forwarding.
So let's consider Twitter: you have a list of followers, and a list of people you follow. That gives two obvious ways of knitting together a timeline: push and pull. In real life it's probably most efficient to mix the two, but for the "twitter by e-mail" architecture, let's consider push only.
In its simplest form, you map twitter ids to an internal but federated "email address", which in turn maps to a virtual bucket. Then you use MX records to map virtual buckets to servers. On each server you map the internal email address to a mailbox.
You also map each twitter id to an internal "email address" for reflecting tweets to that twitter account's followers. It also maps to a virtual bucket, with MX records mapping it to a server. But instead of mapping this address to a mailbox, you map it to a mailing-list processor.
When user A follows user B, in this model that means user A subscribes to user B's reflector.
To handle fanout, you can use the aliasing supported by pretty much all mail servers to remap the reflector address to a second mailing list. This second mailing list is a list of lists. Here you need "non-email" logic to manage the mailing lists on the backend.
To outline this, for user A, the above might look like this:
- Twitter handle A maps to A@virtual-bucket-56.timeline.local ("56" is arbitrarily chosen - imagine hashing the twitter handle with a suitable hash)
- MX record mapping virtual-bucket-56.timeline.local to host-215.timeline.local ("215" is also just arbitrarily chosen in this example).
- On host-215.timeline.local there is an IMAP mailbox for tweets from people this user follows.
- Twitter handle A also maps to A@virtual-bucket-56.reflector.local, with an MX record mapping that to host-561.reflector.local (the point being that the MX records can be used to remap failing hosts etc.)
- On host-561.reflector.local "A" maps to a mailing-list package that accepts basic subscribe ("follow") and unsubscribe ("unfollow") options.
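To make the mapping concrete, here's a minimal sketch in Python of the handle-to-bucket-to-host lookup outlined above. The bucket count, the hash choice and the bucket-to-host table are invented for illustration; in the e-mail variant the last lookup would be an MX record query rather than a dictionary:

    # Minimal sketch: handle -> virtual bucket -> host.
    import hashlib

    NUM_BUCKETS = 1024  # fixed up front, well above the expected server count

    def bucket_for(handle: str) -> int:
        # Stable hash of the handle; any reasonable hash will do.
        return int(hashlib.sha1(handle.encode()).hexdigest(), 16) % NUM_BUCKETS

    def timeline_address(handle: str) -> str:
        return f"{handle}@virtual-bucket-{bucket_for(handle)}.timeline.local"

    def reflector_address(handle: str) -> str:
        return f"{handle}@virtual-bucket-{bucket_for(handle)}.reflector.local"

    # The "MX records": a directory mapping virtual buckets to hosts.
    # Remapping an entry here is how you migrate users or route around
    # a failing host without touching the addresses themselves.
    bucket_to_host = {56: "host-215.timeline.local"}  # example entry only

    def host_for(handle: str) -> str:
        return bucket_to_host[bucket_for(handle)]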
Here you already have the basics. The "magic" would happen once the mailing list A@host-561.reflector.local reaches some threshold, say 10k subscribers. At this point you'll want to add a level of indirection: say, you rename A@host-561.reflector.local to A-sub1@host-561.reflector.local and create a new A@host-561.reflector.local with one subscriber, A-sub1@host-561.reflector.local. Then you create a new mailing list on a different server with sufficient capacity, let's say A-sub2@host-567.reflector.local, and subscribe that (you might want to indirect these two via virtual buckets) to the main list.
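A toy sketch of that "push down one level" step. The MailingList class and the threshold are stand-ins for an off-the-shelf list manager and whatever cutoff testing shows is sensible; only the shape of the operation matters:

    FANOUT_LIMIT = 10_000

    class MailingList:
        def __init__(self, address: str, subscribers=None):
            self.address = address
            self.subscribers = list(subscribers or [])

    def split_reflector(lists: dict, address: str, new_host: str) -> None:
        """Rename A@host-561 to A-sub1@host-561, point a fresh A@host-561 at
        it, and add an empty sibling sub-list on another host for growth."""
        root = lists[address]
        if len(root.subscribers) < FANOUT_LIMIT:
            return
        local, host = address.split("@")
        sub1 = MailingList(f"{local}-sub1@{host}", root.subscribers)
        sub2 = MailingList(f"{local}-sub2@{new_host}")  # takes new followers
        lists[sub1.address] = sub1
        lists[sub2.address] = sub2
        lists[address] = MailingList(address, [sub1.address, sub2.address])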
There's no magic here - mailing out to a list of 10k is trivial. A two-level tree with 10k at each level can have 10k leaf nodes with 10k users each, for 100m users.
In practice you'd likely "cheat" and mark the top users someone is following, and do pulls against cache servers for their tweets instead of pushing them, drastically reducing the need for big fanouts. Basically you need to spend lots of time testing to determine the right cutoffs between pull (which potentially hits many servers on each page reload) and push (which hits many servers each time someone tweets to a large follower list).
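As a rough illustration of that cutoff, here's a sketch with in-memory dicts standing in for the mailboxes and cache servers, and a threshold pulled out of thin air rather than from measurement:

    PUSH_THRESHOLD = 50_000  # hypothetical cutoff

    timelines: dict = {}         # handle -> list of pushed tweets ("mailbox")
    recent_by_author: dict = {}  # handle -> recent tweets ("cache server")

    def on_tweet(author: str, followers: list, tweet: dict) -> None:
        recent_by_author.setdefault(author, []).append(tweet)  # always cached
        if len(followers) <= PUSH_THRESHOLD:
            for f in followers:                                 # push path
                timelines.setdefault(f, []).append(tweet)

    def read_timeline(user: str, followed_big_accounts: list) -> list:
        pulled = [t for a in followed_big_accounts
                    for t in recent_by_author.get(a, [])]       # pull path
        merged = timelines.get(user, []) + pulled               # merge at read time
        return sorted(merged, key=lambda t: t["ts"], reverse=True)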
Again, let me reiterate that while this type of setup works (I have tested it for millions of messages), it's by no means the most efficient way of handling it. The e-mail concept here is more a way of illustrating that it's a "solved problem" and "just" an issue of optimization.
For starters, you'll want to consider whether it's easy enough to reconstruct data to drop syncing to disk, use a RAM-disk to speed up "deliveries" etc., and you may want to consider different types of storage backends. You may also want to consider other "tricks", like locating leaf-reflector nodes on the servers where the accounts they reflect to are located (at the cost of more complicated "mailing list" management).
The most worthwhile lesson is that if you hash the id to a virtual bucket, and have a directory providing a mapping from virtual bucket to actual server, you gain the flexibility to easily migrate users etc. If you in addition provide a means of reflecting messages to a set of subscribers, you have pub-sub capability. If you need to handle big fanout, you'll want a way of "pushing down" the list and inserting a fan-out reflector "above" it.
Those patterns can be applied whether you use e-mail, ZeroMQ or any low-level messaging fabric for the actual message delivery (in general the [entity] => [virtual bucket] => [server] indirection is a worthwhile pattern for almost anything where you may need large-scale sharding).
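The transport-agnostic version of that indirection is tiny. A sketch (bucket count, hash and names are placeholders) showing why migration is cheap - rebalancing a bucket is a single directory change, and the hash itself never changes:

    import hashlib

    NUM_BUCKETS = 4096  # fixed forever; choose generously up front

    def bucket_of(entity_id: str) -> int:
        return int(hashlib.md5(entity_id.encode()).hexdigest(), 16) % NUM_BUCKETS

    class Directory:
        def __init__(self, default_server: str):
            # bucket -> server; in practice kept in some replicated store
            self.map = {b: default_server for b in range(NUM_BUCKETS)}

        def server_for(self, entity_id: str) -> str:
            return self.map[bucket_of(entity_id)]

        def migrate(self, bucket: int, new_server: str) -> None:
            # Copy the bucket's data to new_server first, then flip this entry.
            self.map[bucket] = new_server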