Event Buffering

tim_h · on March 16, 2012

Basically, in a client-server relationship, use persistent buffering on the client-side so that the client can tolerate server downtime.

I like the simplicity of this approach. It's best to keep the low-level stuff as simple as possible when building distributed systems. It will get complicated soon enough at the higher levels.

_3u10 · on March 16, 2012

I wish people would stop using SQL as a message queue, message queuing is not what SQL was designed for.

Even if you used the shittiest message queue imaginable (email) you get this 'buffering' functionality for free.

aphyr · on March 16, 2012

The point, I think, is that the event buffer obeys the same transactional semantics as the data the event refers to. If you are not able to thread all state atomically through the message system, this can offer improved consistency guarantees.

andma · on March 16, 2012

You hit the nail on the head. If you don't need strong consistency guarantees any sort of message queue will do. That is a solved problem.

There are two key quotes from the article:

1. "... if our queue begins dropping messages, we run the risk of silently corrupting data"

The two statements that they are executing "app.update" (modifying the database) and "enqueue" (sending to message queue") are not atomic. They can make it atomic using something like 2 phase commit but implementing that will be more trouble than it's worth. Additionally, there will be performance (latency) implications since it is a synchronous operation.

2. "Notice how both of our writes are inside of a local database transaction"

Now they have atomicity very easily!

I agree with the grandparent that using SQL as a message queue is generally a bad idea. However the pitfalls are well known and can be engineered around. See this article: http://www.engineyard.com/blog/2011/5-subtle-ways-youre-usin...

I am actually working on something very similar to this at my day job. The "buffer processor" that the article is describing, we implemented using a scala process. Perhaps what's most interesting about the entire system as mentioned at the end of the article is maintaining the large amount of state for durability (we use hbase) and deliverability (we wrote a custom scala server).

_3u10 · on March 16, 2012

I don't think it really does. Technically it can but in the real world when people turn to ACID semantics to make an async system appear sync they screw it up.

Very few people set their consistency level to serializable which means that they probably have a whole tonne of race conditions that they don't think they have in their code. Even with the consistency level set to serializable you generally still end up with lots of strange state bugs.

Most of the point of messaging systems are to create asynchronous systems, the point of ACID semantics is to make an async system appear synchronous. If the issue is purely one of making a couple database writes then you probably don't need the message queue in the first place, if the couple of database writes depends on an external service then you've lost all the consistency you thought you had. (Unless the external service supports transaction semantics and you've got a 2PC system setup)

skMed · on March 16, 2012

This sort of event buffer may also be achieved by leveraging Change Data Capture (CDC) features if they are supported by your data store. For example, when CDC is enabled in SQL Server, an agent service will periodically examine the transaction log and move data changes to transient "buffer tables" automatically. Since this is happening as a background service, there is no need for a developer to explicitly perform a write operation to these buffer tables.

ryandotsmith · on March 16, 2012

Interesting. Given that I primarily use PostgreSQL, I have not come across this feature in my day-to-day. However, I can imagine building such a thing in PostgreSQL. Perhaps with a system of triggers and dblink [1] one can create a mechanism to achieve a similar result.

1. http://www.postgresql.org/docs/9.1/static/dblink.html

skMed · on March 16, 2012

CDC is definitely a more enterprise-y feature (Oracle, SQL Server, etc.). I have had to build the same sort of tracking via triggers in the past, so I would definitely agree with you. I am not super familiar with the PostgreSQL road map, but maybe it is a long-term goal.