There's at least one good reason for Dynamo's write-to-all and read-from-all mechanism: latency.
What you've called 'W=2' in Couchbase is "write to master and at least one slave." Dynamo-style 'W=2' means "write to any two replicas." This can decrease tail latencies since you don't have to wait for the master--any two will do; similarly for 'R=2'. Indeed, Dynamo 'W=2, R=2' will incur more read load than master-based reads (at least double, but not necessarily triple, in your figures). So I think it's more accurately a trade-off between latency and server load.
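To make the latency argument concrete, here is a rough sketch, not anyone's actual implementation. It assumes the client contacts all replicas in parallel and that each replica's ack latency is known up front; the function names and the millisecond figures are made up for illustration. (Letting the master-based scheme contact slaves in parallel is generous to it; if the master forwards to slaves, it only gets worse.)

```python
def dynamo_write_latency(latencies, w):
    """Time until ANY w replicas have acked: the w-th smallest latency."""
    return sorted(latencies)[w - 1]


def master_slave_write_latency(master, slaves, w):
    """Time until the master AND (w - 1) slaves have acked.

    Assumption: slaves are contacted in parallel with the master.
    """
    if w == 1:
        return master
    return max(master, sorted(slaves)[w - 2])


# Healthy cluster: both schemes wait for the second-fastest replica.
healthy_dynamo = dynamo_write_latency([4, 5, 7], w=2)         # 5
healthy_master = master_slave_write_latency(4, [5, 7], w=2)   # 5

# Master is having a bad moment (hypothetical 50 ms stall).
slow_dynamo = dynamo_write_latency([50, 5, 7], w=2)           # 7
slow_master = master_slave_write_latency(50, [5, 7], w=2)     # 50
```

In the healthy case the two are identical; the difference only shows up when the master is the straggler, which is exactly the tail.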
Anyway, I'm pretty sure CASSANDRA-4705 (https://issues.apache.org/jira/browse/CASSANDRA-4705), which allows for Dean-style redundant requests, both decreases the read load (at least from the factor of N in your post) and should still reduce tail latency without compromising on semantics.
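For anyone unfamiliar with Dean-style redundant (hedged) requests, the idea looks roughly like the sketch below: ask one replica, and only if it has not answered within some delay, ask a second and take whichever answers first. This is an illustration under my own assumptions, not how CASSANDRA-4705 is implemented; `hedged_read`, `hedge_after`, and the fake replicas are hypothetical names.

```python
import asyncio


async def hedged_read(replicas, hedge_after):
    """replicas: list of zero-argument coroutine functions (at least two).

    Returns (value, number_of_requests_sent).
    """
    primary = asyncio.create_task(replicas[0]())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after)
    if done:
        # Common case: one request, no extra read load.
        return primary.result(), 1

    # Primary is slow: send a backup request and take the first reply.
    backup = asyncio.create_task(replicas[1]())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    return done.pop().result(), 2


def fake_replica(delay, value):
    """Hypothetical replica that answers `value` after `delay` seconds."""
    async def read():
        await asyncio.sleep(delay)
        return value
    return read


slow = fake_replica(0.5, "from slow replica")
fast = fake_replica(0.01, "from fast replica")
result_hedged = asyncio.run(hedged_read([slow, fast], hedge_after=0.05))
result_plain = asyncio.run(hedged_read([fast, slow], hedge_after=0.05))
```

The point is that the second request is only sent when the first is already in the tail, so the extra read load stays well below the factor of N you get from always reading every replica.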
I don't have skin in this game, but I'm pretty sure that the Dynamo engineers had a good idea of what they were doing. (That said, the regular [non-linearizable] semantics for R+W>N are sort of annoying compared to a master-slave system, but can be fixed with write-backs.)
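A toy illustration of the write-back fix, under my own assumptions: replicas are modeled as in-memory `(version, value)` pairs, versions are totally ordered, and `read_with_write_back` is a hypothetical name. Before returning, the reader makes sure at least W replicas hold the newest version it saw, so with R+W>N no later read can return something older.

```python
def quorums_overlap(n, r, w):
    """Every read quorum intersects every write quorum iff R + W > N."""
    return r + w > n


def read_with_write_back(replicas, read_set, w):
    """replicas: list of (version, value), mutated in place.
    read_set: indices of the R replicas this read contacted.
    """
    newest = max((replicas[i] for i in read_set), key=lambda vv: vv[0])
    # Push the newest version to the replicas we read, plus others if
    # needed, so that at least w replicas hold it before we return.
    others = [i for i in range(len(replicas)) if i not in read_set]
    targets = list(read_set) + others
    for i in targets[:max(w, len(read_set))]:
        if replicas[i][0] < newest[0]:
            replicas[i] = newest
    return newest[1]


# N=3, R=W=2. A write of version 2 has so far reached only replica 0.
replicas = [(2, "new"), (1, "old"), (1, "old")]
first = read_with_write_back(replicas, read_set=[0, 1], w=2)
second = read_with_write_back(replicas, read_set=[1, 2], w=2)
```

Without the write-back, the second read (replicas 1 and 2) would return "old" after another reader had already seen "new", which is the annoying part of regular semantics.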
Good point. But writes are very fast: in our tests, write latency is less than half of read latency, so we can easily do master-to-slave replication within the SLA. But your point is correct: a Dynamo system reaches the same replication factor faster.
I think it's worth elaborating that the primary advantage of the Dynamo model is not in the best or average case, but when things do not go as planned: when the master falls behind, when EC2 network latency spikes, etc. Then "any two replicas" instead of "master plus one more" is much more robust.
There can be big benefits to this redundant work. For example: http://www.bailis.org/blog/doing-redundant-work-to-speed-up-...
But don't take it from me: http://cacm.acm.org/magazines/2013/2/160173-the-tail-at-scal...