This sounds like a report of a bug, but I believe this is not the actual story. ...

lomnakkus · on June 13, 2014

> This sounds like a report of a bug, but I believe this is not the actual story. It is more a report of a design tradeoff: the authors of those CP systems completely understand what happens, but were not happy to pay this performance price for reads

If the authors were aware of these issues then the documentation was dangerously misleading[1] and they should be docked points for that.

[1] As reported by aphyr; I haven't read through it all myself. I'm thinking primarily of the bit that labels "read from leader without going through log" as "consistent".

antirez · on June 13, 2014

That's why I think this is a design decision in both cases:

In one of these products (etcd, if I remember correctly) there was a clear statement in the documentation about these semantics, and anyway, whoever implements Raft knows that for reads to be consistent they need to go through the same path as writes. The Raft paper has a whole section about this.

If you check the paper, the following points are clearly stated:

Leaders can't reply to read queries without doing additional checks; otherwise the reads are not linearizable.

For reads to be linearizable, leaders must do the following two things.

1) Commit a NOP at the start of the term, which is not a problem from a performance point of view. The problem is "2".

2) A leader needs to check that it is still the leader before every read, and this requires contacting a majority. That's the performance problem of linearizable reads: you pay a latency equal to that of the slowest reply among the N/2+1 acks you need.

However, note that even linearizable reads don't require fsync() to be called, so they are still cheaper than writes.
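
To make the two steps above concrete, here is a minimal Python sketch of a leader-side read path. It is a toy model, not code from etcd, Consul, or the Raft paper: the `Leader` class, its fields, and the per-follower `ping` callables are all hypothetical names invented for illustration, and the heartbeats are issued one after another, whereas a real implementation would presumably send them in parallel and wait for the majority.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Leader:
    """Toy Raft leader holding only what matters for serving reads (hypothetical)."""
    term: int
    # One callable per follower: returns True if that follower still
    # acknowledges this leader for the given term (one heartbeat round trip).
    followers: List[Callable[[int], bool]]
    state: Dict[str, str] = field(default_factory=dict)
    # Step 1: set once a NOP entry has been committed in the current term.
    noop_committed: bool = False

    def cluster_size(self) -> int:
        # The followers plus the leader itself.
        return len(self.followers) + 1

    def has_majority(self) -> bool:
        # Step 2: the leader counts itself, then needs enough follower
        # acks to reach N/2+1. This is the network cost of every read.
        acks = 1 + sum(1 for ping in self.followers if ping(self.term))
        return acks >= self.cluster_size() // 2 + 1

    def stale_read(self, key: str) -> Optional[str]:
        # No checks at all: fast, but may return an old value if this
        # node has been deposed without noticing.
        return self.state.get(key)

    def linearizable_read(self, key: str) -> Optional[str]:
        if not self.noop_committed:
            raise RuntimeError("no entry committed in the current term yet")
        if not self.has_majority():
            raise RuntimeError("cannot confirm leadership; refusing to read")
        # Nothing is appended to the log here, so no fsync() is involved.
        return self.state.get(key)
```

With three nodes, a leader that gets one follower ack can serve `linearizable_read`, while a partitioned leader that gets none can only serve `stale_read`.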

brunov · on June 12, 2014

What's your opinion on exposing the option of stale vs. consistent read in the API? I can see cases where I'd be ok with a stale read while for others I'd like the most up-to-date value.

antirez · on June 12, 2014

That makes a lot of sense. There are definitely use cases where reading a past value is viable, especially considering the big difference in performance between the two kinds of reads.
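
As a rough illustration of what such an option could look like, here is a small Python sketch in which the caller picks the read mode per request. The names `Consistency`, `KVStore`, and `confirm_leadership` are made up for this example and do not correspond to any real product's API; defaulting to the linearizable mode is also just one possible design choice.

```python
from enum import Enum
from typing import Callable, Dict, Optional


class Consistency(Enum):
    STALE = "stale"                # answer from local state, no quorum check
    LINEARIZABLE = "linearizable"  # confirm leadership with a majority first


class KVStore:
    """Hypothetical store front end exposing the read mode to the caller."""

    def __init__(self, state: Dict[str, str], confirm_leadership: Callable[[], bool]):
        self._state = state
        # Stand-in for the majority round trip a real leader would perform.
        self._confirm_leadership = confirm_leadership

    def get(self, key: str,
            consistency: Consistency = Consistency.LINEARIZABLE) -> Optional[str]:
        # Only the linearizable mode pays for the leadership check.
        if consistency is Consistency.LINEARIZABLE and not self._confirm_leadership():
            raise RuntimeError("leadership not confirmed; retry or accept a stale read")
        return self._state.get(key)
```

A caller that can tolerate an old value would pass `Consistency.STALE` and skip the round trip; everything else keeps the safer default.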