More

andras_gerlits · on April 27, 2024

"I stand by spreading hearsay, even if a court happens to disagree"

andras_gerlits · on March 13, 2024

It's clear for everyone that if you define Consistency via Linearizability, CAP-like problems will apply, as you're necessarily creating original information on a potentially remote node. That's not the issue.

The issue is that people in practice almost never use Linearizability for their Consistency, for example I don't know of a single SQL-implementation that does full linearizability (or Strict Serializable, same difference). So the industry already means something totally different from CAP's definition, and for those consistency-levels CAP doesn't apply. In fact, I present a technical series of arguments here how you can overcome these limits in practice: https://medium.com/p/5e397cb12e63

There's a specific section about CAP in the end, but I talk about node-loss. replication, strong consistency and all the others also.

andras_gerlits · on March 12, 2024

Almost there Dustin, thank you. This article is about the mental model behind scaling consistency across arbitrary geographical distances and how this model allows us to communicate time-information the same way we now communicate data. It's an intro to the science-paper, not the implementation.

With regards to the implementation (which is omniledger.io): We basically make SQL scale via Kafka by piggybacking on the semantics of both technologies.

There's no central authority for any of the tables. The version-ledger is totally schema-agnostic, only the clients understand it. In fact, it federates schemas the same way it does records. Tables are not topics either, in fact, we scale by importing namespaces via a command-line interface, which specifies a schema. Any database can register to this namespace after which they become as much of the "master" to the namespace as any other. There's no hierarchy between them, the ledger does everything for them. Since the ledger itself is deterministically replicated, a separate instance of it can be co-located with each instance that runs the JDBC-connections to the database, raising availability of it to the availability of the whole system. Each component in the setup scales with the number of partitions created for it.

andras_gerlits · on March 12, 2024

PACELC is an extension of CAP, so it suffers from the same problem of trying to apply a universal clock to the whole system. If you do that, you will suffer these limitations. With client-centric consistency, you can work around these problems and global order "unfolds" in a "just in time" manner.

The consistency-levels can go all the way to SNAPSHOT.

tsimionescu · on March 13, 2024

Ultimately the problem we care about is very simple, and there is no way to solve it, despite your claims.

Say we have a database of customers, with two replicas - one in the USA, one in Europe. Say a customer in Europe wants to update their shipping address. We ship products every month to this customer from the USA to their current shipping address, on the 10th of that month.

The customer is updating their address on the 9th at 10 AM PST. However, we are in the midst of a massive network partition that started on the 9th at 02 AM PST and is expected to last until the 11th.

Do we perform the update in the European replica and tell the customer it succeeded (giving up consistency)? If we do, the US side of the business will ship the products to the old address, even though the update happened a full day earlier.

Alternatively, do we tell the customer the update failed (giving up availability)? If we do, then the customer can't even let the European side of the business know of their new address.

This is the CAP theorem in a nutshell, and it is obviously inescapable. It doesn't require appeals to immediacy that are anywhere close to the bounds of relativity. And while 3-day long network partitions are quite rare, partitions that last for many hours are not.

andras_gerlits · on March 13, 2024

Any modification to existing data must be "haggled for" somewhere, you're right about that. When you say "partition event", what is being partitioned here? A specific communication-link between two nodes. It's entirely possible (no, extremely likely) that your EU node would have access to a different US node, but not the one that's having the partition-event right now.

That's exactly the point of this section in the essay: https://medium.com/p/5e397cb12e63#7df1

Networks will have latency-spikes, but if you can stream time-information the same way you can stream others, you can use redundancies to mitigate the disruption of any single channel.

mrkeen · on March 13, 2024

> It's entirely possible (no, extremely likely) that your EU node would have access to a different US node

CAP theorem beaten by declaring P to be unlikely.

andras_gerlits · on March 13, 2024

Well no, by making the probability of latency-events extremely unlikely by establishing redundant channels.

https://medium.com/p/5e397cb12e63#7df1

Considering that this is maybe the 12th time I'm linking the explanation, I now think you're not looking for a discussion, you're here for a fight. Please let me know if you find any problems with the article in its reasoning. It was vetted by a lot of people before, so I really need to notify them too if you do.

sausagefeet · on March 13, 2024

> So how can we fix this problem? The same way we solve all our failure issues in IT: redundancy.

Redundancy doesn't fix any problems, it just makes them less likely to occur. Again this not only does nothing to address the CAP theorem but is done for a wide array of problems already, there is nothing new here.

andras_gerlits · on March 13, 2024

Okay, so we can move beyond CAP. So we don't talk about implementation in either the science-paper or in the intro for it (which is what this thread is about). I mostly write about implementation, Mark mostly writes about the science. So yes, redundancy will only make latency-spikes less likely. Notice however, that the way we establish strong consistency is based on communication also. There's a part in the essay I keep linking where I talk about what happens to nodes with flaky connections, it's called "Stability.

https://medium.com/p/5e397cb12e63#373c

There are three separate mitigation-solutions that go into how total-order strong consistency can keep marching on even if a specific child-group is intermittently isolated.

The first one is redundancy by deterministic replication, so there will always be many replicas which aren't just shards, but full copies of the _consistency ledger_. It's not a database, not a cache, just the thing that establishes the consistency between nodes. These instances all "race each other" to provide the first value of the outputs to other nodes.

The second one is the latency-mitigation we talked about earlier, I don't think we need to waste more breath on that.

The third one is that since the consistency-mechanism requires an explicit association-instruction to interleave the children's versions into its parent's (so that these versions can be exposed to nodes observing it from afar). If the child goes AWOL, it won't be able to associate its versions to its parent, so it won't keep up everyone else either. In this case the total-order won't be affected by the child-group's local-order, which is still allowed to make progress, as long as it's not trying to update any record that is distant to it.

sausagefeet · on March 13, 2024

1. I don't quite understand what a "consistency ledger" is and how it's not a database, but it sounds like a log. Many distributed systems solutions have logs, i.e. Raft. The question you haven't covered is how you keep the ledger consistent. I believe you're using Kafka underneath (correct me if wrong), but it doesn't matter so much, any queuing system is much under the control of CAP as anything else, so it's not clear to me how this resolves anything CAP related.

2. Redundancy in network connection. Yes this can help. Again, it doesn't resolve anything CAP related, it just reduces the likelihood of a certain class of distributed systems fails. Note, there are LOTS of ways to have unbounded latency that this does not resolve. Anything from misconfigured routers to disk drives dying. Again, not resolving anything CAP states, just attempting to reduce the some probabilities.

3. If I understand this correctly, you are saying some data is "homed" in certain regions, and if that region becomes partitioned from the rest of the world, they can still modify the "homed" data there. They own it. This doesn't address anything CAP related.

Assuming I understand the architecture correctly, then yes, some architecture's will benefit from this. Some might not. But all of these decisions align with existing understanding and decisions architects make in the face of CAP.

It feels to me that you have believe CAP defines a very particular database architecture (something like a PostgreSQL) database, and your architecture addresses limitations in that, and thus your architecture solves limitations of CAP. But that's just not true. Take Riak, Cassandra, BigTable, Spanner, CockroachDB, all of these represent architectures defined in the face of CAP and they all have different trade-offs. They don't look a lot like PostgreSQL. But they cannot get around the simple laws of physics for how information is communicated.

andras_gerlits · on March 13, 2024

The reason you don't understand why my claims are different from being just another solution with clocks and mutexes is because you haven't actually engaged with the essence of it. I'll give you a hint: the consistency mechanism is decoupled from collision resolution and that makes the consistency ledger both deterministic and inherently non-blocking. I don't want to go into more specific questions around the implementation, but I can tell you with certainty that we provide much better promises around progressing global state and sustained write-latency than anything else and our client-centric consistency guarantees local-only reads.

But this is about the solution, not the science. The science is basically unlimited, its only real restriction is around our budget.

If you want to understand how the client-centric consistency mechanism takes care of these things, I write about it on medium and Twitter all the time.

But again, I don't feel like you owe me your time or attention

sausagefeet · on March 13, 2024

I have read everything you've given me. If you really believe in what you're selling then I am absolutely the person to convince because if you can convince me you can convince anyone. All you have done is post links to the same two articles which multiple people have told you don't answer the question they've asked, and your only response is to accuse me of not engaging. C'mon, this is ridiculous. You have several people with their undivided attention and you aren't threading the needle.

andras_gerlits · on March 13, 2024

There's a mental leap you need to make before it clicks into place and I can't do it for you. I have several people who understand what I say and why I say it, but I get that this isn't an easy step to make. If you read the essay, you understand how we turn time into data. You also understand how we can construct partial and global hierarchy of orders, how these are both strongly ordered and still move independently and how observers can construct their own consistent view from the data available to them. There's formal proof in the science-paper that we can do consistency all the way to snapshot.

To summarise: we have a consistent system that works via one-way streaming, via redundant channels. Best possible cache-invalidation. Reads are both consistent and locally available. Upper-bound time of writes is predictable and doesn't suffer from blocking. I'm not sure what other improvements anyone could want from such a system, this is the best such systems can ever be. These improvements go well beyond the limits CAP sets out and it basically makes the whole argument moot.

sausagefeet · on March 14, 2024

This is gibberish.

tsimionescu · on March 13, 2024

No, by definition, a partition event is one in which some amount of the nodes are entirely non accessible by any other node in the cluster.

Even if there were two different US nodes, and only one node lost connectivity to the others, the problem wouldn't change at all - the branch which only has access to the disconnected node would still either send shipments to the wrong address or stop being able to read the status at all.

andras_gerlits · on March 13, 2024

We need to clarify first if we're talking about CAP or real life. CAP requirements are absurd. To quote Dominick Tornow from here:

https://blog.dtornow.com/the-cap-theorem.-the-bad-the-bad-th...

"Note that Gilbert and Lynch’s definition requires any non-failed nodenot just some non-failed nodeto be able to generate valid responses, even if that node is isolated from other nodes. That is an unnecessary and unrealistic restriction. In essence, that restriction renders consensus protocols not highly available, because only some nodes, the nodes in the majority, are able to respond."

Partition with a capital P doesn't really move me as our experiences contradict its assertions, as the model it bases its statements on is fundamentally broken.

So let's talk about partition in the real world. Practically speaking, partitions mean that some nodes experience high latency when communicating. If you need to update some record in a specific data-centre and you can only to talk to that place via a single channel and that channel is disrupted; yes, you're going to experience a slowdown or even halt. Nowadays however, even widely used consensus-algorithms can mitigate that, and the delinquent node will eventually be dropped from the group if it causes enough problems. We don't do anything very different in this regard. Since our deterministic ledger can be replicated across multiple data-centres (as no nodes in it create original information), and since observers will only rely on records the ledger already sent out and since the only reason you would need to talk to this ledger is if you need to modify a shared record with other observers, you can always pick a different ledger-instance, there are no "master copies" anywhere. Remember that we can stream time-information with the data, so the client can always calculate its "point in time" and reconstruct a (even globally) consistent view of the data it has.

Sure, if you choose to centralise all your data behind a single flaky connection, you're gonna have a bad time. The point is that the setup we built allows you to not need to do that and it does that transparently, behind SQL semantics.

tsimionescu · on March 13, 2024

No, the CAP requirements are not at all as absurd as that article claims.

For the specific quote you gave, that is an obvious assumption. A client only has access to some of the nodes in the distributed system. Of course we want any node to give the correct answer - the whole purpose is to reduce the burden on the client. The client is not responsible for searching all of the nodes. And note that the proof doesn't actually require that all nodes return the right answer - the contradiction is reached as long as all the nodes that the client has access to return the wrong answer.

Another bad claim in the article is that the proof of CAP requires that the partition is permanent. Maybe it's written like that for simplicity, but it obviously only requires the partition to be longer than the client's bound on response time. If the client is willing to wait an hour for a response, then any partition event that's two hours long will lead to the same conclusion. Since clients never have unbounded time to wait, and since partition duration is unbounded even if not permanent, then the argument still holds.

Also, major network outages that disconnect whole regions of the internet for hours from the rest of the world happen somewhat regularly (more than once a year). Whole AWS regions have become disconnected, ~half of Japan was disconnected for a few hours, Ukraine has been disconnected several times, etc. If you run a large distributed system for a significant amount of time, you will hit such events with probability approaching 1.

andras_gerlits · on March 14, 2024

I can only repeat what I told you earlier. Our distributed consistency model meets the SQL-standard's requirements for consistency and tolerates such outages. This is a fact.

CAP is a bad model for more reasons than the ones listed in that article. My favourite one is that it requires Linearizability, which nobody does in SQL. The disconnect when saying "SQL is not consistent" to me is just too much. CAP is based on a badly defined idea that comes from a presentation that was wrong in what it said.

That you need to tolerate outages of entire regions is a good argument to make in itself, there's no need to point at CAP. My answer to that is that as there's a way to define consistency in a way that allows for it to manage partition problems more gracefully, and that is the model we show. If you require communication to establish consistency and stream the changes associated with the specific timeslot at the same time, partition means that the global state will move on without the changes from the partitioned areas and that they will show up once the region rejoins the global network. While separated, they can still do (SQL-) consistent reads and writes for (intra-region) records they can modify.

tsimionescu · on March 14, 2024

Are you saying that it's possible for an SQL server to allow to successfully commit a transaction where you modify a record, and then in a subsequent query, return the old value of that record? I very much doubt this is true of any single-node SQL database.

In contrast, any distributed system has this property (or it may refuse the query entirely) in the face of partitions.

mrkeen · on March 13, 2024

Thanks for the link.

> https://blog.dtornow.com/the-cap-theorem.-the-bad-the-bad-th...

Especially:

> The “Pick 2 out of 3” interpretation implies that network partitions are optional, something you can opt-in or opt-out of. However, network partitions are inevitable in a realistic system model. Therefore, the ability to tolerate partitions is a non-negotiable requirement, rather than an optional attribute.

> CAP requirements are absurd

Yes! Literally. One would roundly ridicule someone who claimed to have met (or exceeded) those requirements.

andras_gerlits · on March 14, 2024

CAP means many different things. If you took the time to read what I have to say about it, you would know that I'm saying that we're beating the requirements Brewer sets out in his original presentation, where he introduces the concept of the C-A-P tradeoff. He's clearly wrong in what he says in the presentation, which is what we say we're beating. We can say this, because we're meeting the requirements for "C" there (DBMS-consistency) and because we don't suffer the trade-offs mentioned there. In fact, our system can be both available and partition-tolerant with a definition of "C" that matches the ones laid out in the SQL-spec, as the reads are always local. The SQL-standard doesn't mandate time-related availability guarantees for writes.

sausagefeet · on March 12, 2024

CAP says nothing about a "universal clock over the whole system". CAP is about the decision that has to be made in some unit of the system, it could be the whole system or it could be a bit, at the point of an operation. It's physics, there is no way around it. You can make different decisions on the semantics your system needs, but if you have two nodes that physically cannot communicate but need to be consistent for a client to move forward, the client cannot move forward. Full stop.

Could you please show a failure mode that this system can handle that CAP says is not possible?

andras_gerlits · on March 12, 2024

Sure. Here: https://medium.com/p/5e397cb12e63#04a5

sausagefeet · on March 12, 2024

I'm not sure if you gave the wrong link or not but this link doesn't describe any failure modes and how OmniLedger allegedly resolves them.

andras_gerlits · on March 12, 2024

This section discusses some failure modes (the first one is about the failure of a specific node): https://itnext.io/how-simple-can-scale-your-sql-beat-cap-and...

This section discusses latency-spike mitigation (which is how Brewer defines CAP colloquially): https://itnext.io/how-simple-can-scale-your-sql-beat-cap-and...

This section dissects the problem when trying to apply CAP to non-linearizable systems like SQL: https://itnext.io/how-simple-can-scale-your-sql-beat-cap-and...

Again, if you're not happy with the lack of scientific rigour in this technical article (ie: not science-paper), you can connect the dots in this one: https://www.researchgate.net/publication/359578461_Continuou...

mrkeen · on March 13, 2024

>> if you have two nodes that physically cannot communicate but need to be consistent for a client to move forward, the client cannot move forward. Full stop.

> This section discusses some failure modes (the first one is about the failure of a specific node): https://itnext.io/how-simple-can-scale-your-sql-beat-cap-and...

> In this setup, a copy of the node is replaceable by taking a (potentially days old) backup copy of it and replaying the events that happened since the time the backup was established.

The replaced node does NOT have access to the events that happened after the partition.

* If the replaced node serves up stale data anyway, you built an AP system.

* If the replaced node refuses to serve up stale data, you built a CP system.

* If you're pretending it has access to the latest events, there's no partition and you built a CA system.

andras_gerlits · on March 13, 2024

No. CAP requires linearizability for its definition. If your consistency-model moves with the network's ability to communicate, your strong consistency can progress even if you somehow manage to lose your redundant replicas. This is what the CAP-section is about:

https://medium.com/p/5e397cb12e63#04a5

This is the summary: "In other words: any distributed solution that fits the SQL standard can rightly claim that it scales SQL databases, and Brewer’s model can certainly accommodate a framework for that. His model however, is not the only kind of distributed SQL database that can exist, therefore his assertion that all distributed consistent systems must pick where they position themselves on his famous triangle is wrong. The system we explain here for example, is an exception. Formally: because our consistency model stays within the bounds of what the SQL standard allows and includes network communication; and informally because we can fine-tune latency variability according to the use-case of the specific datastore within the system and can even be reduced to only be a theoretical concern."

sausagefeet · on March 12, 2024

I've read all this and I saw no description of failure modes and operationally how they are resolved. "If a node disappears just replace it with a new one". Ok, how?

mrkeen · on March 13, 2024

To be fair, I think it's fine to ignore some of the implementation details about restarting a failed node. You can probably assume some kind of replicated log that all distributed systems use.

And you can also give the benefit of the doubt when allowing some number (less than the quorum) of nodes to fail (and letting them restart and catch up, etc.) while the system still makes progress ("CA mode"). After all, that's the point of distributing a system in the first place - there's no one master which can die and bring down the system.

But yeah, at this point I think OP is just going to keep implying that partitions don't happen or something...

andras_gerlits · on March 12, 2024

I don't think I ever heard people accuse our paper of not being rigorous enough, but more than happy to listen to specific problems with it: https://www.researchgate.net/publication/359578461_Continuou...

danbruc · on March 13, 2024

I completely read the first half of it, hoping that everything would eventually make sense, that the pieces would fall into place, but that never happened. I only skimmed the second half and everything seems to just become more and more incoherent. There are some recognizable underlying themes but nothing of it makes really any sense. If I would have to guess, I would guess that ChatGPT generated that gibberish. For large section I could at least imagine that the ideas are just way over my head, especially since I have never heard anything about promise theory and did not read the references. But page 16 really convinced me, that it all is just nonsense - how on earth do we suddenly and out of nowhere end up with differentials, Fourier series, and Heisenberg's uncertainty relation? And while the paper superficially looks sophisticated and scientific with all the symbols and notation, there is no substance behind it. The symbols are really only used for providing short labels for all kinds of things but they are [essentially] never used to relate anythings, let alone to derive or prove something.

andras_gerlits · on March 12, 2024

The CAP argument falls apart as soon as you decouple consistency from wall-clocks. Consistent systems only suffer from CAP limitations if they need strict serializability.

The point is that if you only look at the order of the data and not the wall-clock of some actor in the system, you can "calibrate" these separate ordering mechanisms together into a coherent whole and from that, you can build up all of the SQL guarantees.

Linearizability makes systems suffer because it tries to enforce a Newtonian model in a relativistic world. Order will naturally emerge faster when you're closer to the data than if you're farther. Measuring these with the same clock is what causes CAP, not some inherent property of distributed consistency.

mrkeen · on March 12, 2024

> The CAP argument falls apart as soon as you decouple consistency from wall-clocks

Hogwash

Five friends are seated at a restaurant, about to agree on a flavour of pizza via a simple majority quorum. Before they do so, the three girls excuse themselves to the restroom while the two boys remain at the table. The waiter arrives and asks what flavour pizza they'll have.

Do the two boys answer on behalf of all five friends, or do they wait for the girls to return?

No-one checked their own watch.

andras_gerlits · on March 13, 2024

Can I replicate people deterministically and control all their sensory inputs in this scenario?

heavenlyblue · on March 12, 2024

There's 0 relativity involved in distributed databases

andras_gerlits · on March 13, 2024

Yes, that's exactly the problem with the existing models and why CAP was formalised.

heavenlyblue · on March 16, 2024

Nope, that's not true, and you neither solved nor explained it in this message

andras_gerlits · on March 12, 2024

I also have a demo here where I show transactionality between two SQL-databases, a MySQL and a Postgres instance: https://www.youtube.com/watch?v=XJSSjY4szZE

And another, where I show loose coupling, ie: that the system continues even when an instance goes down and that it catches up once restarted. https://www.youtube.com/watch?v=R4_phLs4d_M

mrkeen · on March 13, 2024

> I also have a demo here where I show transactionality between two SQL-databases

If you're not constrained by CAP, I'd hardly expect you to be constrained by Two Generals either.

> And another, where I show loose coupling, ie: that the system continues even when an instance goes down and that it catches up once restarted

I've built one of those, it's fun! But I digress. Back to CAP.

1) How many instances do you have?

2) What is your quorum (simple majority?)

3) How many instances can go down before the system stops serving consumers?

sausagefeet · on March 12, 2024

No-one doubts if you put a message queue in front of your database, you can do what these videos show. The doubt is if this says anything interesting about distributed systems. At least these videos don't demonstrate that.

andras_gerlits · on March 12, 2024

Sure, there are other ways of doing this, the demos don't prove what I say, they only show that the theory works on a practical level. The science-paper however, does show how this mechanism can scale consistency.

Think about it this way: Our system simplifies running inter-system consistency and makes it much faster. Considering that Spanner exists, what proof would you accept to validate our claims?

I understand that reading all the material and putting them together isn't a small ask, but no new tech is easy to understand at first. Anyway, the material is there for anyone who cares to look and the system does what we claim it does.

I don't think anyone owes us their time to check our claims, but I don't think the fact that not lot of people will do this changes anything either.

sausagefeet · on March 12, 2024

We already know this technique works, it's how any asynchronous database replication is implemented.

I've read all of your content and I can see nothing even close to, for example, the Spanner or Amazon Dynamo paper which go through the operation details of how the systems work. Literally your articles are just a bunch of metaphors followed by hand waving. No operational details.

I don't know if you know you're selling snake oil or just don't understand what you're implementing, but even in this thread you've gotten plenty of feedback that how you describe what you're doing is not coherent, you might want to address that. Or not, I don't know, if you're selling like hot cakes then keep doing it.

andras_gerlits · on March 12, 2024

It's fine to not understand things. Being condescending is a natural reaction to feeling challenged, I really do understand where you're coming from. It's not even your fault, really. Anyway, when you're ready, feel free to pick it up again, the material doesn't take things personally and neither do I.

sausagefeet · on March 12, 2024

Or I guess just not responding to any of the technical questions that have been brought to you. That's an option too.

andras_gerlits · on March 12, 2024

It's ironic that so many of you are missing the rigour, as that is exactly the thing that "undoes" the CAP arguments.

Anyway, if math is what you guys are missing, it's in this science-paper linked in the article: https://www.researchgate.net/publication/359578461_Continuou...

This is a gentler intro to the concepts. You can also read my essay on why this setup works better than the often used semantics: https://medium.com/p/5e397cb12e63

There's a specific section at the end on why CAP only applies to a very specific subset of SQL databases.

sausagefeet · on March 12, 2024

Math isn't what's missing, but Mark's post is just a bunch of metaphor and no rigor. At the very least it could go over failure modes and shows how it alleviates them but other databases fail.

andras_gerlits · on March 12, 2024

We do you one better. We show how all information can be made redundant via determinism and how that means you can supply multiple copies of them across parallel, redundant channels.

My essay talks about this in detail around the latency-mitigation section and the failure-modes part, but these are questions much closer to the actual implementation. Mark discusses the new mental model behind it, I talk about technology.

https://itnext.io/how-simple-can-scale-your-sql-beat-cap-and...

sausagefeet · on March 12, 2024

This post has no rigor. "What happens if a node dies? Well another takes its place". Ok, show me. This is hand waving.

andras_gerlits · on March 12, 2024

How to decentralise and scale distributed consistency beyond the commonly accepted limits

andras_gerlits · on Feb 24, 2024

How we built the world's first loosely coupled, ACID, master-master platform that can accommodate any mainstream SQL-database.