Yes, but that's natural conflict. One client can do something that the other client doesn't like, and they can go back and forth. The system can't prevent that because it's a natural part of collaboration.
And note that this only applies to the interactions between local operations and server operations. If a client does a bunch of local work and only a small component of it is tied to a previous operation (e.g., one edge in graph), then that doesn't mean the rest of the work gets dropped. That is, if I create a bunch of new vertexes and edges, and only one of those edges connects my new vertex to a vertex already present in the server's operations and that connection is deleted simultaneously by another client, then I don't lose all of my work---I just lose that one edge. The rest of my graph is in tact.
And of course, if this is an end user application (like, say, Google Docs), then the user should be able to view the history and undo what their collaborator did. But I don't see this as any different than editing a document on Google Docs where one collaborator keeps deleting stuff you did.
> Yes, but that's natural conflict. One client can do something that the other client doesn't like, and they can go back and forth. The system can't prevent that because it's a natural part of collaboration.
But this isn't necessarily true. The point is there's more concurrency and collaboration possible than what you describe, if you restrict the operations in specific ways. This is what CRDTs and CRs do.
> But I don't see this as any different than editing a document on Google Docs where one collaborator keeps deleting stuff you did.
It is different. The collaborator actually saw your changes and decided to remove them. This is very different than your changes arbitrarily being lost by the system itself because its merge behaviour is insufficient.
Hmm, OK, I guess I'm kind of lost then. In my mind, the system I described never loses anything. I'm OK nipping this convo because I kind of feel like we're going in circles, but if you do want to continue I'm not sure how because I don't quite understand where we're misunderstanding each other. Maybe show an explicit example of the kind of data loss you're talking about?
My best guess is that I've described my design poorly and I'm leaving something critical out. I think the only way for me to figure out what I've left out is to hear more of your objection. :-)
Not sure if I can respond properly on mobile but I'll do my best.
1. There are differences between system conflicts and user conflicts.
Let's say I'm collaborating on a Google doc. Both my collaborator and I start writing in the same part of the document. If the concurrent system is designed "poorly" (for example, it has last write wins semantics), then only one of our writing appears. You can store this history in the system and show that your changes were overwritten by their changes, but this can be frustrating if you are both collaborating on the end of the document.
2. Commutative operations give better guarantees.
Let's take for instance an append only DAG. We restrict this DAG to having 2 operations: creating a vertex (and associated edges) between 2 existing vertices and creating an edge between 2 existing vertices. In this case, even if both my collaborator and I add a vertex between 2 existing vertices, the system will converge on a DAG where both vertices are added. You can follow the same argument with adding edges.
This DAG is an example of a CRDT. It has a limited set of operations (addEdge and addBetween) which allow convergent semantics through concurrent operations
Happy to chat more about this stuff and hope my mobile response makes sense heh.
So w.r.t. to (1), yeah, I totally get that. And I agree that is a hard problem to solve in the context of document editing (or more precisely, anything that contains a sequence). But for a graph, the logic is considerably simpler because a graph is just two sets. So if the system "loses" work that the client has done, it's only when some other client has deleted or changed something. So if two clients are editing the same graph and one drops a vertex and the other creates an edge that connects to that vertex, what is the right outcome in that circumstance? It seems to me like the edge gets deleted, and that this is a natural user conflict, no? (Regardless of how you order the operations in this case, the edge gets dropped.)
W.r.t. to (2), yeah, sure, but if clients need to be able to remove things then you lose commutativity and you end up missing the requirements of the system itself. Or is there some other component of a CRDT system that's supposed to handle this? (My big picture takeaway from the OP paper was that it was specifically trying to handle the non-commutative operations.)
Yup! Removes are where a lot of these things get tricky. The typical strategy is as you say, removes take precedence over adds, so if one client adds vertices/edges onto a vertex that gets removed by another client, when the DAGs converge, the remove gets precedence and "drops" all edges/vertices attached. It seems like we're basically saying the same thing.
Removes are also tricky because if you're just transmitting the remove operation itself, you have to ensure that updates appear to the client in order, or else removes and adds could conflict, causing inconsistent state.
Yeah, in practice, each client maintains a persistent data structure, where each "mutation" is the application of a single operation. If a new operation comes in that occurs before other operations that have already been applied to the persistent data structure, then the client rewinds the structure to its state just before the new out-of-order operation, applies that operation, and then applies the rest of the operations that it already has. (I think of this as very similar to a `git rebase`.)
But yeah, this might not work at all for sequences. And of course, there are other issues with respect to the memory requirements of the persistent data structure in the client. Some sort of compaction is probably necessary.
And note that this only applies to the interactions between local operations and server operations. If a client does a bunch of local work and only a small component of it is tied to a previous operation (e.g., one edge in graph), then that doesn't mean the rest of the work gets dropped. That is, if I create a bunch of new vertexes and edges, and only one of those edges connects my new vertex to a vertex already present in the server's operations and that connection is deleted simultaneously by another client, then I don't lose all of my work---I just lose that one edge. The rest of my graph is in tact.
And of course, if this is an end user application (like, say, Google Docs), then the user should be able to view the history and undo what their collaborator did. But I don't see this as any different than editing a document on Google Docs where one collaborator keeps deleting stuff you did.