“Stateless” sync using version vectors

allenu · on Sept 4, 2018

I solved a similar problem in my to-do app by just keeping a journal of diffs for each device. To "sync" you just upload your device's journal to a shared file store (say, Dropbox) and then download all the other devices' journals. You merge all the diffs to get the "latest" state of the database. Conflict resolution is last write wins, but at the individual property level (two devices can update the same to-do task as long as they update different properties).
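A minimal sketch of the merge described above, not the actual slouchdb code. The diff shape `(timestamp, entry_id, property, value)` and the name `merge_journals` are assumptions made for illustration.

```python
from typing import Any

# Hypothetical diff shape: (timestamp, entry_id, property, value)
Diff = tuple[float, str, str, Any]


def merge_journals(journals: dict[str, list[Diff]]) -> dict[str, dict[str, Any]]:
    """Replay every device's diffs in timestamp order.

    A later write to the same property overwrites an earlier one, so the
    result is last-write-wins at the individual property level.
    """
    all_diffs = [diff for journal in journals.values() for diff in journal]
    # Sort by timestamp only. Equal timestamps keep their input order here,
    # which is NOT a deterministic tie-break across devices.
    all_diffs.sort(key=lambda diff: diff[0])

    state: dict[str, dict[str, Any]] = {}
    for _timestamp, entry_id, prop, value in all_diffs:
        state.setdefault(entry_id, {})[prop] = value
    return state


journals = {
    "phone": [(1.0, "task-1", "title", "Buy milk"), (5.0, "task-1", "done", True)],
    "laptop": [(3.0, "task-1", "title", "Buy oat milk")],
}
state = merge_journals(journals)
```

Here the laptop's later title edit and the phone's `done` flag both survive, because they touch different properties of the same task.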

This actually works perfectly fine for my needs because a single user is not likely to be modifying two tasks across two devices at the same time. It may actually work for this project as well (since it's single user). You can check out the code here: https://github.com/allenu/slouchdb

kstrauser · on Sept 5, 2018

BTW, that strategy is known as "operational transformation". Here's an article about it if you're interested in some of the science behind it: https://en.wikipedia.org/wiki/Operational_transformation

marknadal · on Sept 5, 2018

Not entirely, OT does intention resolution on a centralized server - while the described method doesn't need a resolution server.

So in that regard, the above description is more elegant, yet somewhat irrelevant to text/rich-text merging that OT does.

You can trivially extend the above description to handle multi-user functionality by:

- Putting an upper bound on the time skew to (A) increase resolution of sync while (B) not needing to trust that clock drift is non-trivial.

- Using a state machine to resolve updates outside the bounds of the current machine's current context (including any erroneous clock drift).

- Running a P2P NTP algorithm to reduce clock drift where possible, without needing intermediary servers.

- Deterministically fall back to a naive algorithm, like lexical sort, if & only if updates collide on the same vector on the same timestamp. This is important as a lot of CRDTs don't handle conflicts that happen on vector collisions.
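The last bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the gun.js implementation: each update is assumed to be a `(timestamp, value)` pair, `pick_winner` is a hypothetical name, and values are compared by their canonical JSON text when timestamps collide.

```python
import json
from typing import Any

# Hypothetical update shape: (timestamp, value)
Update = tuple[float, Any]


def pick_winner(a: Update, b: Update) -> Update:
    """Return the update every peer should keep, regardless of argument order."""
    ts_a, value_a = a
    ts_b, value_b = b
    if ts_a != ts_b:
        # Normal case: the later timestamp wins.
        return a if ts_a > ts_b else b
    # Collision on the same timestamp: fall back to a lexical comparison
    # of a canonical serialization so all peers pick the same value.
    key_a = json.dumps(value_a, sort_keys=True)
    key_b = json.dumps(value_b, sort_keys=True)
    return a if key_a >= key_b else b


later = pick_winner((1.0, "x"), (2.0, "y"))
tie = pick_winner((5.0, "apple"), (5.0, "banana"))
tie_reversed = pick_winner((5.0, "banana"), (5.0, "apple"))
```

The property that matters is that the result does not depend on which peer received which update first, so two peers that see the same pair of colliding updates converge on the same value.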

We've learned a lot implementing these things in production (it runs on Internet Archive, plus a lot of "dApps" that are doing terabytes of daily traffic) with our system - a good overview is a "comic strip explainer" I did that explains more thoroughly the reasoning behind these various choices ( http://gun.js.org/distributed/matters.html ).

SiempreViernes · on Sept 5, 2018

I find the description of state transition and Einstein's relationship to quantum mechanics distorted to the point of being offensive!

Why would anyone take the gross simplification that is the orbital picture and then have a problem with particles physically moving between the levels? Why invent this silly notion of "teleportation", if you want quantum mechanics to get spooky just draw the proper probability clouds for Dirac's sake!

kstrauser · on Sept 5, 2018

Because my last reply to this got flagged (perhaps because "Wat?" wasn't specific enough):

What does this have to do with anything we're discussing? Did you accidentally post your reply in the wrong tab?

maxxxxx · on Sept 4, 2018

"Conflict resolution is last write wins, but at the individual property level (two devices can update a to-do task and update different properties)."

Depending on your use case, property-by-property conflict resolution works great, but it can also go horribly wrong with some data. Do you have timestamps on each property?

heavenlyblue · on Sept 4, 2018

I would assume every diff has a timestamp. And since you can diff properties separately then it works correctly.

I would be curious about another scenario: what about several devices generating a new password for a website? Or instead - creating a note with the same name?

The one that is going to be saved is the last one - that is true. But it implies there's some transparent data loss in the background (you could still get those from the log, of course).

How do you deal with those "soft" conflicts in the UI? Is there any learning curve for the user to accept that any changes could be rolled back? How do you display that so that the user wouldn't assume the data was lost?

jayd16 · on Sept 4, 2018

Hmm, something nice about this change log solution is that merge is built in by design. You could very happily default to having all new collection items start with a UUID, despite things like user-definable item names.

Normally this might be annoying, but with merge as a first-class citizen it's easy to add a "merge these items" feature. Not only is there no hidden data loss, but it also adds the nifty feature of being able to merge totally unique items.
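One way a "merge these items" feature could look, as a rough sketch. The storage shape (each property holding a `(timestamp, value)` pair) and the name `merge_items` are assumptions for illustration, not something taken from any of the projects mentioned in this thread.

```python
from typing import Any

# Hypothetical item shape: {property: (timestamp, value)}
Item = dict[str, tuple[float, Any]]


def merge_items(items: dict[str, Item], keep_id: str, drop_id: str) -> dict[str, Item]:
    """Fold the item `drop_id` into `keep_id`, newest value per property."""
    kept = dict(items[keep_id])
    for prop, stamped in items[drop_id].items():
        # Take the other item's property if we lack it or theirs is newer.
        if prop not in kept or stamped[0] > kept[prop][0]:
            kept[prop] = stamped
    merged = {item_id: item for item_id, item in items.items() if item_id != drop_id}
    merged[keep_id] = kept
    return merged


items = {
    "uuid-a": {"name": (1.0, "Groceries"), "body": (2.0, "milk")},
    "uuid-b": {"name": (3.0, "Groceries"), "tags": (1.5, ["food"])},
}
merged = merge_items(items, keep_id="uuid-a", drop_id="uuid-b")
```

In a journal-based design the merge itself would presumably be recorded as another diff, so the pre-merge items stay recoverable from the log.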

allenu · on Sept 4, 2018

With my system, I create a UUID for each entry, so I'm guaranteed uniqueness there. The other properties (task name) can be whatever you want.

You're right that it gets even more tricky if you are doing key=>value pairs and are expecting that the key is something that may not be unique across devices (i.e. two devices using the same website domain as the key would lead to a conflict).

Normal_gaussian · on Sept 4, 2018

With respect to the notes of the same name - you can key the items with a generated id and "namespace" the keys with a journal specific id to avoid conflicts. In this way two notes having the same name is 'ok' as they have different keys.
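A small sketch of that keying scheme, assuming each journal gets a random ID and a local counter. The `Journal` class and the `journal_id:counter` key format are hypothetical, chosen only to show the idea.

```python
import itertools
import uuid


class Journal:
    """Per-device journal whose item keys are namespaced by a journal ID."""

    def __init__(self) -> None:
        self.journal_id = uuid.uuid4().hex   # journal-specific namespace
        self._counter = itertools.count(1)   # locally generated item IDs
        self.items: dict[str, dict[str, str]] = {}

    def create_note(self, name: str) -> str:
        key = f"{self.journal_id}:{next(self._counter)}"
        self.items[key] = {"name": name}
        return key


phone, laptop = Journal(), Journal()
phone_key = phone.create_note("Groceries")
laptop_key = laptop.create_note("Groceries")

# Merging is a plain union because keys never collide across journals.
merged = {**phone.items, **laptop.items}
```

Both notes named "Groceries" survive the merge as separate items, and it is then up to the user whether to combine them.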

It would help to have a user guided merge feature that can be manually triggered by combining items (not just on conflict)

allenu · on Sept 4, 2018

This is exactly what I'm doing with my solution.

allenu · on Sept 4, 2018

Yes, every diff has a timestamp. The clocks aren't synced across devices, but the assumption is that they are "close enough" such that a user isn't going to do something like mark a task as done on device A and within the clock delta jump to device B and mark it as "not done".

Conflict resolution falls apart if there are multiple users on multiple devices, but my assumption with my solution is that it's only to be used by a single user across multiple devices.

bradknowles · on Sept 4, 2018

How accurate is your timesync? Do you use NTP across the board, and if so, how often do you check with your upstream clock servers to see if you might need to make some minor adjustments?

What happens if two different devices try to make updates within the window of your timesync error? Who wins then? Because they both think that they're the latest and therefore their write should win.

allenu · on Sept 4, 2018

I don't do anything like that. I just rely on the device's clock. Again, for my simple needs (to-do list), it works fine. It could be a problem if you have a device with the wrong time, but even then it would require a single user to be making lots of changes on the same property of the same object within a very small window of time, which for a to-do app is very, very unlikely. I didn't want to solve problems I didn't have.

maxxxxx · on Sept 4, 2018

Sounds like you created a simple, pragmatic solution for your use case. Which is exactly what you should do. But the method probably can't be applied to mission critical systems without changes.

allenu · on Sept 4, 2018

My solution was designed for a very narrow use case where there is one user and multiple devices. The main reason I designed it was that I did not want to stand up a cloud service for storing user info. I wanted to offload that responsibility to a file sharing system like Dropbox or OneDrive. It's not meant for anything beyond a very simple use case.

chrisweekly · on Sept 5, 2018

... which is what makes it appropriate! Pragmatism FTW.

ttflee · on Sept 5, 2018

By latest, do you mean the latest in terms of client time or server time?

CGamesPlay · on Sept 4, 2018

If I'm imagining this correctly, it's a lot like a Redux application flow, but with partially ordered Actions and a last-write-wins conflict resolution. I need to take a deeper look at the CRDT explorations link, but is this close to a fair analogy?