Hacker News | scoopertrooper's favorites

> If you want database semantics instead of document semantics, as far as I know nobody has done this well on top of CRDTs yet.

This is very interesting to me. CRDTs for databases sound fantastic. Databases have much, much richer data types and transactional semantics than collaborative text editing, and that makes applying CRDTs to databases harder.

Let's start with the basics. CRDTs are all about designing or picking monoids that fit the problem and let you get a very good approximation of the semantics you're after, if not exactly those semantics.

What does that mean for databases? Well, for one, every data type in a database will have to have a monoid associated with it -- this is... limiting, but limiting is good if the benefit is that we don't need any more complex mechanisms to get convergence in a distributed database. For example, we can have a table of "likes" where they can only increase additively, so we'll make that BIGINT with the ADDITIVE monoid or whatever. But `BIGINT with the ADDITIVE monoid` is the easy stuff. The hard part is PRIMARY and UNIQUE KEYs.
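The easy additive case can be sketched as a grow-only counter (all names here are illustrative): each replica increments only its own slot, and merge takes the elementwise max, which is commutative, associative, and idempotent — exactly the properties that make convergence order-independent.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch of a "likes" column as a grow-only counter.
// Each replica increments only its own slot; merge takes the
// elementwise max, so merges commute and repeating one is harmless.
struct GCounter {
    std::map<std::string, int64_t> slots;  // replica id -> local count

    void increment(const std::string& replica, int64_t n = 1) {
        slots[replica] += n;
    }

    int64_t value() const {
        int64_t sum = 0;
        for (const auto& [id, n] : slots) sum += n;
        return sum;
    }

    // Any two replicas converge to the same state regardless of the
    // order (or repetition) of merges.
    static GCounter merge(const GCounter& a, const GCounter& b) {
        GCounter out = a;
        for (const auto& [id, n] : b.slots)
            out.slots[id] = std::max(out.slots[id], n);
        return out;
    }
};
```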

So what about unique keys? Well, we can have monoids for those too. Like tiebreakers based on timestamps. Especially since we have delineated transactions (BEGIN; ..; COMMIT;) we know that if some INSERT fails eventually then the whole transaction containing it fails (unless that INSERT is of the OR IGNORE / ON CONFLICT DO NOTHING variety).
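A timestamp tiebreaker for a UNIQUE key could be sketched as a last-writer-wins rule (the types and names below are hypothetical, not any real database's API): the write with the larger (timestamp, replica_id) pair wins deterministically on every replica, and the replica id breaks exact timestamp ties so the outcome never depends on merge order.

```cpp
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical last-writer-wins tiebreak for a UNIQUE key: when two
// replicas concurrently claim the same key, every replica picks the
// same winner by comparing (timestamp, replica_id) lexicographically.
struct Claim {
    std::string key;
    uint64_t    timestamp;   // e.g. from a hybrid logical clock
    std::string replica_id;
    std::string payload;
};

const Claim& resolve(const Claim& a, const Claim& b) {
    return std::tie(a.timestamp, a.replica_id) >=
           std::tie(b.timestamp, b.replica_id) ? a : b;
}
```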

The real problem with CRDT and databases is that CRDT is incompatible with SQL transactional semantics. Similarly for CRDT and filesystems and POSIX semantics. You can't tell if some transaction will commit successfully until you've heard from all collaborators enough to know that it must have. Instead you can consider every local transaction that succeeds locally as committed, but then later every possible conflict has to be resolvable in some way. This gets tricky real fast. It might be easier to start with "trivial" transactions like POSIX file rename(2). If two applications decide to rename the same file to different names, and both observe success, and eventually only one of those renames succeeds, then it has to appear to the loser that the winner came along and renamed the file after the loser did. This sequence of events:

  loser                        winner
  -----                        ------
  rename("/a/b", "/a/c") = 0
                               rename("/a/b", "/a/d") = 0
would have to look to the loser like:

  loser                        winner
  -----                        ------
  rename("/a/b", "/a/c") = 0
                               rename("/a/c", "/a/d") = 0
                                          ^
                                          |
                                          /
                   note the difference --+
But, of course, POSIX has something to say about this, and that is "nope!". POSIX says "no" because if the loser and the winner share notes they'll find that what appears to the loser to have happened is not at all what happened. There are very specific rules about observability of writes, and order of events, that POSIX has that a CRDT distributed filesystem simply must break. The loser's successful rename(2) call should have been visible to the winner, so the winner should have lost the race to rename that file. Still, it's a pretty good compromise for the benefits of CRDT.
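The rename race above can be resolved deterministically with, say, a last-writer-wins rule. A toy sketch (the timestamp scheme and names are assumptions, and this is exactly the point where we leave POSIX):

```cpp
#include <cstdint>
#include <string>

// Hypothetical merge of two concurrent renames of the same file: pick a
// winner by (timestamp, replica_id). Both replicas converge on the
// winner's name; the loser then observes what looks like a later
// rename("/a/c", "/a/d") that nobody actually issued.
struct Rename {
    std::string from, to;
    uint64_t    timestamp;
    std::string replica_id;
};

// Deterministically pick the surviving name for a file both sides renamed.
std::string converge(const Rename& x, const Rename& y) {
    auto later = [](const Rename& a, const Rename& b) {
        return a.timestamp != b.timestamp ? a.timestamp > b.timestamp
                                          : a.replica_id > b.replica_id;
    };
    return later(x, y) ? x.to : y.to;
}
```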

Similarly for O_CREAT | O_EXCL: the loser can imagine that the winner unlinked the file created by the loser then created a replacement. Again, we've left the land of POSIX at that point, but again, it can be a pretty good compromise for some applications.

SQL transactions are infinitely more complex than POSIX ones, but I think the analysis of the POSIX case generalizes to SQL ACID transactional semantics: you can't quite have that with CRDT. You might later find parts of a transaction that looked successful to you undone or changed by a collaborator, in ways that, if you were to look at the actual sequence of events, you'd say violate ACID. And again, for some applications this probably just won't work. Though, too, we might come up with monoids that help us reach acceptable semantics.

For example, in Active Directory each domain controller gets its own pool of RIDs (relative identifiers) for assignment to new users and groups and machine accounts, so there can be no conflicts about those, but there is no sub-namespacing of the names of those things, so the rule AD uses is that the loser's user/group/machine gets renamed to something like "copy_of_{original_name}". Also, AD requires a strong primary role for the DC that hands out RID ranges for allocation, which again means we're leaving the land of CRDT for some things. So in AD one might find a user/group/machine unexpectedly renamed like that (but it rarely ever happens).

But if the application was using atomic transactions to decide whether to trigger destructive external side effects, such as "sell!!!" or "launch missiles", then this kind of shifting-sands transactional semantics may be... unsatisfying.

CRDT resembles eventual correctness in a way. It's not that local state is ever incorrect, but that the observed sequence of events would be impossible in a system with traditional ACID transactional semantics.

All this said, just note that I've done zero work in this area, just lots of thinking. I've this idea that one could use PG logical replication publications and subscriptions, and monoid choices encoded in COMMENTs, and suitable functions, to implement a CRDT scheme on top of PG to explore this space. Basically, each collaborating server publishes its view of a log of local transactions and subscribes to all the others' (in separate schemas), then periodically it applies the others' logs to the local source of truth. For simple applications that might even suffice instead of a database that natively supports CRDT. I have done work on encoding useful schema metadata in COMMENTs using JSON, so I'm pretty confident that this is something worth exploring.
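A sketch of what that PG setup might look like (all object names and the JSON shape are made up for illustration; `CREATE PUBLICATION`, `CREATE SUBSCRIPTION`, and `COMMENT ON` are standard PostgreSQL):

```sql
-- Each server publishes its local transaction log and subscribes to
-- every peer's (one subscription per peer, applied into its own schema).
CREATE PUBLICATION crdt_out FOR TABLE local_log;
CREATE SUBSCRIPTION crdt_in_node2
    CONNECTION 'host=node2 dbname=app'
    PUBLICATION crdt_out;

-- Monoid choice per column, encoded as JSON in a COMMENT, for the
-- periodic merge procedure to read back from the catalog.
COMMENT ON COLUMN posts.likes IS '{"crdt": {"monoid": "additive"}}';
COMMENT ON COLUMN posts.title IS '{"crdt": {"monoid": "lww", "ts": "updated_at"}}';
```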

One thing that is clear is that some of this space has been explored already. For example, again, AD has done so for PRIMARY/UNIQUE KEYs! Indeed, AD even implements a hybrid approach to multi-mastering a distributed database, with an "infrastructure master" for some things, and CRDT for others. AD's metaschema is LDAP's, with some enhancements (especially an ObjectDN syntax for "relations" or "pointers") that make it a lot more like a relational database. And AD is a general purpose database that achieves these things. So it's not like this space is completely new -- there are a few giants' shoulders to stand on.


Geeking out about voting systems is great. Big fan.

Even better are voting systems in context. Voter education, cost of administration, quality of outcomes, cultural impact.

For examples:

- Approval voting has the best balance between simplicity, fairness, and certainty. (For single-winner contests.)

- Multiple-choice systems reduce partisanship and negativity, because all candidates want to be your second choice.

- Eliminating primaries will reduce polarization and voter fatigue.



I have done it several times professionally (Motif, Qt3, GTK, etc., but not wx), porting to QtWidgets or QtQuick. It's been a while, and I have since pivoted to other industries.

It's a potential project killer, but it can be done. The trick is to wrap all your objects in QObject classes. You keep the old UI code as-is as long as possible.

The first Jedi mind trick is to reverse the event loop. Qt supports GLib2 loops, but only as long as they are created by Qt. So you create a dummy QCoreApplication, start the event loop. Then you coerce GTK into using the glib event loop Qt created.

After that, you populate the QObjects I mentioned above. The `Q_PROPERTY` stuff is the important part. It *has* to be fully implemented for QML to ever work. For Qt5, you need to use Qt containers. This means porting your lists and other C++ types (or glib types) to Qt. For Qt6, I think you can get away without some of that, but not in Qt5.
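A minimal sketch of such a QObject shim, assuming a hypothetical `LegacyDocument` class standing in for the old codebase (READ + NOTIFY is the bare minimum for a property QML binds to; WRITE only if QML mutates it):

```cpp
#include <string>
#include <QObject>
#include <QString>

// Stand-in for a class from the old, non-Qt codebase (hypothetical).
class LegacyDocument {
public:
    const std::string& title() const { return m_title; }
    void set_title(const std::string& t) { m_title = t; }
private:
    std::string m_title;
};

// The QObject shim QML binds to.
class DocumentAdapter : public QObject {
    Q_OBJECT
    Q_PROPERTY(QString title READ title WRITE setTitle NOTIFY titleChanged)
public:
    explicit DocumentAdapter(LegacyDocument* doc, QObject* parent = nullptr)
        : QObject(parent), m_doc(doc) {}

    QString title() const { return QString::fromStdString(m_doc->title()); }

    void setTitle(const QString& t) {
        if (t == title())
            return;                        // guard against binding loops
        m_doc->set_title(t.toStdString());
        emit titleChanged();               // QML re-evaluates bindings on this
    }

signals:
    void titleChanged();

private:
    LegacyDocument* m_doc;                 // not owned by the shim
};
```

The `titleChanged` emission is the part that pays off later: every QML binding on `title` re-evaluates when it fires, which is what keeps the new UI in sync with the old one during the transition.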

Then, make all the signals work. It is important that all properties have signals, and the signals must be reliable. QML relies heavily on events for literally everything. Without events, your UX gets out of sync and crashes.

The next part is models. Turn every list with a selection into a model. Don't try to render stuff using for-loops in QML; this is a terrible idea and will explode in your face. Qt models are annoying to write (and get repetitive very fast), but reliable list models greatly simplify the QML code.
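A minimal sketch of such a list model (the class and role names are illustrative). The `beginInsertRows`/`endInsertRows` bracketing around every mutation is what keeps QML's view in sync:

```cpp
#include <QAbstractListModel>
#include <QByteArray>
#include <QHash>
#include <QString>
#include <QVariant>
#include <QVector>

// Hypothetical minimal model backing a QML ListView.
class ContactModel : public QAbstractListModel {
    Q_OBJECT
public:
    enum Roles { NameRole = Qt::UserRole + 1 };

    int rowCount(const QModelIndex& parent = QModelIndex()) const override {
        return parent.isValid() ? 0 : int(m_names.size());
    }

    QVariant data(const QModelIndex& index, int role) const override {
        if (!index.isValid() || role != NameRole)
            return {};
        return m_names.at(index.row());
    }

    // Maps the role id to the name QML delegates use.
    QHash<int, QByteArray> roleNames() const override {
        return {{NameRole, "name"}};
    }

    void append(const QString& name) {
        const int row = int(m_names.size());
        beginInsertRows(QModelIndex(), row, row);  // tell attached views first
        m_names.push_back(name);
        endInsertRows();                            // views update here
    }

private:
    QVector<QString> m_names;
};
```

On the QML side the delegate then reads the role by name, e.g. `delegate: Text { text: name }`.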

When you have all those QObjects reflecting your current codebase, you make a companion app: a small window (or a LAN/network smartphone UI using QtRemoteObjects).

Now you are ready for the UI. Add little features, one by one. Don't try to do everything at once. Again, for as long as you can, keep the wx/GTK window alive and fully functional. Your app will look weird with one wx window and one QtQuick window, but having the fully functional old UI lets you press buttons there and see the results being updated in QML. That lets you polish and debug your signal/Q_PROPERTY code step by step.

Once you are "done" with your port, strip the UI code from the old codebase. Then refactor/clean the logic and turn it into a library. Then start merging your QObject shims and logic classes into a single class.

If you get there, congratulations, your port is complete. Time to annoy your users with a bunch of changes.


