
It’s enough. Just make sure you’re not trying to fly an airplane or prevent the detonation of nuclear weapons using a database; it’s the wrong tool for the job.

If you’re storing doubly linked lists in a DB you’re doing it wrong.

Updating a doubly linked list can be done at about 200 million ops/sec on a single thread. I'm not sure why you need multiple threads updating the list at the same time; what exactly are you doing that can't be solved by putting a ring buffer in front of a single thread that applies the updates? That approach needs no locks and is cache friendly.
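
Roughly the shape I mean, as a Python sketch (the names and the op format are made up, and CPython's GIL means this only illustrates the single-writer pattern, not the throughput claim): producers push operations onto a bounded ring, and exactly one thread ever touches the list, so the list itself needs no locks. With a single producer the ring needs no locks either; multiple producers would each want their own ring or a proper MPSC queue.

    import time

    class SpscRing:
        """Single-producer/single-consumer ring buffer: 'tail' is written only
        by the producer and 'head' only by the consumer, so with one thread on
        each side no lock is needed."""
        def __init__(self, capacity=1024):
            self.buf = [None] * capacity
            self.capacity = capacity
            self.head = 0   # next slot to read  (owned by the consumer)
            self.tail = 0   # next slot to write (owned by the producer)

        def push(self, item):
            nxt = (self.tail + 1) % self.capacity
            if nxt == self.head:
                return False            # full: caller retries or backs off
            self.buf[self.tail] = item
            self.tail = nxt
            return True

        def pop(self):
            if self.head == self.tail:
                return None             # empty
            item = self.buf[self.head]
            self.head = (self.head + 1) % self.capacity
            return item

    class Node:
        __slots__ = ("value", "prev", "next")
        def __init__(self, value):
            self.value, self.prev, self.next = value, None, None

    def list_writer(ring):
        """The single thread that ever mutates the linked list."""
        while True:
            op = ring.pop()
            if op is None:
                time.sleep(0)           # nothing queued; yield and retry
                continue
            kind, node, value = op
            if kind == "insert_after":
                new = Node(value)
                new.prev, new.next = node, node.next
                if node.next:
                    node.next.prev = new
                node.next = new
            elif kind == "unlink":
                if node.prev:
                    node.prev.next = node.next
                if node.next:
                    node.next.prev = node.prev

    # Usage sketch: start list_writer on its own thread, then have the producer
    # call ring.push(("insert_after", some_node, 42)); if push returns False the
    # ring is full and the producer backs off.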



> It’s enough. Just make sure you’re not trying to fly an airplane or prevent the detonation of nuclear weapons using a database; it’s the wrong tool for the job.

Your intuition should be the opposite. You should always reach for correctness first and only sacrifice it after careful consideration when performance demands require it.


If your intuition is to put a doubly linked list in a database, I don’t know what to say…

It’s so inner-platform effect that I’m at a loss for words.


>It’s enough. Just make sure you’re not trying to fly an airplane or prevent the detonation of nuclear weapons

My gripe with this kind of argument is that today, you aren't. You can write application-side duct tape to deal with any kind of wonky database situation, but the whole point of having a database with strong and easy-to-reason guarantees is that it makes future development easier.


He’s talking about performance and then reaching for a database. It’s the wrong tool for the job.


OP provides even simpler examples than doubly linked lists. It made me realize that Postgres's default isolation level is actually quite a nice footgun, and made me reread the docs on the various isolation levels much more carefully.
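
For anyone else who wants to check what they're actually getting: Postgres defaults to READ COMMITTED, and you opt in to REPEATABLE READ (its snapshot level) or SERIALIZABLE per transaction or per session. A small psycopg2-flavoured sketch; the DSN and the accounts table are placeholders:

    import psycopg2  # assumes the psycopg2 driver; any driver that passes SQL through works the same

    conn = psycopg2.connect("dbname=app")  # placeholder DSN

    with conn, conn.cursor() as cur:
        # See what the server's default actually is.
        cur.execute("SHOW default_transaction_isolation")
        print(cur.fetchone())  # ('read committed',) on a stock install

    with conn, conn.cursor() as cur:
        # Opt in to stronger isolation for this transaction only; it has to be
        # the first statement of the transaction.
        cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
        cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = %s", (1,))
        cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = %s", (2,))
    # Leaving the 'with conn' block commits (or rolls back on error); serializable
    # transactions can fail with a serialization error and need to be retried.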


> If you’re storing doubly linked lists in a DB you’re doing it wrong.

This was my reaction on finding that TFA's key example is a doubly-linked list. I've never implemented any kind of linked list in a database; nor have I ever come across someone else's schema that involved linked lists. The kinds of operation you do on linked lists (traverse, insert, append, delete) all involve sequences of row accesses that can't be (conveniently?) described in SQL, so they have to be expressed as multiple distinct accesses.

More generally, I'm suspicious of a schema in which a table contains a "foreign" key to itself. Foreign keys are keys to other tables.
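
To make that concrete, here's the sort of schema TFA's example implies; the table and column names are mine. Every list operation is a read followed by writes to other rows, and snapshot isolation only aborts transactions whose write sets overlap; it won't notice that the neighbours you read have since been relinked by someone else.

    # Hypothetical linked-list-in-SQL schema: the table carries "foreign"
    # keys to itself for both directions of the list.
    SCHEMA = """
    CREATE TABLE item (
        id      bigint PRIMARY KEY,
        prev_id bigint REFERENCES item(id),
        next_id bigint REFERENCES item(id),
        payload text
    );
    """

    # Unlinking a node: one read, then stitch the neighbours together.
    # None of this fits in a single SQL statement, which is the
    # "multiple distinct accesses" problem described above.
    UNLINK_NODE = [
        "SELECT prev_id, next_id FROM item WHERE id = %(n)s",
        "UPDATE item SET next_id = %(next)s WHERE id = %(prev)s",
        "UPDATE item SET prev_id = %(prev)s WHERE id = %(next)s",
        "DELETE FROM item WHERE id = %(n)s",
    ]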

I haven't given it any thought, but could there have been a better example? Or is updating a doubly-linked list the best illustration of why SI is dangerous?


Relations form graphs, of which lists are a subset. It's very easy to end up with graphs [in SQL] that can be broken as in TFA.


> If you’re storing doubly linked lists in a DB you’re doing it wrong.

Assuming the database uses B+ trees (as most do), the records themselves are very likely to be stored in a doubly linked list.

Not every doubly linked list is the kind you see in an introductory data structures class.
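
A toy sketch of what that looks like (no particular engine's layout; real leaves hold page numbers and byte offsets rather than Python references): the interior nodes route a lookup to a leaf, and the leaves carry sibling pointers so a range scan can walk left to right without re-descending from the root. That chain of sibling pointers is exactly a doubly linked list.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class LeafPage:
        """Toy B+ tree leaf page: chained to its left and right siblings."""
        keys: List[int] = field(default_factory=list)
        values: List[bytes] = field(default_factory=list)
        prev: Optional["LeafPage"] = None   # left sibling
        next: Optional["LeafPage"] = None   # right sibling

    def range_scan(start_leaf: LeafPage, lo: int, hi: int):
        """Descend the tree once to find start_leaf (not shown), then just
        follow sibling pointers to cover the key range."""
        leaf = start_leaf
        while leaf is not None:
            for k, v in zip(leaf.keys, leaf.values):
                if k > hi:
                    return
                if k >= lo:
                    yield k, v
            leaf = leaf.next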


Yes, this. Because of exactly this, I saw substantial data corruption that had built up over years in live medical databases: the on-disk B+ trees had corrupted internal links due to incorrect handling of concurrent updates and rollbacks. This made some queries return incorrect results.

The developers had been writing messy heuristic workarounds for database low-level corruption in application code for over a decade, instead of figuring out the cause and fixing that.

For example, application code had special routines to attempt to detect and ignore duplicate records in some queries (but not others), in response to customer bug reports, but of course that didn't fix missing records or make the data correct. It just patched over an observed symptom. It also didn't cover all the places the corruption could surface, so much of the application still had occasional flaky behaviour, just never consistently enough to be reported in detail and get its own (bad) workarounds to hide it.

Correctly implemented transactions are very helpful for database internals and on-disk structures too, not just for the API level presented to applications.


> It’s enough

Surrender consistency only in the gravest of necessity.

No one likes their data corrupted.


So don't use it for writes. Snapshot is good for analysis. Let's say you want to show the user a report of their sales by category, with some additional detail. You want to load those things as of the same cut-off time, because otherwise one of the detail queries could disagree with the summary query. This is what people expect in their results.

If you really do need to modify data based on a query of that same data, you should read it exclusively, i.e. use serializable.
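
As a concrete sketch of that split (Postgres syntax, made-up tables): run the report under REPEATABLE READ so every statement in it sees the same cut-off, and run anything that writes based on what it read under SERIALIZABLE, retrying on serialization failure.

    # Read-only report: both statements see the same snapshot, so the
    # per-category totals and the detail rows can't disagree.
    SALES_REPORT = """
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT category, sum(amount) AS total FROM sales GROUP BY category;
    SELECT * FROM sales WHERE amount > 10000 ORDER BY amount DESC;
    COMMIT;
    """

    # Read-modify-write: the decision depends on data the transaction read,
    # so run it serializable and be prepared to retry on error 40001.
    TRANSFER = """
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT balance FROM accounts WHERE id = 1;  -- application checks it covers the transfer
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;
    """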


If you don't need your snapshot read-only queries to be linearizable, then you don't need MVCC for this: just take a consistent checkpoint every second or minute or hour. A simple way to implement low-frequency snapshots is what Redis made famous: fork the server process so all future updates in the parent are to CoW pages and read the snapshot from the child. (HyPer did this originally but I'm not sure they still do.)

I think this approach makes more sense than MVCC when linearizable abort-free read-only transactions aren't really necessary: you can avoid a lot of complexity and a lot of overhead by not retaining old versions.
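
Rough shape of the trick in Python, for anyone who hasn't seen it (POSIX only; os.fork is the real mechanism, everything else here is a placeholder): the child gets a copy-on-write image of memory as of the fork, so its reads are a consistent point-in-time view with no version tracking at all, while the parent goes back to applying writes.

    import os
    import time

    def answer_snapshot_queries(snapshot_state):
        """Placeholder: serve read-only queries against the frozen image."""
        time.sleep(60)  # pretend we're answering queries for a while

    def snapshot_loop(state, period_s=60):
        """Redis-RDB-style snapshotting: fork, let the child answer reads
        against its frozen copy-on-write view, then snapshot again."""
        while True:
            pid = os.fork()
            if pid == 0:
                # Child: memory is frozen as of fork(); no old versions to
                # track and no read locks, just a consistent snapshot.
                answer_snapshot_queries(state)
                os._exit(0)
            # Parent: a real server would resume applying writes here (the
            # kernel copies pages lazily); this sketch just waits for the
            # child and then takes the next snapshot.
            os.waitpid(pid, 0)
            time.sleep(period_s)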


But those Redis snapshots are only used for regular back-ups, so completely unlike snapshot isolation in a relational database, right?


I was referring to using recent database snapshots for read-only queries. HyPer used the same snapshotting technique as Redis backups for this purpose:

https://cs.brown.edu/courses/cs227/archives/2012/papers/olap...


Ok, thanks. I was wondering because last year I spent a few hours poring through Redis documentation, trying to figure out how to get access to read-only snapshots...


Good luck troubleshooting why your data is subtly (or not-so-subtly) corrupt when the database transactions that started the problem occurred months previously, because the guarantees were too loose/not well-understood.



