You can read the code here: https://github.com/tigerbeetle/tigerbeetle/blob/f8a614644dcf...
This does things like:
- Abstract time (all timeouts etc.) in the DBMS, so that time can be accelerated (roughly 700x) by ticking it in a while-true loop.
- Abstract storage/network/process and do fault injection across all the storage/network/process fault models. You can read about these fault models here: https://docs.tigerbeetle.com/about/safety.
- Verify linearizability, but immediately, as the state machines advance state (not after the fact by checking histories for validity, which is more expensive), by comparing each state transition against the set of in-flight client requests (the simulator controls the world, so it can do this).
- But not only check correctness: also test liveness, that durability is not wasted, and that availability is maximized given the durability at hand. In other words, given the amount of storage/network faults (or f) injected into the cluster, and according to the specification of the protocols (the simulator is protocol-aware), is the cluster as available as it should be? Or has it lost availability prematurely? See: https://tigerbeetle.com/blog/2023-07-06-simulation-testing-f...
- Then also do a myriad of other checks, such as verifying that replicas are cache-coherent at all times with their simulated disk, and that the page cache does not go out of sync (as happened with Linux's page cache in fsyncgate).
- And while all this is running, 6,000+ assertions in all critical functions check pre/post-conditions at function (or block) scope.
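To make the accelerated-time idea above concrete, here is a minimal sketch in Python (TigerBeetle itself is in Zig, and the names here are hypothetical, not its actual API): because the DBMS only ever asks the simulated clock for the time, the simulator can burn through simulated seconds as fast as the CPU allows.

```python
import heapq

class SimClock:
    """Deterministic simulated clock: time only advances when the simulator ticks it."""

    def __init__(self):
        self.now = 0          # ticks, not wall-clock time
        self._seq = 0         # tie-breaker so the heap never compares callbacks
        self._timers = []     # min-heap of (deadline, seq, callback)

    def set_timeout(self, delay, callback):
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))
        self._seq += 1

    def tick(self):
        """Advance one tick and fire any expired timers."""
        self.now += 1
        while self._timers and self._timers[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._timers)
            callback()

clock = SimClock()
fired = []
clock.set_timeout(500, lambda: fired.append("election timeout"))

# A while-true loop (bounded here for the demo) ticks simulated time as fast
# as the CPU allows -- no sleeping, which is where the speedup comes from.
for _ in range(1000):
    clock.tick()
```

A 500-tick timeout that would take minutes of wall-clock time in production fires almost instantly here, which is what lets one simulated run cover hours of cluster behavior.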
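The abstracted network with fault injection can likewise be sketched as follows (again a hypothetical Python sketch, not TigerBeetle's Zig implementation): the key property is that all faults come from a seeded PRNG, so any bug the simulator finds is replayable from the seed alone.

```python
import random

class SimNetwork:
    """Simulated network with seeded, reproducible fault injection."""

    def __init__(self, seed, drop_prob=0.1, dup_prob=0.05):
        self.rng = random.Random(seed)  # same seed => same faults => replayable run
        self.drop_prob = drop_prob
        self.dup_prob = dup_prob
        self.in_flight = []

    def send(self, src, dst, msg):
        if self.rng.random() < self.drop_prob:
            return                       # fault: message silently dropped
        copies = 2 if self.rng.random() < self.dup_prob else 1
        for _ in range(copies):          # fault: message duplicated
            # fault: reordering -- insert at a random position in flight
            pos = self.rng.randrange(len(self.in_flight) + 1)
            self.in_flight.insert(pos, (src, dst, msg))

    def deliver_one(self):
        return self.in_flight.pop(0) if self.in_flight else None

net = SimNetwork(seed=123, drop_prob=0.2)
for i in range(10):
    net.send("replica_0", "replica_1", ("prepare", i))
```

Two simulators constructed with the same seed see byte-for-byte identical fault schedules, which is what makes a failing run reproducible on a developer's laptop. (Real storage fault models, e.g. misdirected or torn writes, are listed in the safety docs linked above.)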
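The "check linearizability immediately, not after the fact" bullet can be illustrated with a toy key-value model (a hypothetical sketch; TigerBeetle's actual state machine is accounts and transfers): since the simulator issued every client request itself, each committed transition can be validated against a reference model the moment it happens, instead of searching histories afterwards.

```python
class LiveChecker:
    """Check linearizability as the state machine advances, not after the fact.

    The simulator knows every in-flight client request, so each committed
    state transition is validated the moment it happens.
    """

    def __init__(self):
        self.model = {}     # reference model: a simple key-value store
        self.inflight = {}  # request_id -> (op, key, value)

    def submit(self, request_id, op, key, value=None):
        self.inflight[request_id] = (op, key, value)

    def on_commit(self, request_id, result):
        # A commit must correspond to a request the simulator actually sent.
        assert request_id in self.inflight, "state machine invented an operation"
        op, key, value = self.inflight.pop(request_id)
        if op == "put":
            self.model[key] = value
        elif op == "get":
            # The replica's answer must match the reference model exactly.
            assert result == self.model.get(key), "linearizability violation"

checker = LiveChecker()
checker.submit(1, "put", "a", 10)
checker.on_commit(1, None)
checker.submit(2, "get", "a")
checker.on_commit(2, 10)   # matches the model, so the check passes
```

Checking at commit time is cheap (a dictionary lookup per transition) precisely because the simulator controls the world; an offline checker like Jepsen's has to reconstruct what was in flight from observed histories, which is far more expensive.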
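Finally, the assertion-dense style of the last bullet looks roughly like this (a hypothetical Python illustration of the idea; the real assertions are Zig `assert`s on TigerBeetle's own invariants): every critical function states its pre- and post-conditions inline, so the simulator's fault injection has thousands of tripwires to hit.

```python
def transfer(accounts, debit_id, credit_id, amount):
    """Move `amount` between two accounts, asserting pre/post-conditions inline."""
    # Pre-conditions at function scope.
    assert amount > 0
    assert debit_id != credit_id
    assert debit_id in accounts and credit_id in accounts
    total_before = sum(accounts.values())

    accounts[debit_id] -= amount
    accounts[credit_id] += amount

    # Post-condition: money is conserved across the transfer.
    assert sum(accounts.values()) == total_before
    return accounts

accounts = {"alice": 100, "bob": 50}
transfer(accounts, "alice", "bob", 30)
```

The point is that assertions turn "the simulation ran" into "the simulation ran and every invariant held on every transition", which is what makes billions of simulated operations per night actually mean something.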
See also matklad's “A Deterministic Walk Down TigerBeetle's Main Street”: https://www.youtube.com/watch?v=AGxAnkrhDGY
And please join us live every Thursday at 10am PT / 1pm ET / 5pm UTC for matklad's IronBeetle on Twitch, where we do code walkthroughs and live Q&A: https://www.twitch.tv/tigerbeetle