> Where Jepsen will break most databases with network fault injection, we test TigerBeetle with high levels of storage faults on the read/write path, probably beyond what most systems, write-ahead log designs, or even consensus protocols such as Raft (cf. “Protocol-Aware Recovery for Consensus-Based Storage” and its analysis of LogCabin) can handle.
Oh, nice one. Whenever I speak with people who work on "high reliability" code, they seldom even use fuzz-testing or chaos-testing, which is... well, unsatisfying.
Also, what do you mean by "storage fault"? Is this simulating/injecting silent data corruption or simulating/injecting an error code when writing the data to disk?
> validate all fields for semantic errors so that we don't process bad data,
Ahah, so no deserialization doesn't mean no validation. Gotcha!
> In our experience, zero-deserialization using fixed-size structs, the way we do in TigerBeetle, is simpler than variable-length formats, which can be more complicated (imagine a JSON codec), if not more scary.
That makes sense, thanks. And yeah, JSON has lots of warts.
Not sure what you mean by variable length. Are you speaking of JSON-style "I have no idea how much data I'll need to read before I can start parsing it" or entropy coding-style "look ma, I'm somehow encoding 17 bits in 3.68 bits"?
> Also, what do you mean by "storage fault"? Is this simulating/injecting silent data corruption or simulating/injecting an error code when writing the data to disk?
Exactly, both! Our own simulation testing focuses more on bitrot/misdirection; we use Antithesis' simulation testing for the latter (injected I/O errors). We've also tried to design I/O syscall errors away where possible, for example by using O_DSYNC instead of fsync(), so that we can tie errors to specific I/Os.
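To make the O_DSYNC point concrete, here is a rough C sketch (not TigerBeetle's code; the file name and record are made up): with O_DSYNC set on the file descriptor, each write() only returns once the data is durable, so a durability error is reported by the write() that caused it, rather than by a later catch-all fsync() covering many unrelated writes.

```c
// Sketch only, not TigerBeetle code: O_DSYNC ties durability errors to the
// specific write() that caused them, instead of surfacing them at fsync() time.
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    // Open with O_DSYNC: every write() completes with synchronized
    // data-integrity semantics (the data itself is durable on return).
    int fd = open("journal.bin", O_WRONLY | O_CREAT | O_DSYNC, 0644);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    const char record[] = "example log record";
    ssize_t written = write(fd, record, sizeof(record));
    if (written != (ssize_t)sizeof(record)) {
        // The error belongs to this I/O, not to a later fsync() over the file.
        perror("write");
        close(fd);
        return EXIT_FAILURE;
    }

    close(fd);
    return EXIT_SUCCESS;
}
```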
> Ahah, so no deserialization doesn't mean no validation. Gotcha!
Well said—they're orthogonal.
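As a concrete illustration of that orthogonality, here is a minimal C sketch (the struct layout and checks are hypothetical, not TigerBeetle's actual format): reading the record is just a copy of fixed-size bytes with no parse step, while semantic validation of the fields remains a separate pass.

```c
// Sketch only; the struct layout and checks are hypothetical, not
// TigerBeetle's actual wire format.
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// A fixed-size, fixed-layout record: reading it is a copy, not a parse.
typedef struct {
    uint64_t id;
    uint64_t debit_account;
    uint64_t credit_account;
    uint64_t amount;
} Transfer;

// "Zero-deserialization": copy the raw bytes straight into the struct.
// (memcpy avoids alignment issues; a cast also works on aligned buffers.)
static bool transfer_read(const uint8_t *buf, size_t len, Transfer *out) {
    if (len != sizeof(Transfer)) return false;
    memcpy(out, buf, sizeof(Transfer));
    return true;
}

// Validation is the orthogonal step: the fields must still make semantic
// sense before the record is processed.
static bool transfer_valid(const Transfer *t) {
    if (t->id == 0) return false;
    if (t->amount == 0) return false;
    if (t->debit_account == t->credit_account) return false;
    return true;
}
```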
> Not sure what you mean by variable length. Are you speaking of JSON-style "I have no idea how much data I'll need to read before I can start parsing it"
Yes, and also where this is internal to the data structure being read, e.g. both variable-length message bodies and variable-length fields.
There's also an interesting example of how variable-length message bodies can go wrong, which we give in the design decisions for our wire protocol; it's why we have two checksums, one over the header and another over the body (instead of one checksum over both!): https://github.com/tigerbeetledb/tigerbeetle/blob/main/docs/...
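Roughly, the idea is sketched in C below (field names, sizes, and the checksum function are placeholders, not the actual protocol): the header carries one checksum over itself plus a second checksum and a length for the body, so the receiver can verify the header, and therefore trust the body length, before it ever touches the variable-length body.

```c
// Sketch of the two-checksum idea; field names and checksum function are
// placeholders, not TigerBeetle's actual header layout or hash.
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t checksum;       // covers the rest of the header only
    uint64_t checksum_body;  // covers the (variable-length) body only
    uint32_t size;           // total message size: header + body
    uint32_t command;
} Header;

// Toy checksum (FNV-1a); a real protocol would use a much stronger hash.
static uint64_t checksum(const uint8_t *data, size_t len) {
    uint64_t hash = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 1099511628211ULL;
    }
    return hash;
}

// Verify the header first. Only once the header (and thus `size`) is known
// to be intact do we trust the body length and check the body checksum.
// A single checksum over header+body could only be verified after reading a
// body whose length is stated by the possibly-corrupt header itself.
static bool message_valid(const uint8_t *buf, size_t len) {
    if (len < sizeof(Header)) return false;
    Header header;
    memcpy(&header, buf, sizeof(Header));
    if (header.checksum != checksum(buf + sizeof(uint64_t),
                                    sizeof(Header) - sizeof(uint64_t))) {
        return false;
    }
    if (header.size < sizeof(Header) || header.size > len) return false;
    const uint8_t *body = buf + sizeof(Header);
    const size_t body_len = header.size - sizeof(Header);
    return header.checksum_body == checksum(body, body_len);
}
```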
But Zig is the charm. TigerBeetle wouldn't be what it is without it. Comptime has been a game-changer for us, and the shared philosophy around explicitness and memory efficiency has made everything easier. It's like working with the grain—the std lib is pleasant. I've also learned so much from the community.
My own personal experience has been that Andrew has made a truly stunning number of successively brilliant design decisions. I can't fault any of them. It's all the little things together—seeing this level of conceptual integrity in a language is such a joy.