Regarding the last point << Yugabyte was tested by Kyle Kingsbury back in 2019, which uncovered some deficiencies. Not sure what the state is today. The YB team also runs their own Jepsen tests now as part of CI, which is a good sign. >>
This is due to a combination of factors: the choice of implementation language (C++), a variety of enhancements to RocksDB, consistent reads from leaders (using leader leases) rather than 3-way quorum reads, and so on. We discuss these aspects in more detail here:
As Kyle mentioned << Because these problems involve schema changes (e.g. creating tables), they may not impact users frequently. YugaByte doesn't think they're relevant to the core transactional mechanism in YugaByte DB, which is why they're not discussing them when they say "Jepsen tests passed". >>, the impact of this is very limited and not core to the transactional mechanism in YugabyteDB.
To our knowledge, most "Distributed DB" vendors do not yet support transactional DDL, and they haven't been subjected to this specific test. In any case, I have updated the blog post to clarify this:
<< Given that DocDB, Yugabyte DB’s underlying distributed document store, is common across both the YCQL and YSQL APIs, it was no surprise that YSQL passed official Jepsen run safety tests relatively easily (with the exception of transactional DDL support, which almost no other distributed SQL database vendor supports, and we plan to support soon. The real-world impact of this open issue is really small as it is limited to cases where DML happens before DDL has fully finished). >>
To add some context: every Jepsen test involves table creation. YugaByte DB's table-creation process was exceptionally fragile, which is what prompted tests specifically pushing on that behavior.
There is a world of difference between lacking a common, necessary feature and lacking a feature nobody else has either. If a project is trying to establish itself as a serious competitor in a given field, communication is crucial. "We don't have it yet, but we're working on it, and the software you currently use probably doesn't have it either" is a very different message from "we're missing something you probably use."
Ehm yeah. If it were groundbreaking, maybe. But there are 5+ other contenders in this field dealing with the same issues, and in some cases faring better. We are currently evaluating multiple NewSQL vendors, and it really does come down to the details making or breaking the case. I am not sure what potential NDAs I am under, so I can't share details, but there is a sharp difference between one company and another claiming "Distributed Serializability". Cockroach, for instance, enforces a lot of things to maintain consistency and as a result can be (or is) slower. But at least it's also predictable. In the end it's all trade-offs, and I actually like the Yugabyte product a lot. I just wish they were more transparent about the choices they made and the impact of those choices.
Thanks for your feedback! Not sure when you tried YugabyteDB, but our serializable isolation level and YSQL API (which is needed to exercise serializability) were in beta until a couple of days ago. That said, if you can share some feedback, that would help us out immensely. All kinds of feedback are welcome, be it about the product or about why you feel we are not transparent. Absolute transparency has always been our goal, and your feedback will definitely help us improve.
I was actually thinking of exactly this problem, but it turned out to be difficult to demonstrate, because YB doesn't allow you to add columns with default values to an existing table.
There might be other ways this could play out in migrations; I haven't had time to look deeply.
I wanted to add a few details to the previous reply.
While the Raft/HybridTime implementation has its roots in Apache Kudu, the results will NOT be directly applicable to Kudu. Aside from the fact that the code bases have evolved/diverged over the 3+ years, there are key areas (ones very relevant to these Jepsen tests) where YugaByte DB has added capabilities or follows a different design than Kudu. For example:
- Leader Leases: YugaByte DB doesn't use Raft consensus for reads. Instead, we have implemented "leader leases" to make it safe to serve reads from a tablet's Raft leader.
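To illustrate the lease idea, here is a minimal, hypothetical sketch (toy names and structure, not YugabyteDB's actual code) of why holding a time-bounded lease lets a leader answer reads without a quorum round-trip:

```python
class LeaderLease:
    """Toy model of a leader lease. While the lease (granted via
    majority-acknowledged heartbeats) is unexpired, no other node can
    become leader and commit writes, so a local read on the leader is
    safe without contacting a quorum."""

    def __init__(self, duration_s: float):
        self.duration_s = duration_s
        self.expires_at = 0.0  # monotonic-clock deadline

    def renew(self, now: float) -> None:
        # In a real system, renewal piggybacks on Raft heartbeats that a
        # majority of peers acknowledged; here we just extend the expiry.
        self.expires_at = now + self.duration_s

    def can_serve_read(self, now: float) -> bool:
        # Safe to read locally only while the lease is still held;
        # otherwise fall back to a quorum read (or reject the request).
        return now < self.expires_at


lease = LeaderLease(duration_s=2.0)
lease.renew(now=100.0)
print(lease.can_serve_read(now=101.0))  # True: within the lease window
print(lease.can_serve_read(now=103.0))  # False: lease has expired
```

The safety argument hinges on the lease duration being shorter than the interval a new candidate must wait before it can win an election, so the old lease always expires before a new leader can commit writes.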
- Distributed/Multi-Shard Transactions: YugaByte DB uses a home-grown protocol (https://docs.yugabyte.com/latest/architecture/transactions/t...) based on two-phase commit across multiple Raft groups. Capabilities like secondary indexes and multi-row updates use multi-shard transactions.
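As a rough sketch of the two-phase-commit shape described above (all names and the single-process model are illustrative, not YugabyteDB's implementation): writes are first staged as provisional records on each shard, and the commit decision is a single atomic flip of a transaction-status record, after which each shard can apply its provisional writes independently.

```python
from enum import Enum

class TxnStatus(Enum):
    PENDING = "pending"
    COMMITTED = "committed"
    ABORTED = "aborted"

class Shard:
    """One tablet/Raft group, modeled as a dict. Provisional writes are
    kept separate until the transaction's status record says COMMITTED."""
    def __init__(self):
        self.committed = {}
        self.provisional = {}  # key -> (txn_id, value)

    def write_provisional(self, txn_id, key, value):
        self.provisional[key] = (txn_id, value)

    def apply(self, txn_id):
        # Promote this transaction's provisional writes to committed data.
        for key, (tid, value) in list(self.provisional.items()):
            if tid == txn_id:
                self.committed[key] = value
                del self.provisional[key]

def commit_across_shards(status_table, writes):
    """Toy 2PC coordinator: stage provisional writes on every shard,
    then flip the single status record; that flip is the commit point."""
    txn_id = "txn-1"
    status_table[txn_id] = TxnStatus.PENDING
    for shard, key, value in writes:
        shard.write_provisional(txn_id, key, value)
    # Commit point: one atomic status change decides the whole outcome.
    status_table[txn_id] = TxnStatus.COMMITTED
    for shard, _, _ in writes:
        shard.apply(txn_id)

a, b = Shard(), Shard()
status = {}
commit_across_shards(status, [(a, "x", 1), (b, "y", 2)])
print(a.committed, b.committed)  # {'x': 1} {'y': 2}
```

The point of the status record is that readers who encounter a provisional write consult it: either they see COMMITTED (and treat the write as real) or PENDING/ABORTED (and ignore it), so no reader ever observes a partially committed transaction.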
- Online/dynamic Raft membership changes, so that tablets can be moved (for example, for load balancing to new nodes).
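For the membership-change point, here is a hedged sketch of the standard one-server-at-a-time approach (illustrative only; the function and node names are made up): changing a single voter per step guarantees that any majority of the old configuration and any majority of the new one intersect, which rules out two disjoint leaders during the transition.

```python
def change_membership(voters: frozenset, add: str = None, remove: str = None) -> frozenset:
    """Toy one-server-at-a-time Raft membership change. Because the old
    and new voter sets differ by exactly one server, their majorities
    always overlap, so the transition cannot elect two leaders."""
    if (add is None) == (remove is None):
        raise ValueError("change exactly one server per step")
    if add is not None:
        return voters | {add}
    if remove not in voters:
        raise ValueError("cannot remove a non-member")
    return voters - {remove}


# Grow a 3-node tablet's Raft group to 4, then retire the old node:
config = frozenset({"n1", "n2", "n3"})
config = change_membership(config, add="n4")
config = change_membership(config, remove="n1")
print(sorted(config))  # ['n2', 'n3', 'n4']
```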
FWIW, we implemented dynamic consensus membership change in Kudu way back in 2015 (https://github.com/apache/kudu/commit/535dae) but presumably that was after the fork. We still haven't implemented leader leases or distributed transactions in Kudu though due to prioritizing other features. It's very cool that you have implemented those consistency features.
Thanks for correcting me on the dynamic consensus membership change. Looks like the basic support was indeed there, but several important enhancements were needed (for correctness and usability).
- Remote bootstrap (due to membership changes) has also undergone substantial changes, given that YugaByte DB uses a customized/extended version of RocksDB as the storage engine and couples Raft more tightly with it. (https://github.com/YugaByte/yugabyte-db/blob/master/docs/ext...)
- Dynamic Leader Balancing is also new: it proactively transfers leadership in a running system so that each node is the leader for a similar number of tablets.
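A minimal sketch of what such a balancer computes, under assumed names (the real balancer would also respect Raft membership, placement constraints, and throttling): given the current tablet-to-leader assignment, plan leadership transfers from overloaded nodes to the least-loaded one until counts are within one of each other.

```python
def plan_leader_moves(leaders, nodes):
    """Toy leader balancer. `leaders` maps tablet -> current leader node;
    returns a list of (tablet, from_node, to_node) leadership transfers.
    Only moves that shrink the spread between nodes are planned."""
    counts = {n: 0 for n in nodes}
    for node in leaders.values():
        counts[node] += 1
    moves = []
    for tablet, node in sorted(leaders.items()):
        dest = min(nodes, key=lambda n: counts[n])
        # Transfer only if it actually reduces the imbalance.
        if counts[node] - counts[dest] >= 2:
            moves.append((tablet, node, dest))
            counts[node] -= 1
            counts[dest] += 1
    return moves


# Node A leads 3 tablets, B leads 1, C leads none: one transfer fixes it.
moves = plan_leader_moves({"t1": "A", "t2": "A", "t3": "A", "t4": "B"},
                          ["A", "B", "C"])
print(moves)  # [('t1', 'A', 'C')]
```

Note that unlike moving tablet data, transferring leadership is cheap (a Raft leadership handoff), which is why it can be done proactively in a running system.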
I'm curious if you did anything to prevent automatic rebalancing from being triggered at a "bad time" or have throttled it in some way, or whether moving large amounts of data between servers at arbitrary times was not a concern.
I am also curious if you added some type of API using the LEARNER role to support a CDC-type of listener interface using consensus.
We should really start some threads on the dev lists to periodically share this type of information and merge things back and forth to avoid duplicating work where possible. I know the systems are pretty different at the catalog and storage layers but there are still many similarities.
YugaByte DB's design is for a YB cluster to support Postgres natively, in a self-contained, scale-out manner (much like YugaByte's Cassandra- and Redis-flavored offerings).
At a high level, the upper half of the Postgres DB is largely reused. The lower half, i.e. the distributed table storage layer, uses YugaByte's underlying core: a transactional, distributed, document-based storage engine.
For the DB to be scalable, a distributed lower half is necessary but NOT sufficient. The upper half also needs to be extended to be aware of other nodes executing DDL/DML statements and to handle the related concurrency while still allowing linear scaling. Making the optimizer aware of the distributed nature of table storage is the other major piece of work in the upper half.
These changes required in the upper half are what make the "100% pure extension" model a bit harder... but that's something we intend to explore jointly with the Postgres community.
Please see this blog post https://www.yugabyte.com/blog/chaos-testing-yugabytedb/ for the latest updates, as well as information on additional in-house frameworks for resiliency and consistency testing.