We had a sales call with them last year and got to speak to one of their devs. The impression they gave was that it was mostly ex-Facebook data guys that left to start a company based on the work they did on Cassandra and a few other internal projects.
The really interesting feature, to us, was the promise that once the Postgres-compatible layer was complete, we could use whatever semantics were appropriate for our business use case while using the same logical database cluster. We could use the Redis interface for persistent caching, the CQL interface for our NoSQL-appropriate use cases and the Postgres interface for our more traditional use cases. And the client libraries for all those interfaces are the same ones we already use to talk to Redis and Postgres (our conversation happened because we were starting a project that was more NoSQL-appropriate, so we weren’t using Cassandra yet), so very little of our code would have to change.
They're a good team. The same data isn't available across different interfaces but there is definitely value in having a single core system that serves multiple apps and models.
The question is, how many of these very similar databases can the market support?
The field is getting crowded, and the database market was already plenty competitive before these new entrants showed up.
There just are not that many use cases where a larger Postgres/MySQL instance with one or two replicas is insufficient.
From a user perspective, I'd much rather have one or two successful companies where I can be reasonably certain that the product will be maintained in 5 years than too much competition.
I do think there are going to be more and more use cases where having more than one or two replicas is necessary, even if there aren't that many right now. IoT strikes me as an example. But even so, I'm with you. I'd much rather have two companies that I know will maintain the software long term than a plethora of competitors that might die out at any time. And while being open source is helpful in that regard, there's no guarantee that the open source community will maintain it after the backing company is gone. Databases are too important to gamble on imo.
On the other hand, a little competition to make sure the big guys stay on their toes is a good thing.
Even if the commercial/AGPL/GPL ones "win" for a few years, they won't be able to compete with these more Open-licensed DBs once those catch up.
So at any given moment too many DBs may be annoying, but for the long game it is important that this kind of competition & research keeps going.
Although I absolutely agree that when it comes to Master-Slave based systems (I've been very vocal in criticizing them), that market is drying up to some very limited use cases (banking, etc.). 99%+ of use cases will be Strong Eventual Consistency and CRDTs with distributed or decentralized/P2P tools.
Some really old, yet very relevant, thoughts on this subject:
I'm thoroughly confused by your association of (a)GPL with commercial, and by calling Apache/BSD licenses more open. AGPL ensures that users always retain the four freedoms, by restricting developers. BSD allows developers to do whatever they want, including restricting the users. Neither is "more open"; they both make trade-offs, and neither is comparable to proprietary, except to say that BSD-style licenses allow for it if the developer chooses.
Look at the kerfuffle around Mongo, Redis, and Elasticsearch because of their licenses. However, you don't hear the same issues coming from the Postgres community. The licenses you're claiming will win the day cause problems for for-profit companies, for exactly the reason you think they're "more open".
In the end, either entrenched proprietary software or open, community-focused, community-stewarded software will win the day.
I believe we both have reasonable arguments from our paradigm, it is just the paradigms have conflicting definitions.
When people who share camp with me say "Open" or "Freedom" we mean Free Speech AND Free Beer.
Where the disagreement happens is on Free Speech:
There are many people/governments that define Free Speech as "Free Speech as long as someone does not shout 'fire' in a crowded room." This is the spirit of (a)GPL in restricting people.
The other group defines Free Speech and/or "Freedom" as "without restriction". Not because they want people to yell "fire", but because they see restriction/regulation as the mechanism towards monopoly & centralization. Not that regulation/restriction on its own is bad (every individual ought to exercise self-discipline), but it is particularly dangerous once monopoly & centralization emerge, because it produces totalitarian or fascist structures.
To counter my own view, many people in the camp opposite of me, have expressed same end-goal concerns "we want to restrict hate speech so fascism doesn't rise". I think it is admirable we have shared-goals (stopping totalitarianism), but for reasons you probably don't share, I think it is more effective to stop fascism by removing the ability for fascists to enforce rules/regulation/restrictions on individuals, even if that comes at the cost or risk of someone yelling "fire".
Why? (I don't assume anyone cares about my view, so don't feel obligated to read) Because I have higher optimism that humans will eventually overcome their individual immaturity (shouting "pen--" in a crowd), especially through incentive design, than in humans overcoming their tendency towards abuse of power (or even worse, most people who "abuse" power don't think they are abusing it, they have a conviction that the use of power is for some greater good). Wielding power is often the end game of any incentive structure, but yelling "fire" or "p--is" often ruins your reputation/power so naturally is disincentivized over time (or where it matters most).
I feel like your "fire" and "totalitarian" examples are confusing, entirely off-base and non-illustrative of anything useful to this conversation.
Why? Because the difference between copyleft and non-copyleft licenses isn't akin to censorship vs. no censorship. The argument for copyleft is more akin to the argument for laws in general: someone's absolute freedoms need to be trodden on to have a free society.
I similarly fail to see how copyleft is a power to abuse. Surely the ability to close the source of an application is a power with more potential for abuse?
Your 2nd paragraph says pretty much what I was trying to say (except for difference in law views) that your 1st paragraph says is off-base.
Another way for me to say it is, that of course you would think my thoughts are off-base since I come from a different foundational base as you. I was just trying to explain the difference itself, not saying that you need to change views (your view is logical from your "base").
You think people's freedoms need to be trodden upon for a free society.
I don't. That scares me and many others.
Edit: I did not downvote you, just FYI, I don't know who/why would.
> You think people's freedoms need to be trodden upon for a free society.
Do you take this stance with laws against murder and theft? Society has laws and rules. People as a whole, as all available examples show, do not optimize for the greater good by default and without any rules or norms.
There are good talking points in the copyleft debate, but the claim that copyleft imposes rules and non-copyleft doesn't is false, and it doesn't move this debate forward in any meaningful way.
Since it doesn't support serializable transactions I'm not sure why FoundationDB would be mentioned as a comparison in the write up. The operations it does support seem to set the bar pretty low as to what to test.
edit: good reply by the founder of YugaByte, but for some reason the comment is dead. I have noticed that when founders don't have an account on here and something comes up where they need to reply, their comments often end up dead.
We have run a significant number of useful/practical tests via Jepsen that only need the snapshot isolation level. The tests included a single-key counter test, set tests with and without a secondary index, a "long fork" test ensuring that the order of operations is the same when observed by different clients, and a bank test verifying that the total balance across multiple accounts stays the same while cross-shard transactions transfer funds between pairs of accounts. These tests were run under a variety of failure modes, including different types of network partitions and clock skew. Snapshot isolation also covers a very large spectrum of practical use cases for building real-world applications, including secondary indexes.
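Roughly, the invariant the bank test checks looks like this toy sketch (not the actual Jepsen test code, which is written in Clojure; here an in-process lock stands in for the database's snapshot-isolated transactions, and the account count and amounts are made up):

    # Toy sketch of the bank-test invariant: concurrent transfers move money
    # between accounts, and every consistent read must see the same total.
    import random
    import threading

    N_ACCOUNTS, INITIAL = 5, 100
    accounts = {i: INITIAL for i in range(N_ACCOUNTS)}
    lock = threading.Lock()        # stands in for a snapshot-isolated transaction
    violations = []

    def transfer():
        for _ in range(1000):
            a, b = random.sample(range(N_ACCOUNTS), 2)
            amount = random.randint(1, 10)
            with lock:             # both writes commit atomically
                accounts[a] -= amount
                accounts[b] += amount

    def read_total():
        for _ in range(1000):
            with lock:             # a consistent snapshot of all rows
                total = sum(accounts.values())
            if total != N_ACCOUNTS * INITIAL:
                violations.append(total)

    workers = [threading.Thread(target=transfer) for _ in range(3)]
    workers += [threading.Thread(target=read_total) for _ in range(2)]
    for t in workers: t.start()
    for t in workers: t.join()
    print("invariant violations:", violations)   # expected: []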
Having said that, we have recently added support for the serializable isolation level to YugaByte DB, and we will be adding tests for it to the Jepsen suite in the near term.
We use CockroachDB in production, and before that we were on MySQL; as of yet we don't have a specific use case where we need serializable transactions. Snapshot isolation or even read committed is just fine. So I don't think it's absolutely necessary.
To be clear, there's no way around serializable transactions in CockroachDB. We have had to adapt our monolith to it (we're thinking of ways to make it more nimble by breaking out services, etc.). But the point I was making was that we had MySQL for a while and never ran into issues with its isolation levels until it stopped scaling. Instead of Vitess or some other MySQL system we went with Cockroach, after finding Vitess didn't fit us -- too complicated and too many moving parts. CockroachDB just works. Moving to k8s also adds complexity for a monolith built and run on VMs. But so far so good. Cockroach runs fast and performs well on our production queries. And ops is happy because it self-heals.
I wanted to add a few details to the previous reply.
While the Raft/HybridTime implementation has its roots in Apache Kudu, the results will NOT be directly applicable to Kudu. Aside from the fact that the code bases have evolved/diverged over 3+ years, there are key areas (ones very relevant to these Jepsen tests) where YugaByte DB has added capabilities or follows a different design than Kudu. For example:
-- Leader Leases: YugaByte DB doesn't use Raft consensus for reads. Instead, we have implemented "leader leases" to ensure that reads can safely be served from a tablet's Raft leader (rough sketch after this list).
-- Distributed/Multi-Shard Transactions: YugaByte DB uses a home-grown (https://docs.yugabyte.com/latest/architecture/transactions/t...) protocol based on two-phase commit across multiple Raft groups. Capabilities like secondary indexes and multi-row updates use multi-shard transactions.
-- Allowing online/dynamic Raft membership changes so that tablets can be moved (such as for load-balancing to new nodes).
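To sketch the leader-lease idea above (illustrative only, not the actual implementation; the lease duration is a made-up number): a leader serves reads locally only while it holds an unexpired lease, measured from the time the granting heartbeat was sent.

    # Illustrative leader-lease sketch: reads skip the Raft round trip only
    # while the leader's lease, acknowledged by a majority, is still valid.
    import time

    LEASE_DURATION = 2.0   # seconds; made-up number for illustration

    class TabletLeader:
        def __init__(self):
            self.lease_expiry = 0.0

        def on_majority_ack(self, heartbeat_sent_at):
            # Measure the lease from when the heartbeat was sent, not when the
            # acks arrived, so the bound stays conservative.
            self.lease_expiry = max(self.lease_expiry,
                                    heartbeat_sent_at + LEASE_DURATION)

        def read(self, key, local_store):
            if time.monotonic() >= self.lease_expiry:
                raise RuntimeError("lease expired: re-establish it via consensus")
            return local_store.get(key)

    leader = TabletLeader()
    sent = time.monotonic()
    # ...imagine a majority of peers acknowledging the heartbeat here...
    leader.on_majority_ack(sent)
    print(leader.read("k1", {"k1": "v1"}))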
FWIW, we implemented dynamic consensus membership change in Kudu way back in 2015 (https://github.com/apache/kudu/commit/535dae) but presumably that was after the fork. We still haven't implemented leader leases or distributed transactions in Kudu though due to prioritizing other features. It's very cool that you have implemented those consistency features.
Thanks for correcting me on the dynamic consensus membership change. Looks like the basic support was indeed there, but several important enhancements were needed (for correctness and usability).
- Remote bootstrap (due to membership changes) has also undergone substantial changes, given that YugaByte DB uses a customized/extended version of RocksDB as the storage engine and couples Raft more tightly with the RocksDB storage engine. (https://github.com/YugaByte/yugabyte-db/blob/master/docs/ext...)
- Dynamic Leader Balancing is also new -- it proactively shifts leadership in a running system to ensure each node is the leader for a similar number of tablets.
I'm curious if you did anything to prevent automatic rebalancing from being triggered at a "bad time" or have throttled it in some way, or whether moving large amounts of data between servers at arbitrary times was not a concern.
I am also curious if you added some type of API using the LEARNER role to support a CDC-type of listener interface using consensus.
We should really start some threads on the dev lists to periodically share this type of information and merge things back and forth to avoid duplicating work where possible. I know the systems are pretty different at the catalog and storage layers but there are still many similarities.
Yes, it does. At the core, the Raft implementation is still based on Kudu's. But these areas have been worked on actively, so the implementations might have diverged a little.
May be worth looking through the individual issues to see what applies and what doesn't:
Not a comment on YugaByte, but... I love it when a new Jepsen report gets released. Kyle Kingsbury has single-handedly raised the bar for an entire industry. (Well, not single-handedly anymore, but still.)
Couldn't agree more. There are 3 sources of information regarding database serializability/linearizability:
1. Marketing material (mostly useless)
2. Individual projects/post-mortems (50/50 here; some just mis-use the technology from the get-go, others have valid feedback, but it's tough to determine when either applies)
3. Jepsen Tests (which are more like independently verifiable science)
Sure, you can decide that your social-media solution has no need for consistency (or even durability!) - but in my experience, most solutions don't have that flexibility.
I think the YB team members are probably best equipped to talk about this, but I can note that while some databases do build their own clock synchronization protocol, many prefer to let the OS handle clocks. For one thing, clock sync is surprisingly tricky to do well, so it makes sense to write daemons that do it well once and be able to re-use them in lots of contexts. There's also the question of HW support: in theory, datacenter and hardware providers could do better than pure-software time synchronization by, say, offering dedicated physical links to a local atomic + GPS clock ensemble. AWS TimeSync is a step in this direction, and I wouldn't be surprised if we see more accurate clocks in the future.
There are still tons of caveats with this idea--Linux and most database software ain't realtime, for starters--but you can imagine a world in which clock errors are sufficiently bounded and infrequent that they no longer represent the most urgent threat to safety. That's ultimately a quantitative risk assessment.
My suspicion is that DB vendors like YugaByte and CockroachDB are making a strategic bet that although clocks right now are pretty terrible, they won't be that way forever. I'd like to see more rigorous measurement on this front, because while I've got plenty of anecdotes, I don't think we have a broad statistical picture of how bad typical clocks are, and whether they're improving.
As @aphyr mentioned, any NTP-like system would work. We can update the docs to mention PTP; we do work with AWS Time Sync as well (which uses Chrony).
In short, no: many transactional databases don't rely on clocks for safety. I'm going to speak in broad terms here--there's a lot of nuance and special cases that we can dig into, but I'd like to keep this accessible:
You can use CRDTs, and other commutative data structures, to obtain totally-available replicated objects across wide area networks. Systems like Riak do this. CRDTs can't express some types of computation safely, though! For instance, you can't do something like a minimum-balance constraint, ensuring that an account always contains $25 or more, if you allow both deposits and withdrawals, in a commutative system. Why? Because order matters! Deposit, withdraw is different than withdraw, deposit, in terms of their intermediate states.
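To make the order-sensitivity concrete, a toy example (plain Python, no CRDT library involved; the amounts are made up):

    # Deposits and withdrawals commute for the *final* balance, but a
    # minimum-balance rule depends on intermediate states, which differ
    # depending on the order the operations are applied.
    MIN_BALANCE = 25

    def apply_ops(start, ops):
        balance = start
        for op in ops:
            balance += op
            if balance < MIN_BALANCE:
                print(f"  constraint violated at intermediate balance {balance}")
        return balance

    start, deposit, withdraw = 30, +20, -20

    print("deposit then withdraw:")
    print("  final:", apply_ops(start, [deposit, withdraw]))   # never dips below 25
    print("withdraw then deposit:")
    print("  final:", apply_ops(start, [withdraw, deposit]))   # dips to 10 in between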
For order, you can use a consensus mechanism, like ZAB (Zookeeper), Paxos (Riak SC, Cassandra LWT), or Raft (etcd, consul) to replicate arbitrary state machines without any clock dependence at all. These systems require at least one round trip to establish consensus, and their guarantees only apply within the consensus system itself.
What if you have multiple consensus groups? Say, one per shard? Then you need a protocol to coordinate transactions on top of that. You can execute an atomic commit protocol for cross-shard transactions, perhaps using a consensus system. Or you can use a protocol like Calvin to obtain serializability (or stronger) across shards without relying on clocks. That's what FaunaDB does. That adds a round-trip, but if you're clever, you may only have to pay that round-trip cost between different datacenters once.
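Here's a bare-bones sketch of the atomic-commit idea (plain two-phase commit; in a real system each shard below would itself be a consensus group, and the durability/recovery details are elided):

    # Bare-bones two-phase commit across shards (illustrative only).
    class Shard:
        def __init__(self, name):
            self.name, self.data, self.staged = name, {}, {}

        def prepare(self, txn_id, writes):
            # Phase 1: durably stage the writes and vote yes/no.
            self.staged[txn_id] = writes
            return True

        def commit(self, txn_id):
            # Phase 2: apply the staged writes.
            self.data.update(self.staged.pop(txn_id))

        def abort(self, txn_id):
            self.staged.pop(txn_id, None)

    def two_phase_commit(txn_id, writes_by_shard):
        shards = list(writes_by_shard)
        if all(s.prepare(txn_id, w) for s, w in writes_by_shard.items()):
            for s in shards: s.commit(txn_id)
            return "committed"
        for s in shards: s.abort(txn_id)
        return "aborted"

    a, b = Shard("a"), Shard("b")
    print(two_phase_commit("t1", {a: {"x": 1}, b: {"y": 2}}))
    print(a.data, b.data)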
Another tactic is to exploit well-synchronized clocks to obtain consistent views across independent consensus groups. You can use this technique to (theoretically) reduce the number of round trips a transaction costs, and there are different ways to balance whether you pay increased latency on read or write transactions. Spanner, CockroachDB, and YugaByte DB all take this approach, with different tradeoffs.
Spanner is backed by custom hardware and carefully designed software, to obtain tight bounds on clock error. CockroachDB and YugaByte DB leave that problem to you, the operator.
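A rough sketch of the commit-wait trick that makes the clock-based approach safe (illustrative; the error bound here is a made-up constant, whereas Spanner derives a live bound from TrueTime):

    # "Commit wait": if every clock's error is bounded by EPSILON, then waiting
    # out the uncertainty window before acknowledging a commit guarantees the
    # commit timestamp is already in the past on every node that sees the ack.
    import time

    EPSILON = 0.007   # assumed worst-case clock error, in seconds (made up)

    def commit(apply_writes):
        commit_ts = time.time()      # timestamp chosen by the coordinator
        apply_writes(commit_ts)
        time.sleep(EPSILON)          # wait out the uncertainty window
        return commit_ts             # only now acknowledge to the client

    ts = commit(lambda t: print("applied writes at", t))
    print("acknowledged commit at", ts)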
Often, a database uses a stronger replication mechanism inside a datacenter, but when it comes to replicating between datacenters, backs off to a weaker strategy which doesn't offer the same safety invariants.
While FoundationDB uses Paxos for cluster state (like leader election), it is not on the commit path for a transaction. If any process fails in the transaction system (not storage processes), the cluster is reconfigured by the coordinators and every component is replaced. Transactions do not proceed during failures, but the cluster will replace the failed process in a few seconds and resume.
(This is not meant to be a contradiction, just pointing out an important difference compared to systems that allow progress in parallel with failures.)
Jepsen is not a performance test; we verify safety. I haven't looked at ScyllaDB personally, but you can read about Scylla's own work testing their database here [1], and see some of the issues they found here [2].
YugaByte product manager here. The YCQL API, which passes Jepsen, has its roots in the Cassandra Query Language but does not use Cassandra as its backend store. Its backend store is DocDB, a Google Spanner-inspired distributed document store.
Seems to be another distributed SQL (aka 'newsql') alternative to TiDB and CockroachDB.
Based on RocksDB (like Cockroach) with a custom distributed key/value layer and an additional SQL layer on top. PostgreSQL protocol compatible (quick connection sketch below).
Open source with an Apache license.
Seems interesting. (when ignoring the "planet scale SQL" marketing speak... [1])
[1] https://www.yugabyte.com/planet-scale-sql/
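Since the SQL layer speaks the Postgres wire protocol, a stock Postgres driver should be able to connect; a rough sketch (host, port, user, and database below are assumptions about a default local install, not guaranteed values):

    # Rough sketch: connect with an ordinary Postgres driver. The host, port,
    # user, and database are assumptions about a default local install.
    import psycopg2

    conn = psycopg2.connect(host="127.0.0.1", port=5433,  # assumed YSQL port
                            user="yugabyte", dbname="yugabyte")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone()[0])
    conn.close()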