If you are building a database engine that strongly prioritizes performance, and Scylla does position itself that way, then C++ is the only practical choice today for many people, depending on the details. It isn't that C++ is great, though modern versions are pretty nice, but that it wins by default.
Garbage collected languages like Golang and high-performance database kernels are incompatible because the GC interferes with core design elements of high-performance database kernels. In addition to a significant loss of performance, it introduces operational edge cases you don't have to deal with in non-GC languages.
Rust has an issue unique to Rust in the specific case of high-performance database kernels. The internals of high-performance databases are full of structures, behaviors, and safety semantics that Rust's safety checking infrastructure is not designed to reason about. Consequently, to use Rust in a way that produces equivalent performance requires marking most of the address space as "unsafe". And while you could do this, Rust is currently less expressive than modern C++ for this type of code anyway, so it isn't ergonomic either.
C++ is just exceptionally ergonomic for writing high-performance database kernels compared to the alternatives at the moment.
> Rust has an issue unique to Rust in the specific case of high-performance database kernels. The internals of high-performance databases are full of structures, behaviors, and safety semantics that Rust's safety checking infrastructure is not designed to reason about. Consequently, to use Rust in a way that produces equivalent performance requires marking most of the address space as "unsafe". And while you could do this, Rust is currently less expressive than modern C++ for this type of code anyway, so it isn't ergonomic either.
None of that sounds right to me.
More likely the developers already knew C++, there were already a lot of KV stores built in C++, and Rust was a relatively new player. Scylla was released in 2015 and Rust hit 1.0 in 2015, so it seems obvious why Scylla didn't go with Rust.
edit: Yep, from further down
> So if we were starting at this point in time, I would take a hard look at Rust, and I imagine that we would pick it instead of C++. Of course, when we started Rust didn’t have the maturity that it has now, but it has progressed a long time since then and I’m following it with great interest. I think it’s a well-done language.
> Consequently, to use Rust in a way that produces equivalent performance requires marking most of the address space as "unsafe". And while you could do this, Rust is currently less expressive than modern C++ for this type of code anyway, so it isn't ergonomic either.
Based on my (admittedly limited) experience with Rust, this isn't true. Yes, you'd likely have to use "unsafe" a few times in order to implement a database system in Rust, but you would only need to do this for certain types of low-level data structures. The uses of those data structures—which would represent the majority of your code—would almost certainly be written in safe Rust. Don't throw the baby out with the bathwater.
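To make the "unsafe only in low-level data structures" point concrete, here is a minimal sketch, not taken from any real database. `Page`, `PAGE_SIZE`, `insert`, and `get` are hypothetical names invented for illustration. The single `unsafe` block lives inside the structure and relies on an invariant that only `insert` can establish; callers never write `unsafe` themselves.

```rust
/// Hypothetical page size, chosen only for illustration.
pub const PAGE_SIZE: usize = 4096;

/// A hypothetical fixed-size page holding variable-length records.
pub struct Page {
    data: Box<[u8; PAGE_SIZE]>,
    /// (offset, len) of each record. Private, so only `insert` can add entries.
    slots: Vec<(u16, u16)>,
    /// Number of bytes of `data` already occupied by records.
    used: usize,
}

impl Page {
    pub fn new() -> Self {
        Page { data: Box::new([0u8; PAGE_SIZE]), slots: Vec::new(), used: 0 }
    }

    /// Appends a record and returns its slot id, or `None` if it does not fit.
    pub fn insert(&mut self, record: &[u8]) -> Option<usize> {
        let end = self.used.checked_add(record.len())?;
        if end > PAGE_SIZE {
            return None;
        }
        self.data[self.used..end].copy_from_slice(record);
        // Both values are <= PAGE_SIZE, so they fit in a u16.
        self.slots.push((self.used as u16, record.len() as u16));
        self.used = end;
        Some(self.slots.len() - 1)
    }

    /// Returns the record stored in `slot`, skipping the second bounds check.
    pub fn get(&self, slot: usize) -> Option<&[u8]> {
        let &(off, len) = self.slots.get(slot)?;
        let (off, len) = (off as usize, len as usize);
        // SAFETY: `insert` only records pairs with off + len <= PAGE_SIZE,
        // and `slots` cannot be modified from outside this module.
        Some(unsafe { self.data.get_unchecked(off..off + len) })
    }
}
```

Code that uses `Page` stays entirely in safe Rust: `let id = page.insert(b"row")?;` followed by `page.get(id)`. Whether skipping the bounds check is worth it here is a separate question; the sketch only shows where the `unsafe` boundary sits.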
I also contest the assertion that Rust is "less expressive" than C++; I have found Rust to be very expressive and concise for such a safe language. But I also don't have a ton of experience with either one, so don't take my word for that.
The real answer as to why Scylla does not use Rust is that the language simply wasn't very mature when they started. It also helps that significantly more engineers know C++ than know Rust.
I am a very avid proponent of rust. however, here are a few places I have had difficulty in working on custom storage engines in rust:
- uninitialized memory: it is tricky to get the semantics of uninitialized memory right. the ergonomics of the `MaybeUninit` api are frankly terrible.
- memory alignment: for O_DIRECT and other cases where memory alignment is important, it is difficult to ensure that the backing memory of Vec and other datatypes is correctly aligned, which ends up pushing you towards raw pointers.
- mmap: after considerable research, it is unclear to me whether there is a safe rust api to mmap.
- hostility to unsafe: in general, rust is easy to learn (relative to C++). however, the hostility in the community to unsafe (there are some good reasons for this, not criticizing it in general), makes it more difficult for someone without a background in C/C++ to learn how to use unsafe correctly. feels like if you ask a question about how to do unsafe you get 100 people telling you what a terrible idea that is, but for database code there is very significant performance at stake.
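For the alignment bullet, one way people sidestep `Vec` is to allocate through `std::alloc` with an explicit `Layout` and hand out slices from that. The following is a rough sketch under assumptions: `AlignedBuf` is a made-up type, and the alignment O_DIRECT actually requires depends on the device and filesystem, so the 4096 in the usage note is only a placeholder.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Hypothetical heap buffer whose start address honors a caller-chosen alignment.
pub struct AlignedBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl AlignedBuf {
    /// Returns `None` if `len` is zero or `align` is not a power of two.
    pub fn new(len: usize, align: usize) -> Option<Self> {
        if len == 0 {
            return None;
        }
        let layout = Layout::from_size_align(len, align).ok()?;
        // SAFETY: `layout` has a non-zero size, as `alloc_zeroed` requires.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = match NonNull::new(raw) {
            Some(p) => p,
            None => handle_alloc_error(layout),
        };
        Some(AlignedBuf { ptr, layout })
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points to `layout.size()` initialized (zeroed) bytes we own.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this `layout`.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

Something like `AlignedBuf::new(8192, 4096)` then gives a buffer whose `as_mut_slice()` can be passed to a read call. The raw pointer is still there, it is just confined to one small type.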
> - uninitialized memory: it is tricky to get the semantics of uninitialized memory right. the ergonomics of the `MaybeUninit` api are frankly terrible.
Agreed. There are some unstable APIs that will help, but it's not great today.
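For anyone who hasn't used it, this is roughly what the stable `MaybeUninit` dance looks like for a simple case. It is a sketch only: `decode_u64s` is a hypothetical helper, and it is not necessarily the pattern the parent comment struggled with. The awkward part is that the compiler cannot verify the "every element was written" claim, so it ends up in a `SAFETY` comment.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: decodes the first `N` little-endian u64 values from
/// `bytes` without zero-filling the output array first.
pub fn decode_u64s<const N: usize>(bytes: &[u8]) -> Option<[u64; N]> {
    if bytes.len() < N.checked_mul(8)? {
        return None;
    }
    // An array of uninitialized slots; reading one before writing it would be UB.
    let mut out = [MaybeUninit::<u64>::uninit(); N];
    for (slot, chunk) in out.iter_mut().zip(bytes.chunks_exact(8)) {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(chunk);
        slot.write(u64::from_le_bytes(raw));
    }
    // SAFETY: the length check guarantees at least N full chunks, so the zip
    // above visited and wrote all N slots.
    Some(out.map(|slot| unsafe { slot.assume_init() }))
}
```

For a `u64` array this buys very little over `[0u64; N]`; the pattern matters more for large buffers or types without a cheap default value.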
> mmap
There is no possible way to expose raw mmap safely because the data under the hood can change out from under you. Whatever it is you're doing you'd want to wrap that. For example, a &[u8] could be safe, but not if you then did `str::from_utf8`. So you just have to make sure that mmap'd data is treated very carefully and doesn't get exposed across a safe boundary.
> - hostility to unsafe:
Same feeling here and I know many others feel the same way. The community can overreact to things, it is what it is.
In some databases, you neither have transparent virtual memory (like mmap or swap) nor can your runtime objects be guaranteed to exist in physical memory. In these models, references to your runtime objects are not pointers because a series of DMA operations into your address space may relocate them and your reference may also be on disk somewhere. DMA doesn't understand memory layouts or object models and has its own alignment rules, so when DMA writes to your address space, it is overwriting several potentially addressable and unrelated objects. Some databases don't even have locks to pin an object in place or arbitrate an access conflict; a scheduler decides when it is safe to dereference a particular pseudo-reference and resolves it to a transiently valid memory address. To make it a bit more complicated from the compiler's perspective, the handful of normal object pointers you do have are mapping all sorts of objects over the same memory as your other objects with different semantics, which looks like an aliasing violation at a minimum. The result is actually pretty elegant, but the implementation abandons any notion that an object exists at a unique memory address with a particular lifetime and knowable references. Nonetheless, it is essentially zero-copy, lock-free, and non-blocking, which is a major obsession among the performance people.
This architecture even makes C++ compilers a bit squeamish, so it is understandable why Rust looks at these things with abject horror. If you are leaning heavily on the OS facilities to do all those things for you automagically, which many open source databases do, then Rust works fine with only modest amounts of "unsafe" code. It just produces a database that is much slower.
As for the expressiveness, Rust is adding more metaprogramming facilities but it isn't there yet. C++ template metaprogramming is incredibly powerful for writing concise, correct database internals. I used to write databases in C99; it required like 5x the code to do the same thing and without the extensive compile-time correctness verification and type-safe code generation.
I always love your take even if I don't agree. SpaceCurve was a phenomenal system, one of the most pragmatic, high performance, easy to use MPP database systems I have ever used. We never met btw, I was just a user.
But I think you are wrong about Rust not having the right machinery for making high performance dbs. Two examples are Noria and Materialize.
This kind of reinforces my point though: neither Materialize nor Noria are high-performance database kernels, and they don't need to implement the high-performance I/O structures database kernels have that give Rust problems. Rust works great for server software generally, database kernels are a very specific outlier.
It is common in recent database kernel architectures to implement an entire virtual memory system in user space. This enables some great throughput optimizations. Almost all of your runtime objects are instantiated on top of this and, importantly, entities outside your process/code can write into your address space -- an invisible implicit reference. As a side effect, there are few memory references in the way Rust understands it, those outside entities don't understand or respect the object model, and some aspects of ownership, mutability, and lifetime can only be resolved at runtime and with some interesting edge cases. The model is elegant and safe, it just doesn't provide a coherent graph of classic memory references that Rust can latch onto at compile-time for safety analysis.
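A toy way to picture "references that can only be resolved at runtime" is a handle that carries a generation number and is checked on every access. To be clear, this is a simplification under assumptions: `BufferPool`, `PageRef`, `load`, and `resolve` are invented names, the generation counter stands in for the scheduler described above, and `load` stands in for an outside write such as DMA. A real kernel of this kind would not look like this and would not be expressible in safe code this easily.

```rust
/// Hypothetical pseudo-reference: a frame index plus the generation it was issued for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PageRef {
    frame: usize,
    generation: u64,
}

struct Frame {
    generation: u64,
    data: Vec<u8>,
}

/// Hypothetical user-space pool of fixed-size frames.
pub struct BufferPool {
    frames: Vec<Frame>,
}

impl BufferPool {
    pub fn new(frame_count: usize, page_size: usize) -> Self {
        let frames = (0..frame_count)
            .map(|_| Frame { generation: 0, data: vec![0u8; page_size] })
            .collect();
        BufferPool { frames }
    }

    /// Overwrites a frame (standing in for an external write) and returns a
    /// fresh handle. Every handle issued earlier for this frame becomes stale.
    pub fn load(&mut self, frame: usize, contents: &[u8]) -> Option<PageRef> {
        let f = self.frames.get_mut(frame)?;
        if contents.len() > f.data.len() {
            return None;
        }
        f.data[..contents.len()].copy_from_slice(contents);
        f.data[contents.len()..].fill(0);
        f.generation += 1;
        Some(PageRef { frame, generation: f.generation })
    }

    /// Resolves a handle to a transiently valid slice, or `None` if it is stale.
    pub fn resolve(&self, r: PageRef) -> Option<&[u8]> {
        let f = self.frames.get(r.frame)?;
        if f.generation == r.generation {
            Some(f.data.as_slice())
        } else {
            None
        }
    }
}
```

The validity of a `PageRef` is decided by the generation comparison at runtime, not by the borrow checker. In this toy the borrow checker still protects the slice returned by `resolve`, which is exactly the guarantee the parent says goes away once something outside the process can write into the frame.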
I'm not sure that proves your point, though maybe it doesn't disprove it strongly enough either. I am not qualified to argue from experience about how Rust is ideally suited in the ways you think it is not. But from everything I have seen, it can do a whole lot of what C++ is also good at. Rust safety is not all or nothing and a codebase could definitely prioritize ergonomics over correctness.
Two things that I saw in the last couple weeks that might start to sway you.