More

krenoten · on Feb 13, 2021

In practice this is the source of a tremendous number of bugs when async tasks don't actually clean up shared state in their Drop impls.

krenoten · on Feb 13, 2021

I haven't mentioned async Rust on HN since 2015.

krenoten · on Feb 21, 2020

The best solution is to use a 0-copy format like flatbuffers, google's zerocopy library, or serde's borrow attributes https://serde.rs/lifetimes.html#borrowing-data-in-a-derived-...

You're totally right that data is meaningless without the ability to use it. This is something I've been thinking about a lot, because it involves a lot of trade-offs that definitely were not apparent to me before I started building this.

Ultimately, I do want to make it easy to write a Tree<K, V> on top of sled, but if I were to do it myself it would significantly reduce performance and impose restrictions on users that don't feel appropriate to me.

* Being a lock-free B+ tree, there are a ton of cool techniques we can use on bytes that would not apply to arbitrary K types. For instance, all keys stored in a tree node are prefix encoded by the node's low key. This allows very long keys with common prefixes to be very cheaply stored, which is useful for things like F1-like embedded tables. We also do key truncation, where when we do a node split, we chop off the bytes necessary to actually differentiate between one side and the other, which further reduces the size of the index. Prefix encoding and suffix truncation are the bread and butter of modern B+ tree implementations, and that would hurt a lot to walk away from.

* All K and V types must then be serializable and deserializable. In rust we represent this with traits that come from specific external libraries. serde took up half of the sled compilation time when it was being used, so I don't want to pull that back in as a mandatory requirement for all users. I could have my own traits for this, and if I do anything like this, it will be the approach I take, and then external crates will implement serde-sled etc... but by depending on external crates, it makes for a very brittle API surface that requires all users to target the exact same version of that external trait. The core must be self-consistent and not export any external types that would cause sharp dependency issues.

* I like the idea of caching values that are deserialized only once, to avoid repeated deserialization costs on hot items. Something like this may be implemented in sled, but I haven't figured out the best way to represent it without causing memory issues etc... And in the mean time I'm more tempted to just point people to the 3 solutions above that let them view their bytes as structured data without many deserialization costs.

Ericson2314 · on Feb 22, 2020

In the spirit of my other comment https://news.ycombinator.com/item?id=22375979 it would be nice to say the layers broken out into separately usable abstractions. Ideally, this means layering, K and V type params, and no loss of performance.

> Prefix encoding and suffix truncation

At the very least, you can do all this stuff in the key is like [u8]/Vec<u8> case. But maybe also [T]/Vec<T>? I love to look at my existing monomorphic algorithms, and think "what traits would this require to make this as abstract as possible"? Almost never is the answer "sorry, it cannot be generalized at all".

> All K and V types must then be serializable and deserializable....and if I do anything like this, it will be the approach I take.....

I agree with your own traits. Per the above your algorithms come first, not other people's abstractions. But I don't know why you can't just do that and ignore serde entirely. Serde becomes someone else's problem, and if they don't want to bother with it you will provide the [u8]/Vec<u8> instances.

> And in the mean time I'm more tempted to just point people to the 3 solutions above that let them view their bytes as structured data without many deserialization costs.

Agreed. If you data structure has a pointer, that should be a foreign key in my book (though not necessarily in the same direction!). It's best if you can just memcopy the row, more or less.

[I say the pointer thing not only because marshaling cost, but also data modeling. Huge rows / json blogs just don't make sense to me. There almost surely is some structure to the nested data, and that deserves to be formalized and enforced in its own index. Ignore what they told you in SQL class. Indexes express/reify invariants, and foreign keys act through indices.]

krenoten · on Feb 21, 2020

only narcs use windows for database workloads :P

krenoten · on Feb 21, 2020

yeah, le-be conversions are not generally measurable above noise compared to other database-related work.

sled stores arbitrary bytes. endianness is the concern of the person who wants to store higher-level types than bytes, like integers. I do imagine having a story for letting people deserialize/view bytes once, and having that view sit in cache so that hot items are not repeatedly deserialized.

I agree with Adya's work " Fast key-value stores: An idea whose time has come and gone " https://research.google/pubs/pub48030/ where it makes the case that stateful systems should not have to pay repeated deserialization and network costs. I want sled to be well-situated for the world that we're headed into, where this view will become more prominent. This means caching deserialized data, better replication stories, and maybe nice helpers for distributed database authors that allow for atomic tree splits / merges and per-tree replication settings.

krenoten · on Feb 21, 2020

On sled 0.31 the memory growth does not run away as with the earlier version. Note that some in-memory metadata tracking does accrue as nodes split and sled tracks where to find them in the future when the actual data is paged in. But this is still a fraction of data stored.

The important thing is that sled is not stable yet. Don't bet your business on databases less than 5 years old. Sled doesn't turn 5 for a few more months, and I'm ironing out a few issues still that relate to production readiness. For now, treat it as a cache, as a responsible SRE would treat any database younger than 5 years old.

krenoten · on Oct 2, 2019

Just use a hashmap. Hashmaps in some cases can be like 8 orders of magnitude faster than Redis.

phamilton · on Oct 3, 2019

Hashmaps are faster, except when you need the data to be available to multiple machines.

A good description of redis is a "shared heap". HashSets, HashMaps, Heaps, Priority Queues, etc. are all great and fast in an application, but once you need to start sharing that data things get complicated quickly. So you designate a single server to implement those data structures and expose them to your application. And what you end up with is basically redis.

krenoten · on April 24, 2019

I make a lot of money fixing problems in storage layers that these kinds of ideas create. Thanks :]

krenoten · on May 28, 2018

There is zero reason behind this assumption.

krenoten · on May 28, 2018

It means pay more attention to reliability than pop infrastructure and internet companies (who can offset poor reliability with human attention or intentionally deprioritize it to sell more support contacts) tend to put into these things. Specifically, exhaustive concurrency testing of lock-free algorithm interleavings via ptrace driven scheduling, model-based testing in combination with fault injection, ALICE-style file correctness testing, and for the various distributed modules that sit on top, network simulation combined with lineage driven fault injection. This is all very much a work in progress, and I'd love to work with more people on it!