MySQL's pluggable engine really is a killer feature. From these results you woul...

felixhandte · on May 1, 2018

I actually think the correct conclusion is the reverse: if you can fit your tables in RAM, choose Inno. Otherwise, Rocks outperforms Inno when the working set no longer fits in memory. OP's testing is a little confusing because the working set remains constant and the memory is scaled, where normally we think of this in terms of seeing how much we can scale the workload on a node with fixed resources.

Rapzid · on May 1, 2018

Dat jitter tho.

Like most things, the devil is in the details.. And the use case.

mtanski · on May 1, 2018

That's somewhat inherent to LSM tree.

adventured · on May 1, 2018

That's the right conclusion. Most use cases will fit into memory and InnoDB will win vs Rocks when all of your data fits into memory. Very large users are the ones who would more typically benefit from Rocks, as their data sets may dramatically outstrip available memory.

PeterZaitsev · on May 1, 2018

There is another side to MyRocks/RocksDB Story - Compression

With a lot higher compression than possible with Innodb MyRocks can be in memory (File Cache) when Innodb workloads are already IO bound. Plus you can save a lot on disk storage (and IO if you're paying for it)

ggg9990 · on May 1, 2018

But larger users can also afford more RAM.

PeterZaitsev · on May 1, 2018

This is common misconception. The larger there is a cost of sub-optimal performance. If you spend $1000/month on your infrastructure, halving infrastructure cost will save you $500/month which can't justify a lot of investments. Now if it is $100M/month saving $50M a month is worth a lot.... this is why Facebook for example has created many custom built highly optimized systems like RocksDB

PeterZaitsev · on May 1, 2018

Note MyRocks really excels when data set is larger than memory. For mostly in-memory workloads many smaller users have Innodb is better.

bogomipz · on May 1, 2018

Can you explain? I thought Rocks was an LSM storage engine and mostly excelled with write heavy work loads.

PeterZaitsev · on May 1, 2018

The benefit of LSM for Write workloads is a lot higher with large data sets. Optimizing BTREE in memory is cheap, BTREE on disk requires a lot of IO

LSM however does not require as much disk IO for inserts even if data is much larger than memory

bogomipz · on May 1, 2018

I'm not following.

Myrocks uses a clustered key for the primary key from what I have read. Isn't this the same as innodb? Is it not a B-Tree?

Could you elaborate on what's special about indexes - primary or secondary in Myrocks?

PeterZaitsev · on May 1, 2018

MyRocks is LSM based. The LSM is different from BTREE in what you need to do multiple "physical" lookups and merge result to perform one logical lookup. There are some tricks like Bloom Filters are employed to optimize it but it is still not as efficient in memory as BTREE especially for range lookups. Here is some information on WIKIPEDIA https://en.wikipedia.org/wiki/Log-structured_merge-tree

bogomipz · on May 1, 2018

Right Cassandra uses bloom filters.

I think I may have misinterpreted your originalcomment.

Fractal Tree Indexes are interesting as well as they are optimized for hitting the disk:

https://www.percona.com/doc/percona-tokudb/ft-index.html

dajonker · on May 1, 2018

As in any situation when it comes to performance: measure your own real world performance before making any decisions.

Benchmarks are nice to show that somethings work better than others in theory or "perfect" examples, but real databases tend to be much more complicated and messy, and sometimes just writing smarter queries, creating an index, or changing application logic may have a much bigger impact.

nartz · on May 1, 2018

I sort of consider these different things. Key/value store versus typical row store.

PeterZaitsev · on May 1, 2018

What do you see as the difference ?

Under SQL engine there tend to be some lower level API