How does Scuba differ from Presto which is also developed by Facebook? It seems ...

lstyls · on Jan 23, 2017

Scuba made decisive tradeoffs in the functionality that it provides. Notable ones include that it doesn't support joins within a table, and doesn't provide any cross-table operations. Mostly it is used for basic filtering on constant values, and gathering summary statistics on those values. This is less of a limitation than it sounds: when you know this ahead of time, you just log your writes in a denormalized way and you don't need to join anything later.
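As a rough illustration of that access pattern (this is not Scuba's actual API; the rows, field names, and the `summarize` helper are all made up for the example), filtering denormalized rows on constant values and then computing summary statistics might look like this:

```python
from statistics import mean

# Hypothetical denormalized log rows: every event already carries the
# endpoint and region fields, so no join is needed at query time.
events = [
    {"endpoint": "/feed", "region": "us-east", "latency_ms": 120},
    {"endpoint": "/feed", "region": "eu-west", "latency_ms": 200},
    {"endpoint": "/profile", "region": "us-east", "latency_ms": 80},
    {"endpoint": "/feed", "region": "us-east", "latency_ms": 160},
]


def summarize(rows, column, **filters):
    """Keep rows whose fields equal the given constants, then summarize one column."""
    values = [
        row[column]
        for row in rows
        if all(row.get(key) == wanted for key, wanted in filters.items())
    ]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Filter on two constant values, then aggregate the latency column.
result = summarize(events, "latency_ms", endpoint="/feed", region="us-east")
print(result)
```

The point of the sketch is only that every field a query filters on lives in the row itself, so there is nothing to join.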

As @ot said, Presto is just a query engine and it doesn't provide a backend. It provides an API that allows it to be plugged into different data warehousing systems. I would assume functionality depends to some extent on what your data is stored in, but in general Presto supports the full suite of standard relational-DB-style queries.

Source: I work at FB as well. In fact I was using Scuba just now to do a quick analysis of our storage requirements for Scuba itself :)

nvais · on Jan 23, 2017

Here is a great post from Bobby Johnson (ex-Facebooker and CTO of Interana) with his opinion on "in-memory" data stores: https://community.interania.com/t5/Blogs/The-Myth-of-In-Memo....

This was the reasoning behind a key architectural decision at Interana that makes it different from Scuba: instead of developing an in-memory system, Interana created a custom data store that is heavily optimized around using spinning disk and CPU cache. This makes it incredibly fast and less expensive to operate massive clusters at scale.

ot · on Jan 23, 2017

Scuba is a complete system of log collection, storage and retrieval, and UI/visualization.

Presto would only cover the storage/retrieval part. Scuba has its own backend for that, which is heavily optimized for the kind of queries the UI needs to support, while Presto is a generic SQL store for analytics.

buremba · on Jan 23, 2017

How does Scuba optimize the stored datasets for aggregation queries compared to Presto (Raptor connector)? They both use common columnar data storage techniques such as compression, delta encoding and dictionary encoding. The main difference seems to be the real-time nature of Scuba and the UI.
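For readers unfamiliar with the two encodings named above, here is a minimal generic sketch of dictionary encoding and delta encoding on a single column. It assumes nothing about how Scuba or Presto actually implement them; the function names and sample columns are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code into a list of distinct values."""
    dictionary = []  # distinct values, in order of first appearance
    index = {}       # value -> its code
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    return [dictionary[code] for code in codes]


def delta_encode(values):
    """Store the first value, then the difference between each pair of neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    out = []
    running = 0
    for delta in deltas:
        running += delta
        out.append(running)
    return out


# A low-cardinality string column suits dictionary encoding.
regions = ["us", "eu", "us", "us"]
region_dict, region_codes = dictionary_encode(regions)

# A slowly increasing timestamp column suits delta encoding.
timestamps = [1000, 1002, 1005, 1005]
timestamp_deltas = delta_encode(timestamps)
```

Both transforms turn the column into small integers, which a general-purpose compressor or bit-packing step can then shrink further.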

ot · on Jan 23, 2017

Oh yeah good point, I had forgotten that Presto does not support realtime. About optimizations, I don't know the details, but for one, Scuba is C++ and Presto is Java.