
How does Scuba differ from Presto, which is also developed by Facebook? It seems to store data in memory and has a data expiration feature, but it also shares many features with Presto, such as SQL and distributed processing.



Scuba made decisive tradeoffs in the functionality that it provides. Notable ones include that it doesn't support joins within a table, and doesn't provide any cross-table operations. Mostly it is used for basic filtering on constant values, and gathering summary statistics on those values. This is less of a limitation than it sounds like because when you know this ahead of time you just log your writes in a denormalized way and you don't need to join anything later.
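To make the "log denormalized, then filter and summarize" pattern concrete, here is a minimal Python sketch. It is not Scuba's implementation or API; the row fields (`endpoint`, `country`, `latency_ms`) and the `summarize` helper are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical denormalized log rows: every field a query might need is
# written onto the row at log time, so no join is required later.
rows = [
    {"endpoint": "/home", "country": "US", "latency_ms": 120},
    {"endpoint": "/home", "country": "DE", "latency_ms": 200},
    {"endpoint": "/search", "country": "US", "latency_ms": 80},
    {"endpoint": "/home", "country": "US", "latency_ms": 100},
]


def summarize(rows, where, group_by, metric):
    """Filter rows on constant values, then return count and average of
    `metric` for each distinct value of `group_by`."""
    acc = defaultdict(lambda: [0, 0.0])  # group -> [count, running total]
    for row in rows:
        if all(row.get(key) == value for key, value in where.items()):
            slot = acc[row[group_by]]
            slot[0] += 1
            slot[1] += row[metric]
    return {
        group: {"count": count, "avg": total / count}
        for group, (count, total) in acc.items()
    }


result = summarize(
    rows, where={"endpoint": "/home"}, group_by="country", metric="latency_ms"
)
```

Because the country is already on each row, the query is a single pass over one table with an equality filter and a per-group aggregate.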

As @ot said, Presto is just a query engine and it doesn't provide a backend. It provides an API that allows it to be plugged into different data warehousing systems. I would assume functionality depends to some extent on what your data is stored in, but in general Presto supports the full suite of standard relational-database-style queries.

Source: I work at FB as well. In fact I was using Scuba just now to do a quick analysis of our storage requirements for Scuba itself :)


Here is a great post from Bobby Johnson (ex-Facebooker and CTO of Interana) with his opinion on "in-memory" data stores: https://community.interania.com/t5/Blogs/The-Myth-of-In-Memo....

This was the reasoning behind a key architectural decision at Interana that makes it different from Scuba: instead of developing an in-memory system, Interana created a custom data store that is heavily optimized around using spinning disk and CPU cache. This makes it incredibly fast and less expensive to operate massive clusters at scale.


Scuba is a complete system of log collection, storage and retrieval, and UI/visualization.

Presto would only cover the storage/retrieval part. Scuba has its own backend for that, which is heavily optimized for the kind of queries the UI needs to support, while Presto is a generic SQL store for analytics.


How does Scuba optimize the stored data-sets for aggregation queries compared to Presto (Raptor connector)? They both use common columnar data storage techniques such as compression, delta encoding and dictionary encoding. The main difference seems to be the real-time nature of Scuba and the UI.
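For readers unfamiliar with the two encodings named above, here is a minimal Python sketch of dictionary encoding and delta encoding on a single column. It illustrates the general techniques only and is not how Scuba or Presto's Raptor connector actually lays out data; all function names are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code and return
    (symbols, codes), where symbols[code] recovers the original value."""
    code_of = {}
    codes = []
    for value in values:
        # Assign the next unused code the first time a value is seen.
        codes.append(code_of.setdefault(value, len(code_of)))
    symbols = [None] * len(code_of)
    for value, code in code_of.items():
        symbols[code] = value
    return symbols, codes


def dictionary_decode(symbols, codes):
    return [symbols[code] for code in codes]


def delta_encode(values):
    """Store the first value, then each difference from the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    out = []
    running = 0
    for delta in deltas:
        running += delta
        out.append(running)
    return out


countries = ["US", "DE", "US", "US"]
symbols, codes = dictionary_encode(countries)

timestamps = [1000, 1003, 1004, 1010]
deltas = delta_encode(timestamps)
```

Dictionary encoding pays off on low-cardinality string columns, and delta encoding pays off on slowly increasing values such as timestamps, since the small integers that result compress well.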


Oh yeah, good point. I had forgotten that Presto does not support realtime. As for optimizations, I don't know the details, but for one, Scuba is written in C++ and Presto in Java.



