I'm thrilled that AMPLab and CSAIL are building this.
For the vast majority of analytics problems and projects I've worked on, approximate numbers are just as good as exact results. One of the biggest productivity blockers can be queries and analytics that take hours or days to run instead of seconds to minutes, as these dramatically decrease the number of iterations you can execute and ideas you can test.
We commonly work on sub-sampled versions of datasets to enable interactive queries and analytics - it's really great to see someone formalizing this process and handling the details in a simple and principled manner.
You might benefit from a different name though. "[word]DB" is starting to become a pattern in people's internal spam filters. And it reminds me of CouchDB, MongoDB, RethinkDB, etc.
I expect it will be especially helpful for businesses analyzing data: they can get useful results from massive datasets without massive hardware expenses.
I've often dealt with "big data" by using sampling and stratified sampling, and it is nice to see they're building something that can automate this process.