
Yep, it currently limits you to a 10 GB database size. Makes sense, though - SQL techniques like LIKE filters or joins really do require all of your data to be in memory with a very high-bandwidth, low-latency interconnect - i.e., it all has to be in memory on a single box.

This is the reason the datastore forbids these operations - because they're extremely difficult to efficiently implement and still scale indefinitely (without making other, potentially very large sacrifices).
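To make the LIKE point concrete, here's a rough sketch (plain Python, nothing App-Engine-specific, all names made up) of why a substring filter like LIKE '%gadget%' gets no help from an ordered index, whereas a prefix lookup does:

    # The "index" here is just the column values kept in sorted order.
    import bisect

    names = sorted(["widget", "gadget", "gizmo", "sprocket", "gadget-pro"])

    def prefix_query(index, prefix):
        # A prefix match can binary-search the sorted index: O(log n) to find
        # the start, then read only the matching range.
        lo = bisect.bisect_left(index, prefix)
        hi = bisect.bisect_left(index, prefix + "\uffff")
        return index[lo:hi]

    def substring_query(index, needle):
        # A substring match gets no help from the ordering: every entry has
        # to be examined, so it's O(n) however the data is indexed.
        return [v for v in index if needle in v]

    print(prefix_query(names, "gadget"))     # cheap: index range scan
    print(substring_query(names, "adget"))   # expensive: full scan

On a sharded store, that full scan also means touching every shard, which is exactly the kind of cost that doesn't scale indefinitely.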



What? A join requires all of your data to be in memory? I've sure as hell seen databases execute queries with joins when the data was not entirely in memory...


Well, sure, it doesn't _require_ it. But the alternative is higher query latency. You have to collect a filtered set of the join column from table A, then ship those values over to the servers responsible for table B. If your interconnect has high latency or low bandwidth, this can be painful - particularly if the intermediate set of keys is very large.

That's why the datastore simply disallows this - yes, it's _possible_ to make joins work on larger datasets, but it's not possible to make _arbitrary_ joins work well on larger datasets.
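Roughly what I mean, as a toy sketch (two in-process dicts standing in for shards on separate servers; all names here are illustrative, not any real API):

    def shard_of(key, n_shards):
        return hash(key) % n_shards

    # Table A (orders) and table B (customers), each hash-split across 2 shards.
    a_shards = [dict(), dict()]
    for order_id, customer_id, total in [(1, "c1", 30), (2, "c2", 5), (3, "c1", 99)]:
        a_shards[shard_of(order_id, 2)][order_id] = (customer_id, total)

    b_shards = [dict(), dict()]
    for customer_id, name in [("c1", "Ada"), ("c2", "Bob")]:
        b_shards[shard_of(customer_id, 2)][customer_id] = name

    # Step 1: filter table A locally on each shard and collect the join keys.
    filtered = [(cid, total) for shard in a_shards
                for (cid, total) in shard.values() if total > 10]

    # Step 2: ship each join key to whichever shard of B owns it and look it up.
    # In a real system this key set crosses the network; if it's huge, so is the cost.
    result = []
    for cid, total in filtered:
        name = b_shards[shard_of(cid, 2)].get(cid)
        if name is not None:
            result.append((name, total))

    print(result)  # e.g. [('Ada', 30), ('Ada', 99)]

The latency hit comes entirely from step 2 - and there's no bound on how big that intermediate key set can get for an arbitrary join.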


Joins on indexed columns don't require anything apart from the index to be in memory in order to do the filtering.
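For example, something like this (illustrative only - the row data could just as well live on disk, since the filtering phase only ever touches the indexes):

    # Pretend these row tables live on disk; we only read them for rows
    # that survive the join.
    rows_a = {0: ("c1", "order-1"), 1: ("c2", "order-2"), 2: ("c1", "order-3")}
    rows_b = {0: ("c1", "Ada"), 1: ("c3", "Cy")}

    # In-memory indexes: join key -> row ids. This is all the join needs resident.
    index_a = {}
    for rid, (key, _) in rows_a.items():
        index_a.setdefault(key, []).append(rid)
    index_b = {}
    for rid, (key, _) in rows_b.items():
        index_b.setdefault(key, []).append(rid)

    # Filtering phase: intersect the key sets using only the indexes.
    matching_keys = index_a.keys() & index_b.keys()

    # Fetch phase: pull the (possibly on-disk) rows only for matching keys.
    joined = [(rows_a[ra], rows_b[rb])
              for key in matching_keys
              for ra in index_a[key]
              for rb in index_b[key]]

    print(joined)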



