Hacker News

map/filter/reduce?

Give you the contents of your table, so you can process it in the app tier with map/filter/reduce?!

Ah, the hubris of devs who honestly believe they can whip out a solution in a day that beats a dedicated army of developers singularly focused on the task of large data management, storage, and serialization. And they almost always forget that serialization off of disk and over a network isn't free.

There's a reason why SQL is going on 50 years when most technologies are lucky to remain dominant past 10 in this industry. And if you think it's just because of inertia or lack of imagination, you're deluded. SQL isn't perfect (nothing is), but as a DSL for set theory, it does a damn good job, even 50 years later. Far better than any map/filter/reduce whipped up yet again by someone who doesn't fully understand the scope of the problems being solved.

It's doubly troubling when you can't grok that SELECT = map, WHERE = filter, and GROUP BY + aggregator = reduce. I sincerely hope you aren't avoiding JOIN by loading both tables ahead of time.
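The correspondence the parent describes can be sketched in plain Python (the `orders` list here is a made-up stand-in for a table):

```python
from functools import reduce

# A hypothetical in-memory "table" of order rows.
orders = [
    {"customer": "a", "total": 10},
    {"customer": "b", "total": 25},
    {"customer": "a", "total": 5},
]

# WHERE total > 6  ->  filter
big = [r for r in orders if r["total"] > 6]

# SELECT customer  ->  map
customers = [r["customer"] for r in big]

# GROUP BY customer with SUM(total)  ->  reduce into a dict
def add_row(acc, r):
    acc[r["customer"]] = acc.get(r["customer"], 0) + r["total"]
    return acc

totals = reduce(add_row, orders, {})
# totals == {"a": 15, "b": 25}
```

Which is exactly `SELECT customer, SUM(total) FROM orders GROUP BY customer` — the database just does it without shipping every row to the app tier first.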




> Ah, the hubris of devs who honestly believe they can whip out a solution in a day that beats a dedicated army of developers singularly focused on the task of large data management, storage, and serialization.

A "dedicated army of developers" who've been "going on 50 years" but whose flagship solutions are still single-point-of-failure, still noncompositional, still untestable, still have bizarre and incomprehensible performance characteristics. Yeah, no, I'm going to do stuff in regular application code, thanks.


You pay a huge cost in moving those bits over the network though. How do you even deal with tables that don't fit in memory or the lack of indexes for tables where sequential scanning is too slow?


I mean at that point you're getting into real big data stuff. Move the code to the data rather than moving the data to the code; stream the data in its native format rather than having to read it all in; do pre-aggregation and indexing as the data is written (but not in a way that blocks your OLTP path). Which, yes, is stuff that SQL RDBMSes do for you up to a point, but they do it in invisible and unmanageable ways; IME you're better off doing it explicitly and visibly.
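One way to picture the "pre-aggregate and index as the data is written" idea — a minimal sketch; the `EventLog` class and its fields are hypothetical, not any real system's API:

```python
from collections import defaultdict

class EventLog:
    """Append-only log that maintains a running aggregate and a simple
    index at write time, so reads never have to scan."""

    def __init__(self):
        self.events = []                        # raw data, in arrival order
        self.count_by_kind = defaultdict(int)   # pre-aggregated counts
        self.index_by_kind = defaultdict(list)  # index: kind -> event offsets

    def append(self, kind, payload):
        offset = len(self.events)
        self.events.append((kind, payload))
        self.count_by_kind[kind] += 1           # aggregate updated on write
        self.index_by_kind[kind].append(offset) # index updated on write

    def count(self, kind):
        return self.count_by_kind[kind]         # O(1), no scan

log = EventLog()
log.append("click", {"page": "/"})
log.append("view", {"page": "/docs"})
log.append("click", {"page": "/pricing"})
# log.count("click") == 2
```

The trade the parent is pointing at: an RDBMS keeps equivalent structures (indexes, materialized views) behind the scenes, while doing it yourself makes the write-amplification cost explicit.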

In all seriousness, SQL databases can be good for ad-hoc exploratory queries, so having your data processing pipeline spit out a read-only SQL database dump on a regular schedule is worthwhile — but using that for anything more than prototyping is a mistake.
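A minimal sketch of that "pipeline emits a SQL snapshot for exploration" idea, using SQLite (the table name, schema, and rows are made up for illustration):

```python
import sqlite3

# Pipeline output step: write a snapshot database.
# An in-memory DB stands in here for a file like "snapshot-YYYY-MM-DD.db".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("a", 10), ("b", 25), ("a", 5)])
conn.commit()

# Ad-hoc exploratory query against the snapshot.
rows = conn.execute(
    "SELECT customer, SUM(total) FROM orders"
    " GROUP BY customer ORDER BY customer"
).fetchall()
# rows == [("a", 15), ("b", 25)]
```

For a file-backed snapshot, consumers can open it truly read-only with `sqlite3.connect("file:snapshot.db?mode=ro", uri=True)`, which keeps the exploratory copy from drifting from the pipeline's output.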


I am not particularly opinionated on the matter, and I think I can appreciate both sides of the argument.

SQL is somewhat analogous to C. It is entirely legitimate to look at its archaic usability and wonder if we could not do better; at the same time, it has hit a sweet spot of adaptation to its domain that has ensured its longevity.

Something that might shake things up a bit is graph databases / query languages. We can think of them as a generalization of the SQL universe and an opportunity to modernize it.


Yeah, if your devs are loading entire tables into code and using filter on them ... I haven't personally seen that, but oh god.

Not that it isn't valid in some cases. It is. If, for example, you're already using the whole table in your logic and just need to filter it to pull out certain items for a particular need, sure. Then it's absolutely fine!
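A sketch of the legitimate case described above — a small reference table loaded once at startup and reused, with an occasional in-app filter (the table contents are made up):

```python
# A small reference table, loaded once and kept in memory because
# nearly all of the logic needs the whole thing anyway.
countries = [
    {"code": "us", "name": "United States", "eu": False},
    {"code": "de", "name": "Germany", "eu": True},
    {"code": "fr", "name": "France", "eu": True},
]

# Occasionally we pull out a slice for a specific need.
eu_codes = [c["code"] for c in countries if c["eu"]]
# eu_codes == ["de", "fr"]
```

No round trip to the database for data you already hold — the objection upthread is about doing this to large tables you *don't* otherwise need in full.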





