
I feel like Bill Murray in Punxsutawney. Didn't we just see this article yesterday?

https://news.ycombinator.com/item?id=19492562

Here was my comment from yesterday morning:

>Question 1: What problems am I trying to solve?

I wish I had really thought about this when I first wrote my largest web scraper. At the time, I was still relatively new to database design and programming in general. This web scraper, out of the thousands I've written in the interim, is--of course--the one that is still going strong many years later.

I eschewed MongoDB for all the reasons given to me on the internet and, because I was slowly gaining competence with SQL, ended up building a large and complex pipeline to send the data right into Postgres. In retrospect, this was a serious design mistake, and the one I regret the most.

Although I still contend that the data did eventually need to be normalized, I now believe that I was doing it far too early. By ingesting the JSON stream into a parser, splitting it up, generating foreign keys, and then forcing the whole works into a single Postgres database, I severely limited the capacity of my web scraper (and also guaranteed the need for a very powerful server to run it).

Had I initially dumped all results into MongoDB (or some other efficient document store) and then, separately, parsed the output into normalized SQL, I would have dramatically simplified the operation, maintenance, and debugging of my web scraper. Plus it would have been much simpler to spawn work jobs onto different machines instead of trying to break up huge monolithic processes with poorly defined endpoints. There have been many lessons learned.
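To make the split concrete, here is a rough sketch of the two-stage layout, not the actual scraper. Everything in it is an assumption made for illustration: an in-memory SQLite database stands in for both the document store and Postgres, and the table names (raw_pages, authors, posts), the "author" and "title" fields, and the helper functions are all hypothetical.

```python
import json
import sqlite3


def make_db():
    # Assumption: one in-memory SQLite database stands in for both the
    # raw document store and the normalized SQL database.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE raw_pages (
            id      INTEGER PRIMARY KEY,
            payload TEXT NOT NULL,               -- JSON stored as-is
            parsed  INTEGER NOT NULL DEFAULT 0   -- 0 = not yet normalized
        );
        CREATE TABLE authors (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE posts (
            id        INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title     TEXT NOT NULL
        );
    """)
    return db


def ingest(db, payload):
    """Stage 1: the scraper stores each result untouched, with no parsing."""
    db.execute("INSERT INTO raw_pages (payload) VALUES (?)",
               (json.dumps(payload),))
    db.commit()


def normalize(db):
    """Stage 2: a separate job turns unparsed raw rows into normalized rows.

    Returns the number of raw rows processed in this run.
    """
    rows = db.execute(
        "SELECT id, payload FROM raw_pages WHERE parsed = 0").fetchall()
    for raw_id, text in rows:
        doc = json.loads(text)
        # Create the author only if it does not exist yet, then look up its key.
        db.execute("INSERT OR IGNORE INTO authors (name) VALUES (?)",
                   (doc["author"],))
        (author_id,) = db.execute(
            "SELECT id FROM authors WHERE name = ?",
            (doc["author"],)).fetchone()
        db.execute("INSERT INTO posts (author_id, title) VALUES (?, ?)",
                   (author_id, doc["title"]))
        # Mark the raw row so a later run skips it.
        db.execute("UPDATE raw_pages SET parsed = 1 WHERE id = ?", (raw_id,))
    db.commit()
    return len(rows)
```

The point of the sketch is that ingest() never touches the schema, so the scraper can keep writing while normalize() runs later, on another machine if needed, and can be re-run safely because already-parsed rows are skipped.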

In short, Mongo likely serves a very good purpose for high-speed data storage and manipulation (although it's hardly alone in this space). However, it's still likely not a great all-around solution and works best when supported by an ACID-based normalized RDBMS. Unless, of course, things have dramatically changed in recent updates.



