
The live site trendingtopics.org is using MySQL for all 3 million articles, and it handles the load pretty well with the right indexing, bulk loads, and memcached. I built the initial demo in 10 days, so I chose Rails w/ MySQL mostly for simplicity and with the intention of adding Solr or Sphinx search. The way the data is stored (key-value style w/ JSON timelines) was actually intended to lend itself to replacing MySQL with another fast Bigtable-like datastore.
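
For readers wondering what "key-value style w/ JSON timelines" might look like in practice, here is a minimal sketch. It is not the site's actual schema: the table name `articles`, the columns `title` and `timeline`, and the shape of the timeline dict are all assumptions made for illustration, and SQLite stands in for MySQL so the example runs without a server.

```python
import json
import sqlite3


def make_store():
    # In-memory SQLite stands in for MySQL here (assumption for illustration).
    conn = sqlite3.connect(":memory:")
    # One row per article: the key is the title, the value is a JSON blob.
    conn.execute(
        "CREATE TABLE articles (title TEXT PRIMARY KEY, timeline TEXT NOT NULL)"
    )
    return conn


def put_timelines(conn, rows):
    # Bulk-load (title, timeline) pairs in a single executemany call.
    conn.executemany(
        "INSERT OR REPLACE INTO articles (title, timeline) VALUES (?, ?)",
        [(title, json.dumps(timeline)) for title, timeline in rows],
    )
    conn.commit()


def get_timeline(conn, title):
    # Primary-key lookup; returns the decoded timeline, or None if absent.
    row = conn.execute(
        "SELECT timeline FROM articles WHERE title = ?", (title,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because every read is a single lookup by key and the value is opaque JSON, the same `put`/`get` interface could in principle sit on top of a different key-value store, which is the kind of swap the comment above describes.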


Thanks for the quick reply. How many machines are running MySQL for you?

I was reading this website - http://www.metabrew.com/article/anti-rdbms-a-list-of-distrib...

I have not tried HBase or Hypertable myself yet, but the blog post says that they still have latency issues. What are your views?


We're just using a single c1.medium instance for the database right now. Trendingtopics.org is a relatively low-traffic, read-only site, and most of the reads are for a handful of URLs on the front page, which can be cached.
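
The caching idea here (a few hot front-page URLs served from cache instead of the database) can be sketched as a read-through cache. This is only an illustration: a plain dict stands in for memcached, and the class name `ReadThroughCache`, the `loader` callback, and the 60-second TTL are all hypothetical, not taken from the site.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to a loader on a miss."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # hypothetical callback that hits the database
        self._ttl = ttl_seconds    # how long a cached value stays fresh
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expires_at, value)
        self.misses = 0            # number of times the loader was called

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh hit: no database work
        # Miss or expired entry: load from the backing store and cache it.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

On a read-only site where most requests are for the same handful of pages, nearly every `get` after the first would be a hit, so the database only sees one query per key per TTL window.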

Also, after processing the raw log data with Hadoop, we only need to store/lookup 3M records in the MySQL presentation layer, which is well within the capabilities of a tuned RDBMS. Many Rails sites are backed by MySQL, so I thought linking Hadoop/Hive to a common data workflow would make for a good example.

I've been hearing that recent improvements in HBase 0.20 could make it a contender: http://stackoverflow.com/questions/1022150/is-hbase-stable-a... and some high-volume sites like Mahalo are already using it. That said, there are other alternative data stores (Cassandra, Voldemort, Tokyo Tyrant) that might be worth exploring if a relational database isn't cutting it for you.



