Scaling Big Time with Hadoop

Notice: This is a discussion thread for comments about the SitePoint article, Scaling Big Time with Hadoop.


We started a small-scale R&D project a few months back using Java, PHP and MySQL. This week we learned that our first complete dataset will have around 30 billion records… and we need to scale to ingest and return up to 16 datasets.

So scale is a huge deal at this point. We have been considering Cassandra, HBase and CouchDB, but this article makes me want to quickly test out Hadoop, especially since Hive would let us reuse the SQL code we already have.

Great article. Keep up the great work.

I’m interested in knowing how many rows the author was dealing with in the query that took 29.449 seconds to return the data set.