If want to process large amount of data in real-time ( twitter feed, click stream from website) etc using cluster of machines then check out "storm" which was opensource d from twitter recently
Standard Apache Hadoop is good for processing in batch with petabytes of data where latency is not a problem.
Brisk from DataStax as mentioned above is quite unique in that you can use MapReduce Parallel processing on live data.
There are other efforts like Hadoop Online which allows to process using pipeline.
Google BigQuery obviously another option where you have csv (delimited records) and you can slice and dice without any setting up. It s extremely simple to use ,but is a premium service where you have to pay by no. of bytes processed ( first 100GB / month is free though).