Hadoop or Hadoop Streaming for MapReduce on AWS

I'm about to start a MapReduce project that will run on AWS, and I'm faced with a choice: use either Java or C++.

I understand that writing the project in Java would make more functionality available to me; however, C++ could pull it off too, through Hadoop Streaming.
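For reference, a Streaming job in C++ is just an executable that reads input records on stdin and writes key/value pairs to stdout, one pair per line, with a tab between key and value. The sketch below is a hypothetical word-count mapper written against that convention. The function name map_words and the word-count task are assumptions for illustration, not part of the project described here, and it assumes plain text input where each stdin line is one record.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical word-count mapper for Hadoop Streaming (illustrative sketch).
// Assumes plain text input: Streaming delivers one record per stdin line and
// reads "key<TAB>value" lines back from stdout.
void map_words(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream tokens(line);
        std::string word;
        // Emit one "word<TAB>1" pair per whitespace-separated token.
        while (tokens >> word) {
            out << word << '\t' << 1 << '\n';
        }
    }
}
```

Wrapping it as int main() { map_words(std::cin, std::cout); } would give a binary that can be passed to Streaming's -mapper option. Taking the streams as parameters keeps the logic testable on a workstation without a cluster.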

Mind you, I have little background in either language. A similar project has been done in C++ and the code is available to me.

So my question is: is this extra functionality available through AWS, or is it only relevant if you have more control over the cloud? Is there anything else I should bear in mind when making the decision, such as the availability of Hadoop plugins that work better with one language than the other?

Thanks in advance

Best answer

You have a few options for running Hadoop on AWS. The simplest is to run your MapReduce jobs via their Elastic MapReduce service: http://aws.amazon.com/elasticmapreduce. You could also run a Hadoop cluster on EC2, as described at http://archive.cloudera.com/docs/ec2.html.

If you suspect you'll need to write your own input/output formats, partitioners, and combiners, I'd recommend using Java with the latter setup (your own Hadoop cluster on EC2). If your job is relatively simple and you don't plan to use your Hadoop cluster for any other purpose, I'd recommend choosing the language you are most comfortable with and using EMR.
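To make the partitioner point concrete: a partitioner decides which reducer receives each key. Hadoop's default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, and in the Java API a custom partitioner is a subclass of Partitioner that overrides getPartition(key, value, numPartitions). The C++ sketch below only illustrates that contract. Both function names are hypothetical, it uses std::hash so the reducer it picks will not match the one Hadoop would pick, and the prefix rule is an invented example of the kind of custom logic that is easier to plug in from Java.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical hash partitioner (illustration only, not Hadoop's code).
// Contract shared with Hadoop's HashPartitioner: equal keys always map to
// the same reducer index in [0, num_reducers). The actual index differs
// from Hadoop's because std::hash is used instead of Java's hashCode().
std::size_t partition_for(const std::string& key, std::size_t num_reducers) {
    assert(num_reducers > 0);
    return std::hash<std::string>{}(key) % num_reducers;
}

// Hypothetical custom rule: for composite keys such as "user42:2011-03",
// partition only on the part before ':' so that every record for one user
// reaches the same reducer. Keys without ':' are hashed whole.
std::size_t partition_by_prefix(const std::string& key, std::size_t num_reducers) {
    const std::string prefix = key.substr(0, key.find(':'));
    return partition_for(prefix, num_reducers);
}
```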

Either way, good luck!

Disclosure: I am a founder of Cloudera.

Regards, Jeff

Answers

I decided that the flexibility of Java was more important than the possible drawbacks of porting my existing C++ code to Java.

Thanks for all your answers.

It depends on your needs. What are your input and output? Are they simple text files? Records with newline delimiters? Do you need a special combiner? A partitioner?
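On the record question: by default Streaming works with newline-delimited text, and the Hadoop Streaming documentation describes the key as the prefix of a line up to the first tab, with the rest of the line as the value; a line with no tab is treated as all key. A minimal sketch of that default split, assuming the default tab separator (the separator is configurable) and using a hypothetical function name:

```cpp
#include <string>
#include <utility>

// Hypothetical helper mirroring Streaming's default record convention:
// key = text before the first tab, value = everything after it.
// A line without a tab becomes the key, with an empty value.
std::pair<std::string, std::string> split_record(const std::string& line) {
    const std::string::size_type tab = line.find('\t');
    if (tab == std::string::npos) {
        return {line, ""};
    }
    return {line.substr(0, tab), line.substr(tab + 1)};
}
```

Only the first tab is significant, so a value may itself contain further tabs. Anything beyond this line-oriented layout (binary records, multi-line records) is where a custom Java input format starts to pay off.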

What I mean is that if you need only the Hadoop basics, then Streaming will be fine. But if you need a little more complexity (from the Hadoop framework, not from your own business logic), a Java job launched with hadoop jar will be more flexible.
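As an example of the basics you handle yourself with Streaming: a Streaming reducer does not receive a key together with an iterable of its values, as a Java Reducer does. It receives key/value lines sorted by key and must notice where one key ends and the next begins. A hypothetical C++ word-count reducer, assuming integer counts and tab-separated lines, might look like this:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical word-count reducer for Hadoop Streaming (illustrative sketch).
// Input: "key<TAB>count" lines already sorted by key, so equal keys are adjacent.
// Output: one "key<TAB>total" line per distinct key.
void reduce_counts(std::istream& in, std::ostream& out) {
    std::string line;
    std::string current_key;
    long long total = 0;
    bool have_key = false;

    while (std::getline(in, line)) {
        const std::string::size_type tab = line.find('\t');
        const std::string key = line.substr(0, tab);  // whole line if no tab

        long long value = 0;  // a missing or unparsable count is treated as 0
        if (tab != std::string::npos) {
            std::istringstream value_stream(line.substr(tab + 1));
            value_stream >> value;
        }

        // A different key means the previous run is complete: flush its total.
        if (have_key && key != current_key) {
            out << current_key << '\t' << total << '\n';
            total = 0;
        }
        current_key = key;
        have_key = true;
        total += value;
    }

    // Flush the final key; nothing is printed for empty input.
    if (have_key) {
        out << current_key << '\t' << total << '\n';
    }
}
```

Because all pairs for a key arrive consecutively, only one running total has to be kept in memory at a time.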

Sagie
