I have set up a Hadoop cluster of 5 nodes on Amazon EC2. When I log in to the master node and submit the following command:
bin/hadoop jar <program>.jar <arg1> <arg2> <path/to/input/file/on/S3>
it throws one of the following errors (not both at the same time). The first is thrown when I don't replace the slashes with %2F, and the second when I do replace them:
1) java.lang.IllegalArgumentException: Invalid hostname in URI s3://<ID>:<SECRETKEY>@<BUCKET>/<path-to-inputfile>
2) org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 PUT failed for / XML Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method.
Notes:
1) When I ran jps to see which processes were running on the master, it showed only
1116 NameNode
1699 Jps
1180 JobTracker
with no DataNode or TaskTracker listed.
2) My secret key contains two forward slashes (/), and I replace them with %2F in the S3 URI.
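To illustrate the difference between the two URI forms, here is a minimal sketch using only the standard java.net.URI class. It assumes that the "Invalid hostname" error is raised when the parsed URI has no host; that is a guess about Hadoop's internals, not something confirmed here. The class name S3UriCheck, the helper methods, and the values MYID, ab/cd/ef, mybucket and input/file.txt are all hypothetical placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class S3UriCheck {
    // Percent-encode the forward slashes in a secret key (the manual step from note 2).
    static String encodeSlashes(String secretKey) {
        return secretKey.replace("/", "%2F");
    }

    // Build an s3:// URI with the credentials embedded in the user-info part.
    static URI buildUri(String id, String secretKey, String bucket, String path)
            throws URISyntaxException {
        return new URI("s3://" + id + ":" + secretKey + "@" + bucket + "/" + path);
    }

    public static void main(String[] args) throws URISyntaxException {
        String id = "MYID";          // placeholder access key ID
        String secret = "ab/cd/ef";  // placeholder secret key with two slashes

        // Unencoded: the first "/" in the key ends the authority part early,
        // so the parser never sees "@mybucket" as the host.
        URI raw = buildUri(id, secret, "mybucket", "input/file.txt");
        System.out.println("raw host:     " + raw.getHost());      // null

        // Encoded: the whole "id:key" stays in the user-info part.
        URI encoded = buildUri(id, encodeSlashes(secret), "mybucket", "input/file.txt");
        System.out.println("encoded host: " + encoded.getHost());  // mybucket
    }
}
```

This only shows how the URI is parsed. It does not explain the signature mismatch in the second error, which occurs after the URI has already been accepted.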
PS: The program runs fine on EC2 when run on a single node. It's only when I launch a cluster that I run into issues with copying data between S3 and HDFS. Also, what does distcp do? Do I need to distribute the data myself even after I copy it from S3 to HDFS? (I thought HDFS took care of that internally.)
If you could direct me to a link that explains running MapReduce programs on a Hadoop cluster using Amazon EC2/S3, that would be great.
Regards,
Deepak.