English 中文(简体)
How can I use Hive on top of Amazon Elastic Mapreduce to process data in Amazon Simple DB?
原标题:

I have a lot of data in an Amazon Simple DB Domain. I want to start Hive on Elastic Map Reduce (on top of hadoop) and somehow, either import data from simpledb or, connect to simpledb and run hiveql queries on it. I have having issues importing the data. Any pointers?

问题回答

As input to a streaming hadoop job you could have a sequence of select statements for simpleDB.

for example, your input could contain (in a less verbose form):

collectionA between dates 123 and 234
collectionA between dates 235 and 559
collectionA between dates 560 and 3000
...

Then you would implement a mapper script that performed the following transformation: input_select_statement => execute_select_statement => output_results

This would be super easy using streaming because you could use any library for any language you like and not have to worry about implementing any of the complicated Hadoop java stuff.

Hope this helps.

(the hacky way to do it would be to have a single script that you run locally that does the same as above, but loads the results into s3. I run a script like that nightly for a lot of our database data)





相关问题
Mount windows shared drive to MWAA in bootscript

In MWAA startup script sudo yum install samba-client cifs-utils -y sudo mount.cifs //dev/test/drop /mnt/dev/test-o username=testuser,password= pwd ,domain=XX Executing above commonds giving error - ...

How to get Amazon Seller Central orders programmatically?

We have been manually been keying Amazon orders into our system and would like to automate it. However, I can t seem to figure out how to go about it. Their documentation is barely there. There is: ...

Using a CDN like Amazon S3 to control access to media

I want to use Amazon S3/CloudFront to store flash files. These files must be private as they will be accessed by members. This will be done by storing each file with a link to Amazon using a mysql ...

unable to connect to database on AWS

actually I have my website build with Joomla hosted on hostmonster but all Joomla website need a database support to run this database is on AWS configuration files need to be updated for that I ...

Using EC2 Load Balancing with Existing Wordpress Blog

I currently have a virtual dedicated server through Media Temple that I use to run several high traffic Wordpress blogs. Both tend to receive sudden StumbleUpon traffic surges that (I m assuming) ...

SSL slowness in EC2

We ve deployed our rails app to EC2. In our setup, we have two proxies on small instances behind round-robin DNS. These run nginx load balancers for a dynamically growing and shrinking farm of web ...

热门标签