What default reducers are available in Elastic MapReduce?

I hope I'm asking this in the right way. I'm learning my way around Elastic MapReduce and I've seen numerous references to the "Aggregate" reducer that can be used with "Streaming" job flows.

In Amazon's "Introduction to Amazon Elastic MapReduce" PDF it states, "Amazon Elastic MapReduce has a default reducer called aggregate."

What I would like to know is: are there other default reducers available?

I understand that I can write my own reducer, but I don't want to reinvent the wheel by writing something that already exists, because I'm sure my wheel won't be as good as the original.

Best answer

I'm in a similar situation. I infer from Google results etc. that the answer right now is "No, there are no other default reducers in Hadoop", which kind of sucks, because it would obviously be useful to have default reducers like, say, "average" or "median" so you don't have to write your own.
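
For what it's worth, writing your own is not much code. Below is a minimal sketch of a per-key "average" streaming reducer in Python; the script name is made up, and it assumes the tab-separated, key-sorted input that Hadoop Streaming delivers to reducers on stdin:

#!/usr/bin/env python3
# average_reducer.py -- hypothetical example; "average" is NOT a built-in reducer.
# Hadoop Streaming feeds the reducer "key<TAB>value" lines on stdin,
# already sorted (and therefore grouped) by key.
import sys

def emit(key, total, count):
    if key is not None and count:
        print(f"{key}\t{total / count}")

current_key, total, count = None, 0.0, 0

for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    if key != current_key:
        emit(current_key, total, count)  # flush the previous key's average
        current_key, total, count = key, 0.0, 0
    total += float(value)
    count += 1

emit(current_key, total, count)  # flush the final key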

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/aggregate/package-summary.html shows a number of useful aggregators, but I cannot find documentation for how to access functionality other than the very basic key/value sum described in the documentation and in Erik Forsberg's answer. Perhaps this functionality is only exposed in the Java API, which I don't want to use.

Incidentally, I'm afraid Erik Forsberg's answer is not a good answer to this particular question. One could construct another question for which it would be a useful answer, but it is not what the OP is asking.

Other answers

The reducer they refer to is documented here:

http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/lib/aggregate/package-summary.html

That's a reducer built into the streaming utility. It provides a simple way of doing common calculations by writing a mapper that outputs keys formatted in a special way.

For example, if your mapper outputs:

LongValueSum:id1	12
LongValueSum:id1	13
LongValueSum:id2	1
UniqValueCount:id3	val1
UniqValueCount:id3	val2

The reducer will calculate the sum for each LongValueSum key and count the distinct values for each UniqValueCount key. The reducer output will therefore be:

id1	25
id2	1
id3	2
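
For illustration, a mapper producing keys in that format might look like the following Python sketch; the input record layout ("id<TAB>amount<TAB>label") is a made-up assumption, but the LongValueSum and UniqValueCount prefixes are the ones used in the example above:

#!/usr/bin/env python3
# aggregate_mapper.py -- sketch of a streaming mapper for the aggregate reducer.
# Assumes input lines of the form "id<TAB>amount<TAB>label" (hypothetical format).
# The prefix before the first ":" tells the aggregate reducer which
# aggregation to apply to that key.
import sys

for line in sys.stdin:
    record_id, amount, label = line.rstrip("\n").split("\t")
    print(f"LongValueSum:{record_id}\t{amount}")   # sum the amounts per id
    print(f"UniqValueCount:{record_id}\t{label}")  # count distinct labels per id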

The reducers and combiners in this package run as Java code inside the framework rather than as external streaming processes, so they are very fast compared to combiners and reducers written as streaming scripts; using the aggregate package is therefore both convenient and fast.
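
You enable it by passing the literal string aggregate as the reducer. For a plain Hadoop Streaming run that looks roughly like the command below (the streaming jar path varies by installation, and the input/output paths are placeholders); on Elastic MapReduce, the same value, aggregate, goes in the reducer field of a streaming step:

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -input myInputDir \
    -output myOutputDir \
    -mapper aggregate_mapper.py \
    -reducer aggregate \
    -file aggregate_mapper.py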




