I'm looking for research/implementation projects based on Hadoop, and I came across the list published on the wiki page: . But that page was last updated in September 2009, so I'm not sure whether some of these ideas have already been implemented. I'm particularly interested in the "Sort and Shuffle optimization in the MR framework" suggestion, which talks about "combining the results of several maps before the shuffle. This can reduce seek work and intermediate storage".
Has anyone attempted this? Is it implemented in the current version of Hadoop?
The project description is aimed at "optimization". This feature is already present in the current Hadoop-MapReduce, and it could probably run in a lot less time. Sounds like a valuable enhancement to me.
The combine functionality (described at http://wiki.apache.org/hadoop/HadoopMapReduce) is the closest existing thing. However, I believe the combiner only merges the key-value pairs of a single map task, rather than all the pairs for a given node or rack.
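To make the combiner's role concrete, here is a minimal sketch in plain Python (not the actual Hadoop API) of the idea: a combiner pre-aggregates the output of a single map task before it is spilled to disk and shuffled, using word count as the example. The function names are hypothetical and exist only for illustration.

```python
from collections import Counter

def map_task(lines):
    # Map phase: emit a (word, 1) pair for every word in this task's input split.
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    # Combiner: locally aggregate the pairs emitted by a SINGLE map task,
    # so fewer records are written to intermediate storage and shuffled.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())

split = ["apple banana apple", "banana apple"]
raw = map_task(split)           # 5 intermediate pairs before combining
combined = combine(raw)         # 2 pairs after local aggregation
print(len(raw), len(combined))  # prints: 5 2
```

Note that this combining happens per map task; the question above is asking about going further and merging the outputs of *several* map tasks (per node or rack) before the shuffle.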
I think it is a very challenging task. In my understanding, the idea is to build a computation tree instead of a "flat" map-reduce. A good example of this is Google's Dremel engine (now called BigQuery). I would suggest reading this paper: http://sergey.melnix.com/pub/melnik_VLDB10.pdf
If you are interested in this kind of architecture, you can also take a look at an open-source clone of this technology, Open Dremel: http://code.google.com/p/dremel/
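To illustrate the "computation tree" idea mentioned above, here is a hypothetical Python sketch (not the Dremel implementation) of multi-level aggregation: instead of one reducer consuming all map outputs flat, partial results are merged level by level up a tree, which is how a Dremel-style serving tree cuts down the work any single node must do.

```python
def tree_aggregate(values, fan_in=2):
    # Merge partial results level by level, fan_in inputs per parent node,
    # until a single root result remains (here the aggregation is a sum).
    level = values
    while len(level) > 1:
        level = [sum(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]

# Eight leaf (map-side) partial sums aggregated through a binary tree:
print(tree_aggregate([1, 2, 3, 4, 5, 6, 7, 8]))  # prints: 36
```

With a fan-in of 2 and 8 leaves, each node merges only 2 inputs across 3 levels, rather than one node merging all 8 at once.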