Variable/looping sequence of jobs

  • Date: 2010-09-02 20:08:32
  • Tags:
  • hadoop

I'm considering using Hadoop/MapReduce for a project, and I'm trying to figure out how to set up a job flow consisting of a variable number of jobs that must be processed in sequence.

For example:

Job 1: Map source data into X levels.
Job 2: MapReduce Level1 -> appends to Level2
Job 3: MapReduce Level2 -> appends to Level3
Job N: MapReduce LevelN -> appends to LevelN+1

And so on until the final level is reached. The key point is that each level must include its own specific source data as well as the results of the previous level.
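For illustration, here is a minimal sketch of one way such a chain could be driven from a plain Java driver: one MapReduce job per level, run in a loop, with each job fed both its level-specific source directory and the previous level's output. The class name and paths are hypothetical, and no mapper/reducer is set, so Hadoop's identity classes stand in for the real per-level logic.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LevelChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // the number of levels is only known at run time, e.g. passed as an argument
        int levels = Integer.parseInt(args[0]);

        for (int level = 1; level <= levels; level++) {
            Job job = new Job(conf, "level-" + level);
            job.setJarByClass(LevelChainDriver.class);
            // no mapper/reducer set here, so the identity classes run;
            // the real per-level map/reduce logic would be plugged in instead
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);

            // each level reads its own source data...
            FileInputFormat.addInputPath(job, new Path("/data/source/level" + level));
            // ...plus the results produced by the previous level's job
            if (level > 1) {
                FileInputFormat.addInputPath(job, new Path("/data/levels/level" + (level - 1)));
            }
            FileOutputFormat.setOutputPath(job, new Path("/data/levels/level" + level));

            // run each job to completion before starting the next level
            if (!job.waitForCompletion(true)) {
                System.exit(1); // stop the chain if any level fails
            }
        }
    }
}

Driving the sequence from ordinary Java like this keeps the chaining logic in one place, and the number of levels can simply be a command-line argument rather than something baked into a workflow file.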

I've looked at Pig, Hive, Hamake, and Cascading, but haven't seen obvious support for this.

Does anyone know an efficient way of accomplishing this? Right now I'm leaning toward writing a wrapper for Hamake that generates the hamake file based on parameters (the number of levels is known at run time, but it can change with each run).

Thanks!

Answers



