Difference and relationship between slots, map tasks, data splits, and Mapper

I have gone through a few Hadoop books and papers.

A slot is a unit of map/reduce computation capacity on a node; it may be a map slot or a reduce slot. As far as I know, a split is a group of blocks of files in HDFS, with a length and the locations of the nodes where the blocks are stored. Mapper is a class, but when the code is instantiated it is called a map task. Am I right? I am not clear about the difference and relationship between map tasks, data splits, and Mapper.

Regarding scheduling, I understand that when a map slot on a node becomes free, a map task is chosen from the map tasks that are not yet running, and it is launched if the data it will process is on that node. Can anyone explain this clearly in terms of the above concepts: slots, Mapper, map task, etc.?

Thanks, Arun

Answers

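"As far as I know, a split is a group of blocks of files in HDFS, with a length and the locations of the nodes where they are stored."

An InputSplit is a unit of data that a particular mapper will process. It need not be just a group of HDFS blocks. It can be a single line, 100 rows, a 50 MB file, and so on.

"I am not clear about the difference and relationship between map tasks, data splits, and Mapper."

An InputSplit is processed by a map task, and each map task runs an instance of the Mapper class.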
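To make the relationship concrete, here is a minimal sketch in plain Python. It is a simulation, not the Hadoop API: the names `InputSplit`, `WordCountMapper`, `MapTask`, and `make_tasks` are hypothetical and chosen only for illustration. It shows one map task being created per split, with each task holding its own instance of the mapper class.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class InputSplit:
    """Hypothetical split: a reference to a slice of the input, not the data."""
    start: int   # index of the first record in the slice
    length: int  # number of records in the slice


class WordCountMapper:
    """The mapper is only a class that holds the user's map logic."""

    def map(self, key: int, value: str) -> Iterator[Tuple[str, int]]:
        for word in value.split():
            yield (word, 1)


class MapTask:
    """A map task pairs one split with one freshly created mapper instance."""

    def __init__(self, task_id: int, split: InputSplit, mapper_cls=WordCountMapper):
        self.task_id = task_id
        self.split = split
        self.mapper = mapper_cls()  # every task gets its own instance

    def run(self, records: Sequence[str]) -> List[Tuple[str, int]]:
        out: List[Tuple[str, int]] = []
        end = self.split.start + self.split.length
        for index in range(self.split.start, end):
            out.extend(self.mapper.map(index, records[index]))
        return out


def make_tasks(splits: Sequence[InputSplit]) -> List[MapTask]:
    """One map task per split."""
    return [MapTask(i, s) for i, s in enumerate(splits)]


# Toy input: three records covered by two splits (assumed data).
records = ["a b", "a", "c"]
splits = [InputSplit(0, 2), InputSplit(2, 1)]
tasks = make_tasks(splits)
results = [task.run(records) for task in tasks]
```

In this sketch the number of map tasks equals the number of splits, and the mapper class is written once but instantiated once per task.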

As I understand it:
First, the data is split and stored across the DataNodes in HDFS.
Then, when a new job arrives, the JobTracker divides it into map and reduce tasks and assigns each map task to a node that already holds the split of data for that task. The data is then local to the node, there is no cost for moving it, and the execution time is as short as possible.
Sometimes, however, a task has to be assigned to a node that does not hold the data, so the node has to fetch the data over the network and then process it.
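The locality preference described above can be sketched roughly as follows. This is a toy model in plain Python, not the JobTracker's actual scheduler: `pick_task`, the task names, and the node names are all hypothetical, and a real scheduler weighs more factors (rack locality, for example).

```python
from typing import Dict, List, Optional, Set


def pick_task(
    free_node: str,
    pending: List[str],
    split_hosts: Dict[str, Set[str]],
) -> Optional[str]:
    """Choose a pending map task for a node whose map slot just became free.

    Prefer a task whose split is stored on that node (data-local).
    If there is none, fall back to the oldest pending task, which then
    has to read its split over the network.
    """
    if not pending:
        return None
    for task in pending:
        if free_node in split_hosts[task]:
            pending.remove(task)  # safe: we return right after mutating
            return task
    return pending.pop(0)


# Hypothetical cluster state: which nodes hold a replica of each task's split.
split_hosts = {
    "m0": {"node1", "node2"},
    "m1": {"node2", "node3"},
    "m2": {"node1", "node3"},
}
pending = ["m0", "m1", "m2"]

first = pick_task("node3", pending, split_hosts)   # "m1": data-local on node3
second = pick_task("node4", pending, split_hosts)  # "m0": nothing local, remote read
third = pick_task("node1", pending, split_hosts)   # "m2": data-local on node1
fourth = pick_task("node1", pending, split_hosts)  # None: no tasks left
```

The slot only determines where and when a task can start; the split's host list is what lets the scheduler prefer a node that already has the data.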

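An input split is not the data itself; it is a reference to the specific data that the MapReduce job processes. It is usually the same size as a block, because if the two sizes differ, part of a split's data may reside on a different node and would then need to be transferred.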
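A small sketch of why split size is usually matched to block size. It is plain Python with hypothetical names (`SplitRef`, `compute_splits`, `blocks_touched`) and sizes given in MB as plain integers; it simplifies how a real input format computes splits.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SplitRef:
    """Hypothetical split descriptor: where the data is, not the data itself."""
    path: str
    offset: int  # start position within the file
    length: int  # number of units covered by this split


def compute_splits(path: str, file_size: int, split_size: int) -> List[SplitRef]:
    """Cut a file of file_size units into consecutive split descriptors."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits: List[SplitRef] = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        splits.append(SplitRef(path, offset, length))
        offset += length
    return splits


def blocks_touched(split: SplitRef, block_size: int) -> List[int]:
    """Indexes of the fixed-size blocks that a split overlaps."""
    first = split.offset // block_size
    last = (split.offset + split.length - 1) // block_size
    return list(range(first, last + 1))


# Assumed example: a 300 MB file stored in 128 MB blocks.
BLOCK = 128
aligned = compute_splits("/data/input.txt", 300, 128)
unaligned = compute_splits("/data/input.txt", 300, 100)

aligned_blocks = [blocks_touched(s, BLOCK) for s in aligned]      # one block each
unaligned_blocks = [blocks_touched(s, BLOCK) for s in unaligned]  # some span two
```

When the split size equals the block size, every split maps onto exactly one block, so a task placed on a node holding that block reads everything locally. With a mismatched size, some splits straddle two blocks, and the second block may live on another node.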

MAPPER: the Mapper is a class.
MAPPER PHASE: the phase in which the mapper code takes the input and converts it into key-value pairs (key, value).
MAPPER SLOT: the unit on a node that is used to execute the mapper code (reducer code runs in reduce slots in the same way).




