Question

I have gone through a few Hadoop books and papers.

A slot is a map/reduce computation unit on a node; it may be a map slot or a reduce slot. As far as I know, a split is a group of blocks of a file in HDFS, with a length and the locations of the nodes where the blocks are stored. Mapper is a class, but when the code is instantiated it is called a map task. Am I right? I am not clear on the difference and relationship between map tasks, data splits and Mapper.

Regarding scheduling, I understand that when a map slot on a node is free, a map task is chosen from the map tasks that are not yet running and is launched if the data to be processed by that map task is on the node. Can anyone explain this clearly in terms of the above concepts: slots, Mapper, map task, etc.?

Thanks, Arun

Answer 1

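"As far as I know, a split is a group of blocks of a file in HDFS, with a length and the locations of the nodes where they are stored."

An InputSplit is a unit of data that a particular mapper will process. It need not be just a group of HDFS blocks. It can be a single line, 100 rows, a 50 MB file, etc.

"I am not clear on the difference and relationship between map tasks, data splits and Mapper."

An InputSplit is processed by a map task, and an instance of the Mapper is a map task.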
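To make the relationship concrete, here is a toy sketch in plain Java. It does not use the Hadoop API; `Split`, `LineLengthMapper`, `MapTask` and `createTasks` are hypothetical names chosen only to mirror the concepts. The point it illustrates is that the framework creates one map task per split, and each map task runs its own instance of the Mapper class.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these are NOT the Hadoop classes, just names that mirror the concepts.
final class SplitTaskSketch {

    // A split describes which part of the input one map task should read.
    static final class Split {
        final String file;
        final long start;
        final long length;

        Split(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }
    }

    // The "Mapper" is user code: a class with a map() method.
    static final class LineLengthMapper {
        int map(String record) {
            return record.length();
        }
    }

    // A map task pairs one split with its own Mapper instance.
    static final class MapTask {
        final Split split;
        final LineLengthMapper mapper = new LineLengthMapper();

        MapTask(Split split) {
            this.split = split;
        }
    }

    // One map task is created per split.
    static List<MapTask> createTasks(List<Split> splits) {
        List<MapTask> tasks = new ArrayList<>();
        for (Split s : splits) {
            tasks.add(new MapTask(s));
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
                new Split("input.txt", 0, 64),
                new Split("input.txt", 64, 64),
                new Split("input.txt", 128, 10));
        List<MapTask> tasks = createTasks(splits);
        System.out.println(splits.size() + " splits -> " + tasks.size() + " map tasks");
    }
}
```

So the Mapper is the code you write, the split is the description of the input for one task, and the map task is the running unit that applies the former to the latter.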

Answer 2

As I understand it:
First, the data is split across the DataNodes in HDFS.
Then, when a new job arrives, the JobTracker divides the job into map and reduce tasks and assigns each map task to a node that already holds the split of data for that task. The data is then local to the node, so there is no cost for moving it and the execution time is as short as possible.
Sometimes, however, a task has to be assigned to a node that does not hold the data, so the node has to fetch the data over the network and then process it.
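The assignment rule described above can be sketched roughly as follows. This is a simplified model in plain Java, not the JobTracker's actual code: `PendingTask` and `pickTask` are hypothetical names, and the sketch ignores rack locality and everything else a real scheduler considers. When a node reports a free map slot, it prefers a pending task whose split is stored on that node and otherwise falls back to any pending task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified model of data-local task assignment; not Hadoop's real scheduler.
final class LocalitySchedulerSketch {

    static final class PendingTask {
        final String id;
        final Set<String> splitHosts; // nodes holding a replica of this task's split

        PendingTask(String id, Set<String> splitHosts) {
            this.id = id;
            this.splitHosts = splitHosts;
        }
    }

    // Called when `node` has a free map slot. Removes and returns the chosen task.
    static Optional<PendingTask> pickTask(String node, List<PendingTask> pending) {
        // First choice: a task whose data is already on this node (no network transfer).
        for (int i = 0; i < pending.size(); i++) {
            if (pending.get(i).splitHosts.contains(node)) {
                return Optional.of(pending.remove(i));
            }
        }
        // Fallback: any pending task; the node will have to fetch the split remotely.
        if (pending.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(pending.remove(0));
    }

    public static void main(String[] args) {
        List<PendingTask> pending = new ArrayList<>(List.of(
                new PendingTask("m0", Set.of("nodeA", "nodeB")),
                new PendingTask("m1", Set.of("nodeC"))));

        // nodeC holds the split for m1, so m1 is chosen as a data-local task.
        System.out.println("nodeC runs " + pickTask("nodeC", pending).get().id);
        // nodeD holds no split, so it gets the remaining task and reads it over the network.
        System.out.println("nodeD runs " + pickTask("nodeD", pending).get().id);
    }
}
```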

Answer 3

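An InputSplit is not the data itself but a reference to the particular data that the MapReduce job is going to process. It is usually the same size as the block, because if the two sizes differ and some of the data is on a different node, that data has to be transferred.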
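As a rough illustration of "a reference, not the data", the sketch below cuts a file into split descriptors that hold only a path, a start offset and a length. It is plain Java with hypothetical names (`FileSplitRef`, `computeSplits`) and a made-up path; Hadoop's own split calculation has more rules than this, so treat it as a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified split calculation; Hadoop's real input formats apply additional rules.
final class SplitReferenceSketch {

    // A split descriptor: it points at a byte range and carries no file contents.
    static final class FileSplitRef {
        final String path;
        final long start;
        final long length;

        FileSplitRef(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    // Cuts [0, fileLength) into consecutive ranges of at most splitSize bytes.
    static List<FileSplitRef> computeSplits(String path, long fileLength, long splitSize) {
        List<FileSplitRef> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new FileSplitRef(path, start, length));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical 300 MB file with the split size set equal to a 128 MB block.
        List<FileSplitRef> splits = computeSplits("/data/input.log", 300 * mb, 128 * mb);
        for (FileSplitRef s : splits) {
            System.out.println(s.path + " start=" + (s.start / mb) + "MB length=" + (s.length / mb) + "MB");
        }
    }
}
```

With the split size equal to the block size, each descriptor lines up with exactly one block, so a task scheduled on a node holding that block can read its whole split locally.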

Answer 4

MAPPER: Mapper is a class.
MAPPER PHASE: the map phase takes the input and converts it into key-value pairs (key, value).
MAPPER SLOT: a slot is where the mapper or reducer code is executed (a map slot for mappers, a reduce slot for reducers).
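For the map phase, a minimal word-count style sketch in plain Java may help. It is not the Hadoop `Mapper` API; `MapPhaseSketch.map` is a hypothetical stand-in that only shows the shape of the step: one input record goes in as (offset, line), and zero or more (word, 1) pairs come out.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: mimics what a word-count map step emits, without Hadoop.
final class MapPhaseSketch {

    // Input: (byte offset, line of text). Output: one (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [to=1, be=1, or=1, not=1, to=1, be=1]
        System.out.println(map(0L, "to be or not to be"));
    }
}
```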
