Hadoop - Reducer is waiting for Mapper inputs?

As the title says, the following happens when I execute my Hadoop program (and debug it in local mode):

1. All 10 csv lines in my test data are processed correctly by the Mapper, the Partitioner and the RawComparator (OutputKeyComparatorClass), which is called after the map step. However, the OutputValueGroupingComparator class and the reduce function of the reducer are never executed afterwards.

2. My application looks like the following (due to space restrictions I omit the classes used for the configured parameters, until somebody has an idea that involves them):

import java.util.Date;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

public class RetweetApplication {

    public static int DEBUG = 1;
    static String INPUT = "/home/ema/INPUT-H";
    static String OUTPUT = "/home/ema/OUTPUT-H " + (new Date()).toString();

    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(RetweetApplication.class);

        if (DEBUG > 0) {
            // Force standalone/local mode for debugging
            conf.set("mapred.job.tracker", "local");
            conf.set("fs.default.name", "file:///");
            conf.set("dfs.replication", "1");
        }

        FileInputFormat.setInputPaths(conf, new Path(INPUT));
        FileOutputFormat.setOutputPath(conf, new Path(OUTPUT));

        //conf.setOutputKeyClass(Text.class);
        //conf.setOutputValueClass(Text.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(Text.class);

        conf.setMapperClass(RetweetMapper.class);
        conf.setPartitionerClass(TweetPartitioner.class);
        conf.setOutputKeyComparatorClass(TwitterValueGroupingComparator.class);
        conf.setOutputValueGroupingComparator(TwitterKeyGroupingComparator.class);
        conf.setReducerClass(RetweetReducer.class);

        conf.setOutputFormat(TextOutputFormat.class);

        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
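
As it later turns out, the key mechanism here is how Hadoop resolves programmatic `conf.set(...)` calls against the site XML files. As a minimal, Hadoop-free sketch (the class below is a toy stand-in I wrote for illustration, not Hadoop's actual `Configuration`), this mimics the behavior where a key marked final in a loaded resource silently ignores later programmatic overrides:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy stand-in for Hadoop's Configuration: keys loaded from a resource
// with <final>true</final> cannot be overridden by later set() calls.
class ToyConfiguration {
    private final Map<String, String> props = new HashMap<>();
    private final Set<String> finalKeys = new HashSet<>();

    // Simulates loading a *-site.xml entry; 'isFinal' mirrors <final>true</final>.
    void loadResource(String key, String value, boolean isFinal) {
        props.put(key, value);
        if (isFinal) finalKeys.add(key);
    }

    // Programmatic override, as attempted in the driver code above.
    void set(String key, String value) {
        if (finalKeys.contains(key)) {
            System.out.println("WARN: attempt to override final parameter: "
                    + key + ";  Ignoring.");
            return;
        }
        props.put(key, value);
    }

    String get(String key) { return props.get(key); }
}

public class FinalParamDemo {
    public static void main(String[] args) {
        ToyConfiguration conf = new ToyConfiguration();
        // Site XML pins the job tracker and marks it final.
        conf.loadResource("mapred.job.tracker", "master:9001", true);
        conf.loadResource("dfs.replication", "3", false);

        conf.set("mapred.job.tracker", "local"); // silently ignored
        conf.set("dfs.replication", "1");        // applied

        System.out.println(conf.get("mapred.job.tracker")); // master:9001
        System.out.println(conf.get("dfs.replication"));    // 1
    }
}
```

If the final flag is set in the site files, the driver's `conf.set("mapred.job.tracker", "local")` has no effect, no matter what the Java code says.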

3. I get the following console output (sorry for the formatting, but somehow this log did not come out formatted correctly):

12/05/22 03:51:05 INFO mapred.MapTask: io.sort.mb = 100

12/05/22 03:51:05 INFO mapred.MapTask: data buffer = 79691776/99614720

12/05/22 03:51:05 INFO mapred. [rest of log line garbled in translation]

12/05/22 03:51:06 INFO mapred. [rest of log line garbled in translation]

12/05/22 03:51:11 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967

12/05/22 03:51:12 INFO mapred.JobClient:  map 39% reduce 0%

12/05/22 03:51:14 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967

12/05/22 03:51:15 INFO mapred.MapTask: Starting flush of map output

12/05/22 03:51:15 INFO mapred. [rest of log line garbled in translation]

12/05/22 03:51:15 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting

12/05/22 03:51:15 INFO mapred. [rest of log line garbled in translation]

12/05/22 03:51:17 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967

12/05/22 03:51:17 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967

12/05/22 03:51:17 INFO mapred.Task: Task attempt_local_0001_m_000000_0 done.

12/05/22 03:51:17 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@35eed0

12/05/22 03:51:17 INFO mapred.ReduceTask: ShuffleRamManager: MemoryLimit=709551680, MaxSingleShuffleLimit=177387920

12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging on-disk files

12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread waiting: Thread for merging on-disk files

12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging in memory files

12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Need another 1 map output(s) where 0 is already in progress

12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)

12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for polling Map Completion Events

12/05/22 03:51:18 INFO mapred.JobClient:  map 100% reduce 0%

12/05/22 03:51:23 INFO mapred.LocalJobRunner: reduce > copy >

From this point on, the bold-marked lines (the "Need another 1 map output(s) ..." and "Scheduled 0 outputs ..." lines above) repeat endlessly.

4. After the mapper has seen every tuple, a lot of open processes are active:

RetweetApplication (1) [Remote Java Application]    
    OpenJDK Client VM[localhost:5002]   
        Thread [main] (Running) 
        Thread [Thread-2] (Running) 
        Daemon Thread [communication thread] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)  
        Daemon Thread [Thread for merging on-disk files] (Running)  
        Daemon Thread [Thread for merging in memory files] (Running)    
        Daemon Thread [Thread for polling Map Completion Events] (Running)  

Is there any reason why Hadoop expects more outputs from the mapper (see the bold-marked lines in the log) than there are inputs in my input directory? As stated before, I verified in the debugger that all inputs are processed correctly by the mapper/partitioner/etc.

UPDATE: With the help of Chris (see comments) I found out that my program was NOT started in local mode as I expected: the isLocal variable in the ReduceTask class is set to false, though it should be true.

It is completely unclear to me why this happens, since the three options that have to be set to enable standalone mode are set the right way. Surprisingly, the "local" setting is ignored while the "read from local disk" setting is not, which is strange because I thought local mode and the file:/// protocol go together.
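
For context, the reducer's decision boils down to a string comparison on the mapred.job.tracker value it reads from the serialized job configuration. A stdlib-only sketch of that check (an approximation written for this answer, not the actual ReduceTask source):

```java
import java.util.Properties;

public class IsLocalCheck {
    public static void main(String[] args) {
        Properties jobConf = new Properties();
        // The serialized job config the reducer read still contained the
        // cluster tracker address, so the "local" override never took effect.
        jobConf.setProperty("mapred.job.tracker", "master:9001");

        // Roughly how the task decides: anything other than "local" means
        // "fetch map outputs over the network via MapOutputCopier threads"
        // instead of reading the spill files from local disk.
        boolean isLocal = "local".equals(
                jobConf.getProperty("mapred.job.tracker", "local"));

        System.out.println("isLocal = " + isLocal); // prints: isLocal = false
    }
}
```

This matches the symptom above: isLocal comes out false, the copier threads start, and they wait forever for a tasktracker that does not exist.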

While debugging ReduceTask, I set the variable to true by evaluating isLocal = true in my debug view, and then tried to execute the rest of the program. It did not work; this is the stack trace:

12/05/22 14:28:28 INFO mapred.LocalJobRunner: 
12/05/22 14:28:28 INFO mapred.Merger: Merging 1 sorted segments
12/05/22 14:28:28 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1956 bytes
12/05/22 14:28:28 INFO mapred.LocalJobRunner: 
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
12/05/22 14:28:30 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 0 time(s).
12/05/22 14:28:31 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 1 time(s).
12/05/22 14:28:32 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 2 time(s).
12/05/22 14:28:33 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 3 time(s).
12/05/22 14:28:34 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 4 time(s).
12/05/22 14:28:35 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 5 time(s).
12/05/22 14:28:36 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 6 time(s).
12/05/22 14:28:37 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 7 time(s).
12/05/22 14:28:38 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 8 time(s).
12/05/22 14:28:39 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 9 time(s).
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
12/05/22 14:28:39 WARN mapred.LocalJobRunner: job_local_0001
java.net.ConnectException: Call to master/127.0.0.1:9001 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
    at org.apache.hadoop.ipc.Client.call(Client.java:1071)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:446)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:490)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
    at org.apache.hadoop.ipc.Client.call(Client.java:1046)
    ... 17 more
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
12/05/22 14:28:39 INFO mapred.JobClient: Job complete: job_local_0001
12/05/22 14:28:39 INFO mapred.JobClient: Counters: 20
12/05/22 14:28:39 INFO mapred.JobClient:   File Input Format Counters 
12/05/22 14:28:39 INFO mapred.JobClient:     Bytes Read=967
12/05/22 14:28:39 INFO mapred.JobClient:   FileSystemCounters
12/05/22 14:28:39 INFO mapred.JobClient:     FILE_BYTES_READ=14093
12/05/22 14:28:39 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=47859
12/05/22 14:28:39 INFO mapred.JobClient:   Map-Reduce Framework
12/05/22 14:28:39 INFO mapred.JobClient:     Map output materialized bytes=1960
12/05/22 14:28:39 INFO mapred.JobClient:     Map input records=10
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/05/22 14:28:39 INFO mapred.JobClient:     Spilled Records=10
12/05/22 14:28:39 INFO mapred.JobClient:     Map output bytes=1934
12/05/22 14:28:39 INFO mapred.JobClient:     Total committed heap usage (bytes)=115937280
12/05/22 14:28:39 INFO mapred.JobClient:     CPU time spent (ms)=0
12/05/22 14:28:39 INFO mapred.JobClient:     Map input bytes=967
12/05/22 14:28:39 INFO mapred.JobClient:     SPLIT_RAW_BYTES=82
12/05/22 14:28:39 INFO mapred.JobClient:     Combine input records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce input records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce input groups=0
12/05/22 14:28:39 INFO mapred.JobClient:     Combine output records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce output records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/05/22 14:28:39 INFO mapred.JobClient:     Map output records=10
12/05/22 14:28:39 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
    at uni.kassel.macek.rtprep.RetweetApplication.main(RetweetApplication.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Since this stack trace shows that port 9001 is used during execution, I guess the xml configuration files somehow override the settings made in local Java code (which I use for testing). That is strange, since I have repeatedly read on the internet that Java overrides the xml configuration. If nobody knows how to correct this, I'll try simply deleting all configuration xmls. Maybe that solves the problem...

NEW UPDATE

Renaming Hadoop's conf folder solved the problem of the waiting copiers, and the program now executes to the end. Unfortunately, the execution no longer waits for my debugger, even though HADOOP_OPTS is set correctly.

So it is just a configuration issue: the XML can (for some configuration parameters) override the Java settings. It would be perfect if someone knew how I could get the debugger running again, but for now I'm just happy that I no longer see this stack trace! :)
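
One possibility worth checking (an assumption on my part, not confirmed in this thread): hadoop-env.sh lives in exactly the conf folder that was renamed, and that file is a common place to export the debug flags. If HADOOP_OPTS was set there, renaming the folder would also explain why the debugger stopped attaching. A typical JDWP setting (port 5002 matches the debug session shown above) looks like this:

```shell
# In hadoop-env.sh (or the shell that launches the job):
# make the JVM listen on port 5002 and wait for a debugger before running
export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5002"
```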

Thanks for your time and effort, Chris!

Best answer

Sorry I didn't see this earlier, but you appear to have two important configuration properties marked as final in your conf xml files, as shown by the following log statements:

12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.

This means your job can't actually run in local mode: it is started in local mode, but the reducer reads the serialized job configuration, determines that it is not in local mode, and tries to fetch the map outputs via the task tracker ports.

You say renaming the conf folder fixed it: that reverts Hadoop back to the default configuration, in which these two properties are not marked as final.
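
For reference (this fragment is an illustration, not copied from the asker's files), a property pinned this way in e.g. mapred-site.xml produces exactly the "attempt to override final parameter ... Ignoring" warnings seen in the log:

```xml
<!-- mapred-site.xml: <final>true</final> makes this value win over any
     conf.set("mapred.job.tracker", ...) done in driver code -->
<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
  <final>true</final>
</property>
```

Removing the final element (or the whole property) restores the usual precedence, where programmatic JobConf settings override the site files, which is effectively what renaming the conf folder achieved.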
