Using a custom file format to read XML files in Hive
  • Posted: 2012-05-14 06:31:07
  • Tags:
  • hadoop
  • hive

I'm new to Hadoop/Hive. I am trying to process XML files with Hive. After googling for a while, I came across custom FileFormat code for XML files that can be used for this purpose.

(Here is the source code for the custom XmlInputFormat class.)

I added the jar containing the XmlInputFormat class and created a sample table:

-- Class names must be quoted string literals in STORED AS INPUTFORMAT/OUTPUTFORMAT
create table person ( 
    name string
    )        
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    STORED AS INPUTFORMAT 'com.hadoop.xmlparser.XmlInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

When I try to retrieve data from this table, I get the following error:

Execution Error, Return Code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

The following errors appear in the JobTracker logs:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:306)
    at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.<init>(Hadoop20SShims.java:269)
    at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getRecordReader(Hadoop20SShims.java:366)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:413)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
    at sun

Is there any solution to this problem? Thanks!

Answer

I haven't used this particular InputFormat, but Hive assumes that records are delimited by \n, so you would need to make sure that your XML contains no \n characters.
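One way to act on this advice is to preprocess the XML before loading it, so that each record sits on exactly one line. The Java below is a minimal sketch under stated assumptions, not the XmlInputFormat from the question: the class and method names (XmlFlattener, splitRecords, toSingleLine, flatten) are hypothetical, the start tag is assumed to appear exactly as given (no attributes), records of the same element are assumed not to be nested, and turning line breaks inside text content into single spaces is assumed to be acceptable.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical preprocessing helper (not part of Hive or of the question's
 * XmlInputFormat): extracts each startTag...endTag record from an XML string
 * and writes it on its own line, so a newline-delimited reader sees one
 * record per line.
 */
public class XmlFlattener {

    /** Returns every substring from startTag through the next endTag (inclusive). */
    static List<String> splitRecords(String xml, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(startTag, from);
            if (start < 0) break;                      // no more records
            int end = xml.indexOf(endTag, start + startTag.length());
            if (end < 0) break;                        // unterminated record: ignore the tail
            int stop = end + endTag.length();
            records.add(xml.substring(start, stop));
            from = stop;                               // continue after this record
        }
        return records;
    }

    /** Replaces each line break, plus the whitespace around it, with one space. */
    static String toSingleLine(String record) {
        return record.replaceAll("\\s*\\R\\s*", " ").trim();
    }

    /** Splits, flattens, and joins the records, one record per output line. */
    static String flatten(String xml, String startTag, String endTag) {
        StringBuilder out = new StringBuilder();
        for (String record : splitRecords(xml, startTag, endTag)) {
            out.append(toSingleLine(record)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<people>\n"
                + "  <person>\n    <name>Alice</name>\n  </person>\n"
                + "  <person>\n    <name>Bob</name>\n  </person>\n"
                + "</people>\n";
        System.out.print(flatten(xml, "<person>", "</person>"));
    }
}
```

Running main prints each person record on its own line:

    <person> <name>Alice</name> </person>
    <person> <name>Bob</name> </person>

Plain indexOf matching keeps the sketch dependency-free; a real XML parser would be needed to handle attributes, CDATA sections, or nested elements with the same name.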




