Talend tExtractXMLField

I have this job in Talend that is supposed to retrieve a field and loop through it.

My big problem is that the code is looping through the XML fields, but it's returning null. Here is a sample of the XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<empresas>
    <empresa>
        <imoveis>
            <imovel>
                [-- some fields --  ]

                <fotos>
                    <nome id="" order="">photo1</nome>
                    <nome id="" order=""></nome>
                    <nome id="" order=""></nome>
                    <nome id="" order=""></nome>
                </fotos>
            </imovel>
            [ -- other entries here -- ]
        </imoveis>
    </empresa>
</empresas>

Now, using the tExtractXMLField component, I am trying to get the "fotos" element. Here is what I have in the component:

[screenshot of the tExtractXMLField settings]

I have tried changing both the XPath query and the Loop XPath query, but either the component doesn't loop through the field or I get null in the value field in the tMap.

Here is an image of the job:

[screenshot of the job]

You can see that I have retrieved 4 items from the XML, but what I get is null in the "nome" field. There must be something wrong with the XPath, but I can't seem to find the problem :(

Hope someone can help me out. Thanks!

Note: I am using Talend v4.1.2 on Ubuntu 10.10 (64-bit).

Answers

If you want to loop on the <nome> nodes, your Loop XPath query has to be

"/empresas/empresa/imoveis/imovel/fotos/nome"

and the XPath query of your foto_nome column something like

"text()"

Take care: I also corrected an error in your XML that could cause issues (the closing </imoveis> tag was missing the "s").
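To see why the split between the loop query and the column query matters, here is a minimal sketch in plain Java using the JDK's javax.xml.xpath API. It only emulates the idea (each node matched by the Loop XPath query becomes one row, and each column query is evaluated relative to that node); it is not the code Talend generates, and the class name NomeLoop is hypothetical. The sample is the question's XML with the elided fields left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NomeLoop {
    // The question's XML with the elided "[-- some fields --]" parts left out.
    public static final String SAMPLE =
        "<empresas><empresa><imoveis><imovel><fotos>"
        + "<nome id=\"\" order=\"\">photo1</nome>"
        + "<nome id=\"\" order=\"\"></nome>"
        + "<nome id=\"\" order=\"\"></nome>"
        + "<nome id=\"\" order=\"\"></nome>"
        + "</fotos></imovel></imoveis></empresa></empresas>";

    // Every node matched by loopXPath becomes one row; columnXPath is then
    // evaluated with that node as the context node.
    public static List<String> extract(String xml, String loopXPath, String columnXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList loop = (NodeList) xpath.evaluate(loopXPath, doc, XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < loop.getLength(); i++) {
                // evaluate(String, Object) returns the XPath string value ("" if nothing matches).
                rows.add(xpath.evaluate(columnXPath, loop.item(i)));
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("XPath extraction failed", e);
        }
    }

    public static void main(String[] args) {
        String loop = "/empresas/empresa/imoveis/imovel/fotos/nome";
        // Relative "text()": one value per <nome>.
        System.out.println(extract(SAMPLE, loop, "text()"));
        // Relative "nome" looks for a <nome> child of each <nome>, so nothing matches.
        System.out.println(extract(SAMPLE, loop, "nome"));
    }
}
```

With the loop on nome, "text()" yields one value per <nome>, with an empty string for the three empty elements. The second call asks for a nome child of each nome node, which matches nothing, so every value comes back empty; a mismatch of this kind between the loop and column queries is a plausible cause of the blank "nome" values in the question, though that is an inference, not something the screenshots confirm.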

There are two ways to go about it. One way is to use the XML input component (tFileInputXML) directly, following the instructions that bluish gave.

The other way is to continue on the path that you chose. In the XML input component, make sure that your Loop XPath query is set to "/empresas/empresa/imoveis/imovel/fotos" and that you pass the fotos element through with the Get Nodes option checked. The XPath query of your fotos element should be "../fotos" or ".".

Your tExtractXMLField component looks well configured. Also, I don't know what tSetGlobalVar does in your design, but make sure it doesn't affect the fotos element that you're trying to pass through.
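The two-stage approach can be sketched the same way, again in plain Java as an emulation rather than Talend's own code. Stage 1 loops on the fotos element and serializes each one to an XML string (roughly what passing it through with Get Nodes hands to the next component); stage 2 parses that string and loops on its nome children. The class name TwoStageFotos and the stage-2 loop query "/fotos/nome" are assumptions, since the actual tExtractXMLField settings are only in the missing screenshot.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TwoStageFotos {
    // Trimmed version of the question's XML (the elided fields are left out).
    public static final String SAMPLE =
        "<empresas><empresa><imoveis><imovel><fotos>"
        + "<nome id=\"\" order=\"\">photo1</nome>"
        + "<nome id=\"\" order=\"\"></nome>"
        + "</fotos></imovel></imoveis></empresa></empresas>";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    private static NodeList select(Object context, String expr) throws Exception {
        return (NodeList) XPATH.evaluate(expr, context, XPathConstants.NODESET);
    }

    // Stage 1, roughly what "Get Nodes" provides: one row per <fotos> element,
    // carrying the element itself serialized back to an XML string.
    public static List<String> fotosAsXml(String xml) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            NodeList fotos = select(parse(xml), "/empresas/empresa/imoveis/imovel/fotos");
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < fotos.getLength(); i++) {
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(fotos.item(i)), new StreamResult(out));
                rows.add(out.toString());
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("stage 1 failed", e);
        }
    }

    // Stage 2, roughly what tExtractXMLField does: parse the XML held in one
    // column value and loop on /fotos/nome, reading each node with ".".
    public static List<String> nomesFrom(String fotosXml) {
        try {
            NodeList nomes = select(parse(fotosXml), "/fotos/nome");
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < nomes.getLength(); i++) {
                rows.add(XPATH.evaluate(".", nomes.item(i)));
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("stage 2 failed", e);
        }
    }

    public static void main(String[] args) {
        for (String fotosXml : fotosAsXml(SAMPLE)) {
            System.out.println(fotosXml + " -> " + nomesFrom(fotosXml));
        }
    }
}
```

The point of the split is that the loop query in stage 2 is evaluated against the small document built from the column value, so it starts at its root (fotos), not at empresas.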

[screenshot: sample Talend job]
I have made a test job that should definitely help you. If I'm not wrong, you want to get all the "nome" elements under the "fotos" tag.

Try changing your loop XPath to the top level of the file, "empresas"; sometimes that works for me. I have also seen the <?xml version="1.0" encoding="ISO-8859-1"?> declaration cause problems before, so you could try removing it.

Also make sure that the encoding is set correctly in the tFileInputXML.

I think you are confusing reading XML and extracting XML from XML.

Reading XML: if the XML you provided is the file read by your tFileInputXML, you don't need tExtractXMLField; just configure the tFileInputXML like this:

  • set the Loop XPath query to the <nome> elements, like this: "//nome"
  • add three columns to the tFileInputXML component: id, order and content
  • get the content column with XPath query "."
  • get the id value with XPath query "@id"
  • get the order value with XPath query "@order"
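A minimal sketch of that mapping in plain Java (javax.xml.xpath), assuming the loop/column semantics described above: one row per node matched by "//nome", with ".", "@id" and "@order" each evaluated relative to that node. This is an illustration, not tFileInputXML's implementation; the class name NomeColumns and the id/order values in the sample are made up, since the question's sample leaves those attributes empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NomeColumns {
    // Illustrative input: the id/order values are hypothetical.
    public static final String SAMPLE =
        "<empresas><empresa><imoveis><imovel><fotos>"
        + "<nome id=\"101\" order=\"1\">photo1</nome>"
        + "<nome id=\"102\" order=\"2\"></nome>"
        + "</fotos></imovel></imoveis></empresa></empresas>";

    // One row per node matched by loopXPath; each column's XPath is evaluated
    // relative to that node, like the rows of the component's mapping table.
    public static List<Map<String, String>> read(String xml, String loopXPath, Map<String, String> columns) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList loop = (NodeList) xpath.evaluate(loopXPath, doc, XPathConstants.NODESET);
            List<Map<String, String>> rows = new ArrayList<>();
            for (int i = 0; i < loop.getLength(); i++) {
                Node context = loop.item(i);
                Map<String, String> row = new LinkedHashMap<>();
                for (Map.Entry<String, String> column : columns.entrySet()) {
                    row.put(column.getKey(), xpath.evaluate(column.getValue(), context));
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("XML read failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "@id");
        columns.put("order", "@order");
        columns.put("content", ".");
        read(SAMPLE, "//nome", columns).forEach(System.out::println);
    }
}
```

Because "//nome" selects every <nome> anywhere in the document, no tExtractXMLField step is needed in this setup.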


Extracting XML from XML: that is the purpose of the tExtractXMLField component. It lets you parse XML data contained in a database column or in another XML document as if it were itself a data flow.

In a nutshell, tExtractXMLField creates a data flow from a column value that contains XML. It is very useful when parsing the result of a SOAP query, because the server reply is usually provided as XML, like this one:

<arg2> 
  <![CDATA[
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <exportInscriptionEnLigneType>
      <date>2015-04-10</date>
      <nbDossiers>2</nbDossiers>
      <reference>20150410100</reference>
      <listeDossiers>
        <dossier>
          <numOrdre>1</numOrdre>
          <identifiantDossier>AAAAA</identifiantDossier>
        </dossier>
        <dossier>
          <numOrdre>2</numOrdre>
          <identifiantDossier>BBBBB</identifiantDossier>
        </dossier>
      </listeDossiers>
    </exportInscriptionEnLigneType>
]]>
</arg2> 

In the XML above, the <arg2> element contains an XML document that you may need to parse.

tExtractXMLField was created for this purpose. I've written a tutorial on how to do this; please have a look at "how to extract xml from xml". It is in French, but the screenshots may help you understand the few comments provided.
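The same "XML inside XML" idea applied to the <arg2> reply, as a plain-Java sketch (the class name NestedXmlExtract is hypothetical, and this is not Talend's code): read the text of <arg2>, which includes the CDATA content, then parse that text as a document of its own and loop on the dossier elements. One detail worth noting: the embedded document starts with whitespace, and an XML declaration is only allowed at the very beginning of a document, so the text is trimmed before the second parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NestedXmlExtract {
    // The SOAP-style reply from above: <arg2> wraps a whole XML document in CDATA.
    public static final String REPLY = String.join("\n",
        "<arg2>",
        "  <![CDATA[",
        "    <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>",
        "    <exportInscriptionEnLigneType>",
        "      <date>2015-04-10</date>",
        "      <nbDossiers>2</nbDossiers>",
        "      <reference>20150410100</reference>",
        "      <listeDossiers>",
        "        <dossier><numOrdre>1</numOrdre><identifiantDossier>AAAAA</identifiantDossier></dossier>",
        "        <dossier><numOrdre>2</numOrdre><identifiantDossier>BBBBB</identifiantDossier></dossier>",
        "      </listeDossiers>",
        "    </exportInscriptionEnLigneType>",
        "]]>",
        "</arg2>");

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Returns one "numOrdre;identifiantDossier" row per <dossier> in the embedded document.
    public static List<String> dossiers(String reply) {
        try {
            // Step 1: read the outer document; getTextContent() includes the CDATA text.
            String inner = parse(reply).getDocumentElement().getTextContent();
            // Step 2: trim so the embedded <?xml ...?> declaration is at the very start.
            Document innerDoc = parse(inner.trim());
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList loop = (NodeList) xpath.evaluate(
                "/exportInscriptionEnLigneType/listeDossiers/dossier", innerDoc, XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < loop.getLength(); i++) {
                Node dossier = loop.item(i);
                rows.add(xpath.evaluate("numOrdre", dossier) + ";" + xpath.evaluate("identifiantDossier", dossier));
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("could not extract embedded XML", e);
        }
    }

    public static void main(String[] args) {
        dossiers(REPLY).forEach(System.out::println);
    }
}
```

Without the trim, a conforming parser rejects the inner document because of the whitespace before the declaration.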

Hope it will help.

Best regards,




