XML Diff: How to generate XML diff using XSLT?
  • Time: 2009-11-20 15:44:53
  • Tags: xml, xslt, diff

I would like to compute the diff between two XML files or nodes using XSL/XSLT. Is there any stylesheet readily available or any simple way of doing it?

Answers

Interesting question! I once tried to do something similar involving two XML sources, and my experience was that there just ain't no way.

You could use XSL's facility for including user-built functions, and code up something really slick. But I really can't see it.

If I were to do something like this, I'd process the two XML files in parallel using DOM4J, which lets me easily traverse the documents programmatically and do detailed sub-queries.

Trying to do this in XSLT will either prove you to be a genius or drive you into madness.

XSLT is data-driven; that is, it goes through the single source XML file top to bottom looking for template matches in the XSL stylesheet. The templates don't really know where they are in the data; they just run their code when matched. You can reference another XML source, but the program will run according to the traversal of the original source.

So when you arrive at the nth child element of <blarg>, for example, you could look up the nth child of <blarg> in a second XML using the document() function. But the usefulness of this depends on the structure of your XML and what comparisons you're trying to do.
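
A minimal sketch of that positional lookup, assuming both documents have a single <blarg> root element. The `other` parameter, its default value `second.xml`, and the output element names are made up for illustration:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical parameter: URI of the second document.
       A relative URI is resolved against the stylesheet's location. -->
  <xsl:param name="other" select="'second.xml'"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <report>
      <xsl:apply-templates select="blarg/*"/>
    </report>
  </xsl:template>

  <!-- For each child of <blarg>, fetch the child at the same position
       in the other document and compare name and text content. -->
  <xsl:template match="/blarg/*">
    <xsl:variable name="pos" select="count(preceding-sibling::*) + 1"/>
    <xsl:variable name="twin" select="document($other)/blarg/*[position() = $pos]"/>
    <xsl:choose>
      <xsl:when test="not($twin)">
        <missing-in-other position="{$pos}" name="{name()}"/>
      </xsl:when>
      <xsl:when test="name() != name($twin) or string(.) != string($twin)">
        <differs position="{$pos}" name="{name()}"/>
      </xsl:when>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Children that exist only in the second document are never visited here, because the traversal follows the first document. A single inserted element also shifts every later position, so everything after it is reported as different.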

This behavior is the opposite of most traditional scripts, which run through the program code top to bottom, calling on the data file when instructed. The latter--pull processing--is what you probably need to compare two XML sources. A position-by-position comparison in XSLT breaks down as soon as the two documents differ in structure.

If what you mean by diff is something like checking whether items exist in one document (or node) but not in another, you can use the XSLT 2.0 key() function with its third argument, which names the document (or subtree) to search:

<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs ="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xsl xs">

<xsl:param name="doc2diff" required="yes"/>
<!-- docB is root node of the "second" document -->
<xsl:variable name="docB" select="document($doc2diff)"/>
<!-- docA is the root node of the first document -->
<xsl:variable name="docA" select="/"/>
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:key name="items" match="Item" use="someId"/>

<xsl:template match="/">
 <ListOfItems>
  <In_A_NotIn_B>
   <xsl:apply-templates select="$docA//Item">
    <xsl:with-param name="otherDocument" select="$docB"/>
   </xsl:apply-templates>
  </In_A_NotIn_B>
  <In_B_NotIn_A>
   <!-- Walk the items of the second document and look each one up in the first -->
   <xsl:apply-templates select="$docB//Item">
    <xsl:with-param name="otherDocument" select="$docA"/>
   </xsl:apply-templates>
  </In_B_NotIn_A>
 </ListOfItems>
</xsl:template>

<xsl:template match="Item">
 <xsl:param name="otherDocument"/>
  <xsl:variable name="SOMEID" select="someId"/>
  <xsl:if test="empty(key('items', $SOMEID, $otherDocument))">
   <xsl:copy-of select="."/>
  </xsl:if>
</xsl:template>

</xsl:stylesheet>

This is not a mystery! Here are the general steps:

  1. @carillonator is right about how XSLT processes documents. So, to make it easier, we combine the two versions of your document into a single document that the XSLT diff can run on. (You can do this from the command line with bash, with whatever programming language you are using, or even with another XSLT transform in a pipeline.) It's just an encapsulation:

    <diff_container>
        <version1>
          ... first version here
        </version1>
        <version2>
          ... second version here
        </version2>
    </diff_container>
    
  2. We then run this document through our XSLT diff. The XSLT then has the job of simply traversing the tree and comparing nodes between the two versions. This can range from very simple (Was an element changed? Moved? Removed?) to semi-complex. A good understanding of XPath makes this fairly simple.

    As some said before, you're working inside a different environment, so you are limited compared to tools like DiffDog. However, having the algorithm in XSLT can have real value too.
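
As a rough sketch of step 2: assume the elements you care about carry a unique `id` attribute. That is an assumption for illustration only, since nothing above says how nodes should be paired up. With it, keys can match nodes across the two versions. The output element names `diff`, `removed`, `changed`, and `added` are hypothetical:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index the identified elements of each version by their (assumed) id attribute -->
  <xsl:key name="v1" match="version1//*[@id]" use="@id"/>
  <xsl:key name="v2" match="version2//*[@id]" use="@id"/>

  <xsl:template match="/diff_container">
    <diff>
      <!-- Elements of version1: gone from version2, or present with different text -->
      <xsl:for-each select="version1//*[@id]">
        <xsl:variable name="twin" select="key('v2', @id)"/>
        <xsl:choose>
          <xsl:when test="not($twin)">
            <removed id="{@id}"/>
          </xsl:when>
          <!-- Compares text content only, not attributes or child structure -->
          <xsl:when test="string(.) != string($twin)">
            <changed id="{@id}"/>
          </xsl:when>
        </xsl:choose>
      </xsl:for-each>
      <!-- Elements of version2 that have no counterpart in version1 -->
      <xsl:for-each select="version2//*[@id][not(key('v1', @id))]">
        <added id="{@id}"/>
      </xsl:for-each>
    </diff>
  </xsl:template>
</xsl:stylesheet>
```

Because both versions live in one container document, a plain two-argument key() is enough and the stylesheet stays within XSLT 1.0. Detecting moved elements or attribute changes would need further tests along the same lines.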

Hope this helped. Cheers!

This is the stylesheet I wrote to compare two XML files whose nodes and attributes may be in a different order. It generates two text files, each containing a sorted list of the paths of all leaf nodes. Use any text comparison tool to spot the differences, or enhance the XSLT to do what you want.

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text" indent="no" omit-xml-declaration="yes" name="output" />

<xsl:param name="OTHERFILENAME">xml_file_to_diff.xml</xsl:param>
<xsl:param name="ORIGINAL_OUTPUT_FILENAME">ORIGINAL.txt</xsl:param>
<xsl:param name="OTHER_OUTPUT_FILENAME">OTHER.txt</xsl:param>

<xsl:template match="/">
    <xsl:call-template name="convertXMLHierarchyToFullPath">
        <xsl:with-param name="node" select="*"/>
        <xsl:with-param name="filename" select="$ORIGINAL_OUTPUT_FILENAME"/>
    </xsl:call-template>
    <xsl:call-template name="convertXMLHierarchyToFullPath">
        <xsl:with-param name="node" select="document($OTHERFILENAME)/*"/>
        <xsl:with-param name="filename" select="$OTHER_OUTPUT_FILENAME"/>
    </xsl:call-template>
</xsl:template>

<xsl:template name="convertXMLHierarchyToFullPath">
    <xsl:param name="node"/>
    <xsl:param name="filename"/>

    <xsl:variable name="unorderedFullPath">
        <xsl:apply-templates select="$node"/>
    </xsl:variable>

    <xsl:result-document href="{$filename}" format="output">
        <xsl:for-each select="$unorderedFullPath/*">
            <xsl:sort select="@path" data-type="text"/>
            <xsl:value-of select="@path"/>
            <xsl:text>&#xA;</xsl:text>
        </xsl:for-each>
    </xsl:result-document>
</xsl:template>

<xsl:template match="*">
    <xsl:if test="not(*)">
        <leaf>
            <xsl:attribute name="path">
                <xsl:for-each select="ancestor-or-self::*">
                    <xsl:value-of select="name()"/>
                    <xsl:for-each select="@*">
                        <xsl:sort select="name()" data-type="text"/>
                        <xsl:text>[</xsl:text>
                        <xsl:value-of select="name()"/>
                        <xsl:text>:</xsl:text>
                        <xsl:value-of select="."/>
                        <xsl:text>]</xsl:text>
                    </xsl:for-each>
                    <xsl:text>/</xsl:text>
                </xsl:for-each>
                <xsl:value-of select="."/>
            </xsl:attribute>
        </leaf>
    </xsl:if>
    <xsl:apply-templates select="*"/>
</xsl:template>

</xsl:stylesheet>

There are ways to do this, but I wouldn't say it's simple.

In the past I've used an open-source utility called diffmk. It produces an output XML with extra tags showing what has been added or removed.

I had to write an extra stylesheet to then convert this into a more readable HTML report.

Some diff tools, such as Altova DiffDog (from the makers of XMLSpy), are good but costly.

I found this post late, but I'll share my solution for this kind of problem anyway. I had the same need as @Vincent: comparing two different XML files and quickly seeing the differences between them. A quick diff flagged too many lines because the files were not sorted, so I decided to sort the files using XSLT and then compare the two XML files manually, using WinMerge for example (a simple Unix diff can also do the job).

Here is the XSLT that sorts my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes" encoding="UTF-8"/>

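<!-- Identity template: copy every node, sorting its attributes and children -->
<xsl:template match="node()|@*">
    <xsl:copy>
            <!-- Attributes first: they must be emitted before any child node -->
            <xsl:apply-templates select="@*">
                    <xsl:sort select="name()" />
            </xsl:apply-templates>
            <xsl:apply-templates select="node()">
                    <xsl:sort select="name()" />
                    <xsl:sort select="@*" />
                    <xsl:sort select="*" />
                    <xsl:sort select="text()" />
            </xsl:apply-templates>
    </xsl:copy>
</xsl:template>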

</xsl:stylesheet>



