• How to remove duplicate nodes from an XML file using Perl
  • Date: 2011-11-22 10:02:03
  • Tags:
  • perl

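I am creating one XML file from multiple XML files, and I need to remove duplicate nodes from the output XML. I have a script like this to generate the new XML file.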

 #!/usr/bin/perl
 use warnings;
 use strict;
 use XML::LibXML;
 use Carp;
 use File::Find;
 use File::Spec::Functions qw( canonpath );
 use XML::LibXML::Reader;
 use Digest::MD5 'md5';

 # Fall back to a default search path when none is given.
 if ( @ARGV == 0 ) {
     push @ARGV, "c:/main/sav ";
     warn "Using default path $ARGV[0]\n  Usage: $0  path ...\n";
 }

 open( my $allxml, '>', "combined.xml" )
     or die "can't open output xml file for writing: $!\n";
 print $allxml '<?xml version="1.0" encoding="UTF-8"?>',
   qq{\n<Datainfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n};

 # MD5 digests of the elements already written to combined.xml.
 my %extract_md5;
 find(
      sub {
          # Only process regular files whose names end in _str.xml.
          return unless ( /_str\.xml$/ and -f );
          extract_information();
          return;
      },
      @ARGV
     );

 print $allxml "</Datainfo>\n";

 # Copy each Data element of the current file to the output,
 # skipping any element whose full serialization was already seen.
 sub extract_information {
     my $path = $_;
     if ( my $reader = XML::LibXML::Reader->new( location => $path )) {
         while ( $reader->nextElement('Data') ) {
             my $elem = $reader->readOuterXml();
             my $md5 = md5( $elem );
             print $allxml $elem unless ( $extract_md5{$md5}++ );
         }

     }
     return;
 }

But the script above produces an XML file like this:

combined.xml

<?xml version="1.0" encoding="UTF-8"?>
<Datainfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <data>
        <test>22</test>
        <info>sensor value</info>
        <sensor>
            <sensor value="23" temp="25"/>
        </sensor>
    </data>
    <data>
        <test>23</test>
        <info>sensor value</info>
        <sensor>
            <sensor value="24" temp="27"/>
        </sensor>
    </data>
    <data>
        <test>22</test>
        <info>sensor value</info>
        <sensor>
            <sensor value="22" temp="26"/>
        </sensor>
    </data>
</Datainfo>

In the XML file above, the data element with test value 22 appears twice. I need to use test as the key to search on: if the same test number is found again, I need to delete that entire node, whatever other information it contains. I tried using md5, but that only removes nodes that are completely identical across all the XML files. Now I need to search on one specific element and delete the entire node if a duplicate occurs. Please help me with this problem.
The output should look like this:

combined.xml

<?xml version="1.0" encoding="UTF-8"?>
<Datainfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <data>
        <test>22</test>
        <info>sensor value</info>
        <sensor>
            <sensor value="23" temp="25"/>
        </sensor>
    </data>
    <data>
        <test>23</test>
        <info>sensor value</info>
        <sensor>
            <sensor value="24" temp="27"/>
        </sensor>
    </data>
</Datainfo>
Answers

I normally use XML::Simple for things like this.

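XML::Simple parses your XML file and stores it in a hash/array structure. That automatically eliminates the duplication problem you are running into (depending on how you munge the data).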

You will have to do the duplicate checking by specifically checking <test> contents, instead of md5 of the entire node.

E.g. instead of my $md5 = md5( $elem ); and storing $md5 key in the hash, you need to extract the contents of <test> tag and store that.

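I am disinclined to provide more details, since you seem to be merely spamming both SO and Perl Monks with requests to help you do your work, copying and pasting some complicated code that you do not seem to have tried to understand.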

http://www.perlmonks.org/?node_id=939272
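
For readers who want to see the second answer's suggestion in concrete form, here is a minimal sketch that keys the duplicate check on the contents of <test> rather than on an MD5 of the whole node. It is not the answerer's code. It assumes the combined document fits in memory and is parsed as a DOM, that each <data> element has a single <test> child, and that the first occurrence is the one to keep. The helper name dedupe_by_test is hypothetical, and the sample input is copied from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Sample input copied from the combined.xml shown in the question.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<Datainfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <data>
        <test>22</test>
        <info>sensor value</info>
        <sensor>
            <sensor value="23" temp="25"/>
        </sensor>
    </data>
    <data>
        <test>23</test>
        <info>sensor value</info>
        <sensor>
            <sensor value="24" temp="27"/>
        </sensor>
    </data>
    <data>
        <test>22</test>
        <info>sensor value</info>
        <sensor>
            <sensor value="22" temp="26"/>
        </sensor>
    </data>
</Datainfo>
XML

# Hypothetical helper: parse the XML string, drop every <data> element
# whose <test> text has already been seen, and return the serialized result.
sub dedupe_by_test {
    my ($xml_string) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml_string );

    my %seen;    # <test> values encountered so far
    for my $data ( $doc->findnodes('/Datainfo/data') ) {
        my $key = $data->findvalue('test');    # text of the <test> child
        $data->unbindNode if $seen{$key}++;    # detach later duplicates
    }
    return $doc->toString;
}

print dedupe_by_test($xml);
```

With the sample above, the third <data> element (the second one with test 22) is removed and the first two are kept. In the streaming script from the question, the same %seen check could presumably be keyed on the 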




