Remove lines from an XML file if they contain the same words (Perl)
  • Time: 2012-01-13 15:55:30
  • Tags: perl

I have a file named frequencies.xml whose content has this form:

<?xml version="1.0"?>
<!DOCTYPE stationlist PUBLIC "-//xxxxx//DTD stationlist 1.0//EN"   "http://xxxxxxxxx/DTD/xxxxxxxx.dtd">
<frequencies xmlns="http://xxxxxxxxxxxxxxxx/DTD/">
 <list norm="PAL" frequencies="Custom" audio="bg">
..............................................................
<station name="A" active="1" channel="48.25MHz" norm="PAL"/>
<station name="B" active="1" channel="55.25MHz" norm="PAL"/>
<station name="C" active="1" channel="62.25MHz" norm="PAL"/>
<station name="D" active="1" channel="112.25MHz" norm="PAL"/>
..............................................................
<station name="E" active="1" channel="119.25MHz" norm="PAL"/>
<station name="F" active="0" channel="48.25MHz" norm="PAL"/>
..............................................................
<station name="G" active="1" channel="55.25MHz" norm="PAL"/>
<station name="H" active="0" channel="62.25MHz" norm="PAL"/>
..............................................................
  </list>
 </frequencies>

If a line contains the same frequency as an earlier line, I want to treat it as a duplicate and remove it.

Expected output:

<station name="A" active="1" channel="48.25MHz" norm="PAL"/>
<station name="B" active="1" channel="55.25MHz" norm="PAL"/>
<station name="C" active="1" channel="62.25MHz" norm="PAL"/>
<station name="D" active="1" channel="112.25MHz" norm="PAL"/>
<station name="E" active="1" channel="119.25MHz" norm="PAL"/>

I wrote this:

for i in `cat frequencies.xml | sed 's/.*channel="\([^"]*\)".*/\1/; /</ d' | grep MHz`; do
cat frequencies.xml | awk -v i="channel=\"$i\"" '
    BEGIN       { a=0 }
    $0 ~ i      { if ( a == "1" ) { print i" - duplicate" > "/dev/stderr"  ; next ;} ; a=1 }
                { print $0 }' > frequencies.xml.tmp &&
mv frequencies.xml.tmp frequencies.xml
done

How can I solve this problem in Perl?

Addendum

Update: I want to keep the structure of the XML file.

My code:

open (FH, "+< frequencies.xml") or die "Opening: $!";
my $out = '';
my %seen = ();
foreach my $line ( <FH> ) {
   if ( $line =~ m/<station/ ) {
        my ( $freq ) = ( $line =~ m/channel="([^"]+)"/ );
            $out .= $line unless $seen{$freq}++;
    } else {
        $out .= $line;
    }
}
seek(FH,0,0)                    or die "Seeking: $!";
print FH $out                   or die "Printing: $!";
truncate(FH, tell(FH))          or die "Truncating: $!";
close(FH)                       or die "Closing: $!";

Best answer

Keep track of the frequencies you have seen, and don't print a line whose frequency has already appeared:

open INPUT, '<', 'frequencies.xml' or die "Can't read file: $!";
my %seen = ();
foreach my $line ( <INPUT> ) {
   my ( $freq ) = ( $line =~ m/channel="([^"]+)"/ );
   print $line unless $seen{$freq};
   $seen{$freq}++;
}
close INPUT;


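If there are other lines that you want to keep, you just need to print them: test whether the line contains a <station> element and print everything else unchanged. Once you start getting more complicated than that, though, you might want to use a real XML parser. So, using Zaid's suggestion: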

open INPUT, '<', 'frequencies.xml' or die "Can't read file: $!";
my %seen = ();
foreach my $line ( <INPUT> ) {
   if ( $line =~ m/<station/ ) {
      my ( $freq ) = ( $line =~ m/channel="([^"]+)"/ );
      print $line unless $seen{$freq}++;
   } else {
      print $line;
   }
}
close INPUT;
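
The same seen-hash idea can be tried on in-memory data before touching the file. The following is a minimal sketch, not the answerer's code: the subroutine name `dedupe_stations` is hypothetical, and it assumes that each <station> element sits on its own line and has a double-quoted channel attribute. The first line seen for a frequency is kept, and every line that is not a station line passes through unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: return the input lines minus any <station> line whose
# channel value has already appeared. The first occurrence wins.
sub dedupe_stations {
    my @lines = @_;
    my %seen;
    my @kept;
    for my $line (@lines) {
        if ( $line =~ /<station\b/ and $line =~ /\bchannel="([^"]+)"/ ) {
            next if $seen{$1}++;    # frequency already seen: drop the line
        }
        push @kept, $line;          # first occurrence, or not a station line
    }
    return @kept;
}

# Demo on a subset of the sample data from the question.
my @sample = (
    qq{<list norm="PAL" frequencies="Custom" audio="bg">\n},
    qq{<station name="A" active="1" channel="48.25MHz" norm="PAL"/>\n},
    qq{<station name="B" active="1" channel="55.25MHz" norm="PAL"/>\n},
    qq{<station name="F" active="0" channel="48.25MHz" norm="PAL"/>\n},
    qq{<station name="G" active="1" channel="55.25MHz" norm="PAL"/>\n},
    qq{</list>\n},
);
print dedupe_stations(@sample);
```

On this input the <list> line, stations A and B, and the closing </list> line are printed; F and G are dropped because their frequencies were already seen.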

Other answers

One way, using a one-liner:

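perl -ne '($freq) = m/(?i)channel="([^"]+)/; print unless exists $arr{ $freq }; $arr{ $freq } = 1' infile

open(IN, '<', 'frequencies.xml') or die "Can't read file: $!";
my @out;
while (my $inline = <IN>) {
  # Compare the whole attribute so that 12.25MHz does not match 112.25MHz;
  # lines without a frequency are kept as they are.
  if ($inline =~ /channel="([\d.]+MHz)"/) {
    my $freq = $1;
    next if grep { /channel="\Q$freq\E"/ } @out;
  }
  push(@out, $inline);
}
close(IN);
print @out;

perl -i.tmp -ane 'print unless $F[0] eq "<station" and $seen{ $F[3] }++' frequencies.xml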
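
For a script rather than a one-liner, the result can be written to a temporary file that is renamed over the original only after the write has succeeded, similar in spirit to the frequencies.xml.tmp plus mv step in the question's shell loop. This is a minimal sketch under the same one-element-per-line assumption; the subroutine name `dedupe_file` is hypothetical, and File::Temp and File::Basename are core modules.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: rewrite $path without the duplicate-frequency
# <station> lines and return how many lines were dropped.
sub dedupe_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't read $path: $!";

    # Create the temporary file in the same directory so that rename()
    # does not have to cross file systems.
    my ( $out, $tmp ) = tempfile( DIR => dirname($path) );

    my %seen;
    my $dropped = 0;
    while ( my $line = <$in> ) {
        if (    $line =~ /<station\b/
            and $line =~ /\bchannel="([^"]+)"/
            and $seen{$1}++ )
        {
            $dropped++;    # frequency already seen: skip this line
            next;
        }
        print {$out} $line or die "Can't write $tmp: $!";
    }
    close $in;
    close $out or die "Can't close $tmp: $!";
    rename $tmp, $path or die "Can't rename $tmp to $path: $!";
    return $dropped;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my $n = dedupe_file( $ARGV[0] );
    print STDERR "$n duplicate line(s) removed\n";
}
```

Saved as a script (the name dedupe.pl is made up), it would be run as `perl dedupe.pl frequencies.xml`. File::Temp creates its file with restrictive permissions, so a chmod after the rename may be needed if the original file was readable by other users.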

Using XML::XSH2:

use XML::XSH2;
xsh q{
    open so-8853324.xml;
    $ch := hash @channel //station;
    for { keys %$ch } ls xsh:lookup("ch", .)[1];
};

I removed the namespace from the data to simplify the code.




