I have this input text:
<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body><table cellspacing="0" cellpadding="0" border="0" align="center" width="603"> <tbody><tr> <td><table cellspacing="0" cellpadding="0" border="0" width="603"> <tbody><tr> <td width="314"><img height="61" width="330" src="/Elearning_Platform/dp_templates/dp-template-images/awards-title.jpg" alt="" /></td> <td width="273"><img height="61" width="273" src="/Elearning_Platform/dp_templates/dp-template-images/awards.jpg" alt="" /></td> </tr> </tbody></table></td> </tr> <tr> <td><table cellspacing="0" cellpadding="0" border="0" align="center" width="603"> <tbody><tr> <td colspan="3"><img height="45" width="603" src="/Elearning_Platform/dp_templates/dp-template-images/top-bar.gif" alt="" /></td> </tr> <tr> <td background="/Elearning_Platform/dp_templates/dp-template-images/left-bar-bg.gif" width="12"><img height="1" width="12" src="/Elearning_Platform/dp_templates/dp-template-images/left-bar-bg.gif" alt="" /></td> <td width="580"><p> what y all heard?</p><p>i m shark oysters.</p> <p> </p> <p> </p> <p> </p> <p> </p> <p> </p> <p> </p></td> <td background="/Elearning_Platform/dp_templates/dp-template-images/right-bar-bg.gif" width="11"><img height="1" width="11" src="/Elearning_Platform/dp_templates/dp-template-images/right-bar-bg.gif" alt="" /></td> </tr> <tr> <td colspan="3"><img height="31" width="603" src="/Elearning_Platform/dp_templates/dp-template-images/bottom-bar.gif" alt="" /></td> </tr> </tbody></table></td> </tr> </tbody></table> <p> </p></body></html>
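As you can see, there are no newlines in this chunk of HTML text. I need to find all the image links in it, copy the images to a directory, and change each link inside the text to something like ./images/file_name.
The Perl code I am currently using looks like this: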
my ($old_src,$new_src,$folder_name);
foreach my $record (@readfile) {
## so the if else case for the url replacement block below will be correct
$old_src = "";
$new_src = "";
if ($record =~ /<img(.+)/){
if($1=~/src="((\w|_|\\|-|\/|\.|:)+)"/){
$old_src = $1;
my @tmp = split(/\/Elearning/,$old_src);
$new_src = "/media/www/vprimary/Elearning".$tmp[-1];
push (@images, $new_src);
$folder_name = "images";
}## end if
}
elsif($record =~ /background="(.+\.jpg)/){
$old_src = $1;
my @tmp = split(/\/Elearning/,$old_src);
$new_src = "/media/www/vprimary/Elearning".$tmp[-1];
push (@images, $new_src);
$folder_name = "images";
}
elsif($record=~/<iframe(.+)/){
if($1=~/src="((\w|_|\\|\?|=|-|\/|\.|:)+)"/){
$old_src = $1;
my @tmp = split(/\/Elearning/,$old_src);
$new_src = "/media/www/vprimary/Elearning".$tmp[-1];
## remove the ?rand behind the html file name
if($new_src=~/\?rand/){
my ($fname,$rand) = split(/\?/,$new_src);
$new_src = $fname;
($fname,$rand) = split(/\?/,$old_src);
## escape the ? so $old_src can be used as a regex pattern later
$old_src = $fname."\\?".$rand;
}
print "old_src::$old_src\n"; ##s7test
print "new_src::$new_src\n"; ##s7test
push (@iframes, $new_src);
$folder_name = "iframes";
}## end if
}## end if
my $new_record = $record;
if($old_src && $new_src){
$new_record =~ s/$old_src/$new_src/ ;
print "new_record:$new_record\n"; ##s7test
my @tmp = split(/\//,$new_src);
$new_record =~ s/$new_src/.\/$folder_name\/$tmp[-1]/;
## print "new_record2:$new_record\n"; ##s7test
}## end if
print WRITEFILE $new_record;
} # foreach
This is only sufficient for HTML text that contains newlines, because it handles at most one link per line. I thought of simply looping the regex match, but then I would have to change each matched link to some other text so it is not matched again.
Do you have any idea whether there is an elegant Perl way to do this? Or maybe I'm just too dumb to see the obvious way of doing it. I also know that just adding the global option doesn't work.
Thanks. ~steve
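
For reference, here is a minimal sketch of one way the single-line case might be handled: a global substitution with the /e modifier, so the replacement is computed separately for every match and newlines in the input no longer matter. The helper name rewrite_image_links and the sample string are hypothetical, the pattern only looks at src on <img> tags and at background attributes, and copying the files is left out. A real HTML parser would be more robust than a regex for arbitrary markup.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): rewrites every image
# link to ./<folder>/<file_name> and returns the new text plus the list of
# original paths, in the order they were found.
sub rewrite_image_links {
    my ($html, $folder) = @_;
    my @found;

    # /g visits every match in the string, /e evaluates the replacement as
    # Perl code, /i ignores case. $1 is the text up to the opening quote,
    # $2 is the old path.
    $html =~ s{(<img\b[^>]*?\bsrc=|\bbackground=)"([^"]+)"}{
        my ($prefix, $old) = ($1, $2);       # save before the nested match
        my ($file) = $old =~ m{([^/]+)\z};   # last path component
        push @found, $old if defined $file;
        # Paths ending in "/" have no file name and are left untouched.
        defined $file ? qq{$prefix"./$folder/$file"} : qq{$prefix"$old"};
    }gie;

    return ($html, \@found);
}

# Made-up sample in the same shape as the input above (no newlines).
my $sample = '<td background="/a/b/left.gif"><img height="1" src="/a/top.jpg" alt="" /></td>';
my ($out, $found) = rewrite_image_links($sample, 'images');
print "$out\n";
print "to copy: $_\n" for @$found;
```

The collected paths could then be mapped to their location on disk (as the original code does with /media/www/vprimary/Elearning) and copied, for example with copy from File::Copy. The same pattern with a different folder name might cover the iframe case.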