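I'd also suggest using PHP DOM rather than regular expressions, which are often inaccurate on HTML. Here is an example you can use to strip all the img tags and all the background attributes from your string: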
// ...loading the DOM
$dom = new DOMDocument();
@$dom->loadHTML($string); // Using @ to hide any parse warning sometimes resulting from markup errors
$dom->preserveWhiteSpace = false;
// Here we strip all the img tags in the document
$images = $dom->getElementsByTagName('img');
// Copy the nodes into a plain array first: the node list is live,
// so removing nodes while iterating over it directly would skip elements
$imgs = array();
foreach ($images as $img) {
    $imgs[] = $img;
}
foreach ($imgs as $img) {
    $img->parentNode->removeChild($img);
}
// This part strips the background attribute from (all) the body tag(s)
$bodies = $dom->getElementsByTagName('body');
$bodybg = array();
foreach ($bodies as $bg) {
    $bodybg[] = $bg;
}
foreach ($bodybg as $bg) {
    $bg->removeAttribute('background');
}
$str = $dom->saveHTML();
I've selected the body tags instead of the table because the standard <table> element doesn't have a background attribute; it only has bgcolor.
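If your markup might also carry a non-standard background attribute on other elements (browsers have historically honored it on <table> and <td> too), one option is to select by attribute instead of by tag name. This is only a minimal sketch: the helper name stripBackgroundAttributes and the sample markup are made up for illustration, and it uses libxml_use_internal_errors() in place of @ to keep parse warnings quiet.

```php
<?php
// Hypothetical helper (not part of the snippet above): removes the
// background attribute from every element that carries one.
function stripBackgroundAttributes(DOMDocument $dom): void
{
    $xpath = new DOMXPath($dom);
    // "//*[@background]" matches any element that has a background attribute
    foreach ($xpath->query('//*[@background]') as $node) {
        $node->removeAttribute('background');
    }
}

// Collect parse warnings internally instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<html><body background="a.png"><table background="b.png"><tr><td>x</td></tr></table></body></html>');
libxml_clear_errors();

stripBackgroundAttributes($dom);
$html = $dom->saveHTML();
```

After this runs, $html should no longer contain any background attribute, while the rest of the markup is left alone.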
To strip the background inline CSS property, you can use sabberworm's PHP CSS Parser to parse the CSS retrieved from the DOM. Try this:
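// Selecting all the elements since each one could have a style attribute
$alltags = $dom->getElementsByTagName('*');
$tags = array();
foreach ($alltags as $tag) {
    $tags[] = $tag;
}
foreach ($tags as $tag) {
    // Wrap the inline declarations in a dummy "p" rule so the parser accepts them.
    // Newer releases of the library expose this class as Sabberworm\CSS\Parser.
    $oParser = new CSSParser("p{" . $tag->getAttribute('style') . "}");
    $oCss = $oParser->parse();
    foreach ($oCss->getAllRuleSets() as $oRuleSet) {
        $oRuleSet->removeRule('background');
        $oRuleSet->removeRule('background-image');
    }
    $css = $oCss->__toString();
    // Trim the dummy wrapper: the first 3 and last 2 characters of the serialized rule
    $css = substr_replace($css, '', 0, 3);
    $css = substr_replace($css, '', -2, 2);
    if ($css) {
        $tag->setAttribute('style', $css);
    } elseif ($tag->hasAttribute('style')) {
        // Nothing is left, so drop the attribute; otherwise a style that held
        // only a background declaration would survive unchanged
        $tag->removeAttribute('style');
    }
}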
For example, if you have
$string = '<!DOCTYPE html>
<html><body background="http://yo.ur/background/dot/com" etc="an attribute value">
<img src="http://your.pa/th/to/image"><img src="http://anoth.er/path/to/image">
<div style="background-image:url(http://inli.ne/css/background);border: 1px solid black">div content...</div>
<div style="background:url(http://inli.ne/css/background);border: 1px solid black">2nd div content...</div>
</body></html>';
PHP will output
<!DOCTYPE html>
<html><body etc="an attribute value">
<div style="border: 1px solid black;">div content...</div>
<div style="border: 1px solid black;">2nd div content...</div>
</body></html>