Question

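I've been banging my head against this for the longest time and hope someone can help. Basically, I have a WYSIWYG field where users can type formatted text. Of course, they will copy and paste from Word, websites, and so on, so I have a JavaScript function that catches the pasted input. My function strips all formatting from the text, but I'd like it to leave tags like p in place so the result isn't just one big blob.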

Any regex experts out there? Here is what I have so far, and it works. It just needs to allow some tags.

o.node.innerHTML=o.node.innerHTML.replace(/(<([^>]+)>)/ig,"");

Answer 1

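The browser already has a perfectly good parsed HTML tree in o.node. Serializing the document content to HTML (by reading innerHTML), trying to modify it with a regex (which cannot reliably parse HTML), and then re-parsing the result back into document content by setting innerHTML is really rather roundabout.

Instead, inspect the element and attribute nodes you already have inside o.node and remove the ones you don't want, for example: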

filterNodes(o.node, {p: [], br: [], a: ['href']});

Defined as:

// Remove elements and attributes that do not meet a whitelist lookup of lowercase element
// name to list of lowercase attribute names.
//
function filterNodes(element, allow) {
    // Recurse into child elements
    //
    Array.fromList(element.childNodes).forEach(function(child) {
        if (child.nodeType===1) {
            filterNodes(child, allow);

            var tag= child.tagName.toLowerCase();
            if (tag in allow) {

                // Remove unwanted attributes
                //
                Array.fromList(child.attributes).forEach(function(attr) {
                    if (allow[tag].indexOf(attr.name.toLowerCase())===-1)
                       child.removeAttributeNode(attr);
                });

            } else {

                // Replace unwanted elements with their contents
                //
                while (child.firstChild)
                    element.insertBefore(child.firstChild, child);
                element.removeChild(child);
            }
        }
    });
}

// ECMAScript Fifth Edition (and JavaScript 1.6) array methods used by `filterNodes`.
// Because not all browsers have these natively yet, bodge in support if missing.
//
if (!('indexOf' in Array.prototype)) {
    Array.prototype.indexOf= function(find, ix /*opt*/) {
        for (var i= ix || 0, n= this.length; i<n; i++)
            if (i in this && this[i]===find)
                return i;
        return -1;
    };
}
if (!('forEach' in Array.prototype)) {
    Array.prototype.forEach= function(action, that /*opt*/) {
        for (var i= 0, n= this.length; i<n; i++)
            if (i in this)
                action.call(that, this[i], i, this);
    };
}

// Utility function used by filterNodes. This is really just `Array.prototype.slice()`
// except that the ECMAScript standard doesn't guarantee we're allowed to call that on
// a host object like a DOM NodeList, boo.
//
Array.fromList= function(list) {
    var array= new Array(list.length);
    for (var i= 0, n= list.length; i<n; i++)
        array[i]= list[i];
    return array;
};
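
As a rough illustration of why filterNodes iterates over a copy rather than over the list itself, the sketch below runs the same copying loop on a plain array-like object. The names fakeList and toArray are hypothetical stand-ins invented for this example; a real NodeList is a browser object and may also change as nodes are moved, which is what makes a snapshot useful.

```javascript
// Hypothetical stand-in for a DOM NodeList: numeric keys and a length,
// but none of the array methods.
var fakeList = { 0: "p", 1: "span", 2: "br", length: 3 };

// Same loop as Array.fromList, under a local name so the sketch runs on its own.
function toArray(list) {
    var array = new Array(list.length);
    for (var i = 0, n = list.length; i < n; i++)
        array[i] = list[i];
    return array;
}

var snapshot = toArray(fakeList);

// Changing the source afterwards does not disturb the snapshot.
fakeList[0] = "div";
fakeList.length = 1;

// The snapshot is a real array, so forEach is available on it.
var seen = [];
snapshot.forEach(function (name) { seen.push(name); });
```

Here seen ends up holding the three original names even though fakeList was altered after the copy was taken.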

Answer 2

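First, I'm not sure a regex is the right tool for this. Users might enter invalid HTML (forgetting a >, or putting a > inside an attribute), and then the regex will fail. That said, I'm not sure a parser would be any better or more robust.

Second, your regex contains some unnecessary parentheses.

Third, you can use a lookahead to exclude certain tags: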

o.node.innerHTML=o.node.innerHTML.replace(/<(?!\s*\/?(br|p)\b)[^>]+>/ig,"");


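Explanation:

< matches the opening angle bracket.

(?!\s*\/?(br|p)\b) asserts that what follows cannot be matched as zero or more whitespace characters, an optional /, either br or p, and then a word boundary. The word boundary matters; without it, the lookahead would also be triggered by tags whose names merely begin with p or br, such as <pre>.

[^>]+ matches one or more characters that are not a closing angle bracket.

> matches the closing angle bracket.

Note that you may run into problems if a closing angle bracket appears somewhere inside a tag.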
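
To make that caveat concrete, here is a small sketch of what happens when an attribute value contains a closing angle bracket. The input string is made up for this example, and the pattern is the lookahead regex with its backslashes written out.

```javascript
// Lookahead pattern: strip every tag except p and br.
var stripTags = /<(?!\s*\/?(br|p)\b)[^>]+>/ig;

// Hypothetical input: the title attribute contains a ">".
var tricky = '<a title="a > b" href="x">link</a>';

// [^>]+ stops at the first ">", which sits inside the attribute value,
// so only the first part of the opening tag is removed and the rest of
// the tag leaks into the output as text.
var broken = tricky.replace(stripTags, "");
// broken is ' b" href="x">link'
```

The same input would be handled as a single element by a DOM-based filter, since the browser's parser treats the quoted attribute value as a unit.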

This will match (and strip)

<pre> <a href="dot.com"> </a> </pre>

and leave p and br tags, including variants such as

 < /br>

alone.
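
A quick way to check this behavior, and to see what the word boundary contributes, is to run the pattern against a sample string. The markup and variable names below are invented for illustration; the second pattern is the same regex with \b deliberately left out.

```javascript
// Lookahead pattern, backslashes written out.
var withBoundary = /<(?!\s*\/?(br|p)\b)[^>]+>/ig;

// Same pattern minus the word boundary, for comparison only.
var withoutBoundary = /<(?!\s*\/?(br|p))[^>]+>/ig;

// Hypothetical pasted markup.
var pasted = '<div><p>Hello <b>world</b></p><pre>code</pre><br /></div>';

// div, b and pre tags are stripped; p and br tags survive.
var cleaned = pasted.replace(withBoundary, "");
// cleaned is '<p>Hello world</p>code<br />'

// Without \b, "pre" starts with "p", so the lookahead also protects <pre>.
var leaky = pasted.replace(withoutBoundary, "");
// leaky is '<p>Hello world</p><pre>code</pre><br />'
```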
