Question

Some regex flavors do not support (negative) zero-width assertions (look-ahead/look-behind).

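This makes it extremely difficult (impossible?) to express an exclusion. For example, "every line that does not contain 'foo'", which with lookahead looks like this: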

^((?!foo).)*$

Is it still possible to achieve the same thing without lookaround (setting complexity and performance aside for now)?
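For comparison, Python's re module does support negative lookahead, so the pattern above can serve as a reference check there. This is a minimal sketch; `has_no_foo` is a hypothetical helper name, and the inputs are assumed to be single lines without embedded newlines.

```python
import re

# The lookahead-based pattern from the question, kept only as a reference.
NOT_FOO_LOOKAHEAD = re.compile(r"^((?!foo).)*$")

def has_no_foo(line: str) -> bool:
    """Hypothetical helper: True if the lookahead regex accepts the line."""
    return NOT_FOO_LOOKAHEAD.match(line) is not None

for s in ["abcde", "abdfode", "abdfoode", "ofooa", ""]:
    print(repr(s), has_no_foo(s))
```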

Answer 1

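Update: as @Ciantic pointed out in the comments, this fails when two f's come before oo: for example, ffoo is accepted even though it contains foo.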

^(f(o[^o]|[^o])|[^f])*$
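A small Python 3 sketch reproducing the failure from the update. It assumes, as the answer does, that each line still ends with "\n"; `accepted` is a hypothetical helper name.

```python
import re

# Answer 1's regex; it expects each line to still end with "\n".
RE_NOT_FOO = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")

def accepted(line: str) -> bool:
    return RE_NOT_FOO.match(line) is not None

# In "ffoo", f[^o] consumes "ff", after which [^f] happily matches "oo",
# so the line is accepted although it contains "foo".
for line in ["ffoo\n", "foo\n", "abdfode\n"]:
    print(repr(line), accepted(line), "foo" not in line)
```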

Note: it is much easier to simply negate a match in client code than to use the regex above.

The regex assumes that every line ends with a newline character; if that is not the case, see the C++ and grep regexes.
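A sketch, using Python's re, of why the newline matters: the original regex needs some character after a trailing f or fo, and the newline supplies it, while the C++/grep variant adds |$ alternatives instead. The variable names are illustrative only.

```python
import re

# Answer 1's regex, for lines that still end in "\n".
WITH_NEWLINE = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")
# The C++/grep variant, for lines whose newline has been stripped.
STRIPPED = re.compile(r"^(f(o([^o]|$)|([^o]|$))|[^f])*$")

for line in ["f", "fo", "f\n", "fo\n"]:
    print(repr(line),
          WITH_NEWLINE.match(line) is not None,
          STRIPPED.match(line) is not None)
```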

The example programs in Perl, Python, C++ and grep all produce the same output.

Perl

#!/usr/bin/perl -wn
print if /^(f(o[^o]|[^o])|[^f])*$/;

Python (link: http://ideone.com/3CYA6)

#!/usr/bin/env python3
import fileinput
import re
import sys

# Print every input line the regex accepts (lines keep their trailing "\n").
re_not_foo = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")
for line in filter(re_not_foo.match, fileinput.input()):
    sys.stdout.write(line)

C++

#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main()
{
  boost::regex re("^(f(o([^o]|$)|([^o]|$))|[^f])*$");
  // NOTE: the "|$" alternatives are needed because std::getline() strips the newline character

  std::string line;
  while (std::getline(std::cin, line)) 
    if (boost::regex_match(line, re))
      std::cout << line << std::endl;
}

grep

$ grep -E '^(f(o([^o]|$)|([^o]|$))|[^f])*$' in.txt

Sample file:

foo
 foo 
abdfoode
abdfode
abdfde
abcde
f

fo
foo
fooo
ofooa
ofo
ofoo

Output:

abdfode
abdfde
abcde
f

fo
ofo
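The sample above can be checked with a short Python 3 script. This is a sketch: it feeds each line with a trailing newline, which is how the Perl and Python programs see their input.

```python
import re

RE_NOT_FOO = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")

# The sample file from above, one entry per line (note the blank line).
SAMPLE = ["foo", " foo ", "abdfoode", "abdfode", "abdfde", "abcde", "f",
          "", "fo", "foo", "fooo", "ofooa", "ofo", "ofoo"]

# Append "\n" to each line, as fileinput or perl -n would deliver it.
kept = [line for line in SAMPLE if RE_NOT_FOO.match(line + "\n")]
print(kept)

# On this particular sample the regex agrees with a plain substring test.
assert kept == [line for line in SAMPLE if "foo" not in line]
```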

Answer 2

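I found this question and took the lack of a fully working regex as a personal challenge. I believe I've managed to create a regex that works for all inputs, provided you can use atomic groups/possessive quantifiers.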

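Of course, I'm not sure whether any flavor allows atomic grouping but not lookahead, but the question was whether an exclusion can be stated in a regex without lookahead, and it is technically possible: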

\A(?:$|[^f]++|f++(?:[^o]|$)|(?:f++o)*+(?:[^o]|$))*\Z

Explanation:

\A                         #Start of string
(?:                        #Non-capturing group
    $                      #Consume end-of-line. We're not in foo-mode.
    |[^f]++                #Consume every non-'f'. We're not in foo-mode.
    |f++(?:[^o]|$)         #Enter foo-mode with an 'f'. Consume all 'f's, but only exit foo-mode if 'o' is not the next character. Thus, 'f' is valid but 'fo' is invalid.
    |(?:f++o)*+(?:[^o]|$)  #Enter foo-mode with an 'f'. Consume all 'f's, followed by a single 'o'. Repeat, since (f+o)* by itself cannot contain 'foo'. Only exit foo-mode if 'o' is not the next character following (f+o). Thus, 'fo' is valid but 'foo' is invalid.
)*                         #Repeat the non-capturing group
\Z                         #End of string. Note that this regex only works in flavours that can match $

If for some reason you cannot use possessive quantifiers or lookaround, but can use atomic groups, you can use:

\A(?:$|(?>[^f]+)|(?>f+)(?:[^o]|$)|(?>(?:(?>f+)o)*)(?:[^o]|$))*\Z

As others have pointed out, though, it is probably more practical to negate the match by other means.

Answer 3

I stumbled across this question while looking for a regex exclusion solution of my own, where I was trying to exclude a sequence within a regex.

My initial reaction was to use grep's -v (invert match) option, e.g. for "every line that does not contain 'foo'":

grep -v foo

This returns every line in the file that does not match "foo".
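Outside the shell, the same inversion is a single `not` in client code. A minimal Python sketch; `grep_v` is a hypothetical name, and Python's regex syntax is not identical to grep's basic regular expressions, although for a literal such as foo they behave the same.

```python
import re

def grep_v(pattern, lines):
    """Hypothetical helper: keep lines with no match, roughly like grep -v."""
    regex = re.compile(pattern)
    return [line for line in lines if not regex.search(line)]

print(grep_v("foo", ["foo", "abdfode", " foo ", "ofo"]))
```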

It's so simple that I have a strong feeling I've just misread your question....

Answer 4

You can usually search for foo in client code and invert the result of the regex match.

As a simple example, say you want to validate that a string contains only certain characters.

You could write this:

^[A-Za-z0-9.$-]*$

and accept a true result as valid, or write this:

[^A-Za-z0-9.$-]

and accept a false result as valid.
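A minimal Python sketch of both forms; the helper names are hypothetical. It uses re.fullmatch instead of ^...$ because in Python $ also matches just before a trailing newline, which would make the two checks disagree on input such as "abc\n".

```python
import re

# Positive form: every character must come from the allowed set.
def valid_positive(s: str) -> bool:
    return re.fullmatch(r"[A-Za-z0-9.$-]*", s) is not None

# Negated form: search for a single forbidden character; no match means valid.
def valid_negated(s: str) -> bool:
    return re.search(r"[^A-Za-z0-9.$-]", s) is None

for s in ["abc-1.0$", "a b", "x_y", "", "abc\n"]:
    print(repr(s), valid_positive(s), valid_negated(s))
```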

Of course, this isn't always an option: sometimes you just have to put the expression in a config file or pass it to another program, for example. But it's worth remembering. In your specific problem, for example, the expression becomes much simpler if you can use negation like this.
