100% CPU usage with a regexp depending on input length

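I'm trying to come up with a regexp in Python that has to match any character but avoid three or more consecutive commas or semicolons. In other words, only up to two consecutive commas or semicolons are allowed.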

So this is what I currently have:

^(,|;){,2}([^,;]+(,|;){,2})*$

It seems to work as expected:

>>> import re
>>> r = re.compile(r'^(,|;){,2}([^,;]+(,|;){,2})*$')
>>> r.match('')
<_sre.SRE_Match object at 0x7f23af8407e8>
>>> r.match('foo,')
<_sre.SRE_Match object at 0x7f23af840750>
>>> r.match('foo, a')
<_sre.SRE_Match object at 0x7f23af8407e8>
>>> r.match('foo, ,')
<_sre.SRE_Match object at 0x7f23af840750>
>>> r.match('foo, ,,a')
<_sre.SRE_Match object at 0x7f23af8407e8>
>>> r.match('foo, ,,,')
>>> r.match('foo, ,,,;')
>>> r.match('foo, ,, ;;')
<_sre.SRE_Match object at 0x7f23af840750>

But as I start increasing the length of the input text, the regexp seems to need far more time to give a response.

>>> r.match('foo, bar, baz,, foo')
<_sre.SRE_Match object at 0x7f23af8407e8>
>>> r.match('foo, bar, baz,, fooooo, baaaaar')
<_sre.SRE_Match object at 0x7f23af840750>
>>> r.match('foo, bar, baz,, fooooo, baaaaar,')
<_sre.SRE_Match object at 0x7f23af8407e8>
>>> r.match('foo, bar, baz,, fooooo, baaaaar,,')
<_sre.SRE_Match object at 0x7f23af840750>
>>> r.match('foo, bar, baz,, fooooo, baaaaar,,,')
>>> r.match('foo, bar, baz,, fooooo, baaaaar,,,,')
>>> r.match('foo, bar, baz,, fooooo, baaaaar, baaaaaaz,,,,')

And finally it gets completely stuck at this stage and the CPU usage goes up to 100%.

I'm not sure whether the regexp could be optimized or whether something else is involved. Any help is appreciated.
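A small timing sketch can make the slowdown reproducible without hanging the interpreter. It compiles the pattern from the question unchanged; the helper name `time_failed_match` and the test strings (a run of `a` characters followed by three commas) are illustrative assumptions, not part of the original session. The run lengths are deliberately kept small.

```python
import re
import time

# The pattern from the question, compiled unchanged.
r = re.compile(r'^(,|;){,2}([^,;]+(,|;){,2})*$')


def time_failed_match(run_length):
    """Time one failing match on 'aaa...a,,,' (hypothetical helper)."""
    text = 'a' * run_length + ',,,'
    start = time.perf_counter()
    result = r.match(text)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == '__main__':
    # Keep the lengths small: the time should climb steeply as the run grows.
    for n in (8, 12, 16):
        result, elapsed = time_failed_match(n)
        print(n, result, round(elapsed, 4))
```

Absolute timings depend on the machine and the Python version, so only the trend between the rows is meaningful.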

Best answer

You're running into catastrophic backtracking.

The reason for this is that you have made the separators optional, and therefore the [^,;]+ part (which is itself in a repeating group) of your regex will try loads of permutations (of baaaaaaaz) before finally having to admit failure when confronted with more than two commas.

RegexBuddy aborts the match attempt after 1,000,000 steps of the regex engine with your last test string. Python will keep trying.

Imagine the string "baaz,,,":

With your regex, the regex engine has to check all of these:

  1. baaz,,<failure>
  2. baa + z,,<failure>
  3. ba + az,,<failure>
  4. ba + a + z,,<failure>
  5. b + aaz,,<failure>
  6. b + aa + z,,<failure>
  7. b + a + az,,<failure>
  8. b + a + a + z,,<failure>

before declaring overall failure. See how this grows exponentially with each additional character?
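The enumeration above can be reproduced with a short sketch. It only models the ways a repeated `[^,;]+` can divide a run of characters between iterations; it is not a trace of the actual regex engine, and the function name `splits` is a hypothetical helper.

```python
def splits(text):
    """Yield every way to cut text into non-empty pieces.

    The longest first piece comes first, mirroring a greedy quantifier
    that gives characters back one at a time.
    """
    if not text:
        yield []
        return
    for cut in range(len(text), 0, -1):
        head = text[:cut]
        for rest in splits(text[cut:]):
            yield [head] + rest


if __name__ == '__main__':
    for pieces in splits('baaz'):
        print(' + '.join(pieces))
    # A run of n characters has 2 ** (n - 1) such splits.
    print(len(list(splits('baaaaaaz'))))  # 128
```

Each additional character doubles the count, which is why a slightly longer word makes the failing match so much slower.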

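Behavior like this can be avoided with possessive quantifiers or atomic groups, both of which were sadly not supported by Python's regex engine when this was written (the re module gained them in Python 3.11). But you can do the inverse check easily: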

if ",,," in mystring or ";;;" in mystring:
    fail()

without needing a regex at all. If ",;," and the like could also occur and should be excluded, then use Andrew's solution.
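For readers on Python 3.11 or later, where the re module added possessive quantifiers, the question's pattern can be rewritten so that the engine never gives characters back. The following is a sketch under that version assumption; the names `possessive` and `is_valid` are hypothetical, and the capturing groups of the original are replaced by character classes.

```python
import re
import sys

# Possessive quantifiers (++ and {m,n}+) need Python 3.11 or later.
if sys.version_info >= (3, 11):
    possessive = re.compile(r'^[,;]{0,2}+(?:[^,;]++[,;]{0,2}+)*$')
else:
    possessive = None  # older interpreters reject this syntax


def is_valid(text):
    """Return True if text has no run of three or more , or ; characters."""
    if possessive is None:
        raise RuntimeError('possessive quantifiers require Python 3.11+')
    return possessive.match(text) is not None


if __name__ == '__main__' and possessive is not None:
    print(is_valid('foo, ,, ;;'))
    print(is_valid('foo, bar, baz,, fooooo, baaaaar, baaaaaaz,,,,'))
```

Because each run of ordinary characters is consumed whole, there is only one way to split the input, so a failing match is rejected without the exponential search.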

Other answers

I think this is what you want:

^(?!.*[,;]{3})

This will fail if the string contains three or more , or ; in a row. If you actually want it to match a character add a . at the end.

This utilizes negative lookahead, which will cause the entire match to fail if the regex .*[,;]{3} would match.
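As a minimal sketch of how this lookahead could be used from Python: the pattern is the one given above, while the `re.DOTALL` flag and the names `no_triple` and `is_allowed` are assumptions added for illustration.

```python
import re

# The lookahead from the answer above. re.DOTALL is an added assumption so
# that "." also crosses newlines in multi-line input.
no_triple = re.compile(r'^(?!.*[,;]{3})', re.DOTALL)


def is_allowed(text):
    """Return True if text contains no three consecutive , or ; characters."""
    return no_triple.match(text) is not None


if __name__ == '__main__':
    print(is_allowed('foo, ,, ;;'))
    print(is_allowed('foo, bar, baz,, fooooo, baaaaar, baaaaaaz,,,,'))
```

The match object itself is zero-width here, so only its presence or absence carries information.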

Try this regular expression:

^([^,;]|,($|[^,]|,[^,])|;($|[^;]|;[^;]))*$

It matches repetitively:

  • one single character that is neither "," nor ";", or
  • a "," that is not followed by another ",", or a ",," that is not followed by another ",", or
  • a ";" that is not followed by another ";", or a ";;" that is not followed by another ";"

until the end is reached. It is very efficient as it fails early without doing much backtracking.

How about this idea: match the ones that have the pattern you don't want, ".+,,,". In Python, just keep those that do not match. It should be fast.




