Conditional matching in regular expression
  • Time: 2012-05-24 20:21:57
  • Tags:
  • python
  • regex

I am trying to extract some information from the string given below:

>>> import re
>>> st = '''
... <!-- info mp3 here -->
...                             192 kbps<br />2:41<br />3.71 mb  </div>
... <!-- info mp3 here -->
...                             3.49 mb  </div>
... <!-- info mp3 here -->
...                             128 kbps<br />3:31<br />3.3 mb   </div>
... '''
>>>

Now when I use the regex below, my output is:

>>> p = re.findall(r'<!-- info mp3 here -->\s+(.*?)<br />(.*?)<br />(.*?)\s+</div>', st)
>>> p
[('192 kbps', '2:41', '3.71 mb'), ('128 kbps', '3:31', '3.3 mb')]

But the output I need is:

[('192 kbps', '2:41', '3.71 mb'), (None, None, '3.49 mb'), ('128 kbps', '3:31', '3.3 mb')]

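So my question is: how do I change the regex above so that it matches all of these cases? I believe my current regex depends strictly on the <br /> tag, so how can I make that tag optional?

I know I should not be using regex to parse HTML, but right now this is the most suitable approach for me.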

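Best answer

This works, although I wonder whether there is a more elegant solution. You could of course merge the list comprehensions into a single line, but I think that would reduce the overall clarity of the code. At least this way you will still be able to follow what you did three months from now...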

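import re

st = '''
<!-- info mp3 here -->
                            192 kbps<br />2:41<br />3.71 mb  </div>
<!-- info mp3 here -->
                            3.49 mb  </div>
<!-- info mp3 here -->
                            128 kbps<br />3:31<br />3.3 mb   </div>
'''

# Capture everything between the comment marker and the closing </div>.
p = re.findall(r'<!-- info mp3 here -->\s+(.*?)\s+</div>', st)
# Split each captured row on the <br /> separators.
p2 = [row.split('<br />') for row in p]
# Left-pad short rows with None so every row has three items.
p3 = [[None] * (3 - len(row)) + row for row in p2]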

>>> p3
[['192 kbps', '2:41', '3.71 mb'], [None, None, '3.49 mb'], ['128 kbps', '3:31', '3.3 mb']]

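Depending on how variable your strings are, you may want to write a more general cleaning function that strips whitespace, normalizes case, or whatever else is needed, and map it over each item you pull out.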
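As a rough sketch of that idea: the names `clean` and `extract` below are hypothetical helpers introduced only for illustration, and stripping plus lowercasing is just one assumed choice of cleanup. The regex and the padding logic are the same as in the answer above.

```python
import re

st = '''
<!-- info mp3 here -->
                            192 kbps<br />2:41<br />3.71 mb  </div>
<!-- info mp3 here -->
                            3.49 mb  </div>
<!-- info mp3 here -->
                            128 kbps<br />3:31<br />3.3 mb   </div>
'''

def clean(item):
    """Hypothetical cleanup: trim whitespace and lowercase one field."""
    return item.strip().lower()

def extract(text, width=3):
    """Return one list per record, left-padded with None to `width` items."""
    rows = re.findall(r'<!-- info mp3 here -->\s+(.*?)\s+</div>', text)
    result = []
    for row in rows:
        # Split on the <br /> separators and clean every field.
        fields = [clean(field) for field in row.split('<br />')]
        # Pad on the left so the size is always the last item.
        result.append([None] * (width - len(fields)) + fields)
    return result

result = extract(st)
print(result)
```

Keeping the cleanup in its own function means it can be changed (for example, to parse the bitrate into an integer) without touching the regex.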

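Other answers

Here is a regex solution that is a bit more specific. I am not sure it is preferable to the other answer, but I suppose it answers the question as asked. Instead of returning None, the first two optional groups return empty strings, which I think is probably close enough.

Note the nested group structure. The first two outer groups are optional, but each requires a <br /> tag in order to match. That way, if there are fewer than two <br /> tags, the only item present is not matched until the final group: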

rx = r'''<!--\ info\ mp3\ here\ -->\s+   # verbose mode; escape literal spaces
         (?:                             # outer non-capturing group
            ([^<>]*)                     # inner capturing group without <>
            (?:<br\ />)                  # inner non-capturing group matching br
         )?                              # whole outer group is optional
         (?:
            ([^<>]*)                     # all same as above
            (?:<br\ />)
         )?
         (?:                             # outer non-capturing group
            (.*?)                        # non-greedy wildcard match
            (?:\s+</div>)                # inner non-capturing group matching div
         )                               # final group is not optional
      '''

Test:

>>> re.findall(rx, st, re.VERBOSE)
[('192 kbps', '2:41', '3.71 mb'),
 ('', '', '3.49 mb'),
 ('128 kbps', '3:31', '3.3 mb')]

Note the re.VERBOSE flag, which is necessary unless you remove all the whitespace and comments from the pattern above.
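If None is really needed rather than empty strings, one possible follow-up step (not part of the original answer) is to post-process the tuples. The sketch below repeats the pattern and the sample string so it runs on its own; it assumes that an empty field never carries meaning, since `field or None` turns every empty string into None.

```python
import re

rx = r'''<!--\ info\ mp3\ here\ -->\s+   # verbose mode; escape literal spaces
         (?: ([^<>]*) (?:<br\ />) )?     # optional first field plus its br
         (?: ([^<>]*) (?:<br\ />) )?     # optional second field plus its br
         (?: (.*?) (?:\s+</div>) )       # required last field up to the div
      '''

st = '''
<!-- info mp3 here -->
                            192 kbps<br />2:41<br />3.71 mb  </div>
<!-- info mp3 here -->
                            3.49 mb  </div>
<!-- info mp3 here -->
                            128 kbps<br />3:31<br />3.3 mb   </div>
'''

matches = re.findall(rx, st, re.VERBOSE)
# Assumption: an empty string only ever means "group did not match".
result = [tuple(field or None for field in row) for row in matches]
print(result)
```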




