Question

I am using re.findall() to extract some version numbers from an HTML file:

>>> import re
>>> text = '<table><td><a href="url">Test0.2.1.zip</a></td><td>Test0.2.1</td></table> Test0.2.1'
>>> re.findall("Test([.0-9]*)", text)
['0.2.1.', '0.2.1', '0.2.1']

But I would like to get only the ones that do not end in a dot. The filename might not always be .zip, so I can't just stick .zip in the regex.

I want to end up with:

['0.2.1', '0.2.1']

Can anyone suggest a better regex to use? :)

Answer 1

re.findall(r"Test([0-9.]*[0-9]+)", text)

Or, a bit shorter:

re.findall(r"Test([\d.]*\d+)", text)
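As a quick check, the sketch below runs the question's pattern and both patterns from this answer against the sample text. Note that the answer's patterns still return three matches: the trailing dot is trimmed from the first one rather than the match being dropped. The last pattern, stored in the hypothetical name `strict`, is not part of the answer. It is one possible variant, assuming you want to skip any version that is directly followed by a dot, and it uses a negative lookahead to do so.

```python
import re

text = '<table><td><a href="url">Test0.2.1.zip</a></td><td>Test0.2.1</td></table> Test0.2.1'

# Pattern from the question: the greedy class also swallows the dot before "zip".
original = re.findall(r"Test([.0-9]*)", text)

# Pattern from the answer: the capture must end in a digit, so the regex
# backtracks off the trailing dot and keeps the rest.
fixed = re.findall(r"Test([0-9.]*[0-9]+)", text)

# Shorter spelling with \d; the raw string keeps the backslashes intact.
fixed_short = re.findall(r"Test([\d.]*\d+)", text)

# Assumption, not from the answer: reject a match entirely when the version
# is immediately followed by another dot or digit.
strict = re.findall(r"Test([\d.]*\d+)(?![.\d])", text)

print(original)     # ['0.2.1.', '0.2.1', '0.2.1']
print(fixed)        # ['0.2.1', '0.2.1', '0.2.1']
print(fixed_short)  # ['0.2.1', '0.2.1', '0.2.1']
print(strict)       # ['0.2.1', '0.2.1']
```

Which variant fits depends on whether the version inside the filename should be kept with its dot trimmed, or left out altogether.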

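By the way, you don't need to escape the period inside a character class. Inside [], the . has no special meaning; it just matches a literal period. Escaping it has no effect.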
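A small illustration of the point about escaping, using a made-up `sample` string (an assumption for demonstration only): inside a character class the dot behaves the same whether or not it is escaped, while outside a class the unescaped dot acts as a wildcard.

```python
import re

sample = "1.2 1x2"

# Inside a character class the dot is literal, escaped or not.
unescaped = re.findall(r"[.]", sample)
escaped = re.findall(r"[\.]", sample)

# Outside a class, an unescaped dot matches any character except a newline.
wildcard = re.findall(r"1.2", sample)
literal = re.findall(r"1\.2", sample)

print(unescaped, escaped)  # ['.'] ['.']
print(wildcard)            # ['1.2', '1x2']
print(literal)             # ['1.2']
```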
