Most efficient way to go about identifying sub-strings in a string in Python?

I need to search a fairly lengthy string for CPV (Common Procurement Vocabulary) codes.

At the moment I'm just searching for each code as an exact substring.

The problem is, if the CPV code has been listed in a slightly different format, this algorithm won't find it.

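What's the most efficient way to search the string for all the different variations of a code? Is it simply a case of reformatting each of the 10,000 CPV codes into every variant and searching for each one?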

Examples of the different formats:

30124120-1 
301241201 
30124120 - 1
30124120 1
30124120.1

etc.

Thanks.

Best answer

Regular expressions:

>>> import re
>>> cpv = re.compile(r'([0-9]+[-. ]?[0-9])')
>>> print(cpv.findall('foo 30124120-1 bar 21966823.1 baz'))
['30124120-1', '21966823.1']

(Refine the pattern before matching it against the CPV codes in your data.)
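On the "reformat each of the 10,000 codes" part of the question: a sketch of an alternative, assuming the known codes are available as a list of strings. Scan the text once with a tolerant pattern, normalise every hit to a canonical NNNNNNNN-N form, and check it against a set. The names CPV_PATTERN, normalise and find_known_cpv are hypothetical, and the separator set is an assumption based on the formats listed in the question, not an official CPV grammar.

```python
import re
from typing import Iterable, List, Optional

# Tolerant pattern (assumed, not an official CPV grammar): 8 digits,
# an optional run of space/period/tab/slash/backslash/hyphen separators,
# then one check digit. The hyphen is last in the class so it is literal.
CPV_PATTERN = re.compile(r"\b(\d{8})[ .\t/\\-]*(\d)\b")


def normalise(code: str) -> Optional[str]:
    """Return code in canonical 'NNNNNNNN-N' form, or None if it doesn't fit the pattern."""
    m = CPV_PATTERN.fullmatch(code.strip())
    return f"{m.group(1)}-{m.group(2)}" if m else None


def find_known_cpv(text: str, known_codes: Iterable[str]) -> List[str]:
    """Return the known codes found in text (canonical form, in order of appearance)."""
    # Build the lookup set once; set membership is O(1) on average per
    # candidate, so the text is scanned a single time however many codes are known.
    wanted = {normalise(c) for c in known_codes}
    wanted.discard(None)
    found = []
    for m in CPV_PATTERN.finditer(text):
        canonical = f"{m.group(1)}-{m.group(2)}"
        if canonical in wanted:
            found.append(canonical)
    return found


print(find_known_cpv("foo 301241201 bar 21966823 - 1 baz 99999999-9",
                     ["30124120-1", "21966823.1"]))
# ['30124120-1', '21966823-1']
```

The point of the design is that the work is one pass over the text plus a set lookup per candidate, instead of one str.find() per code per format.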

Answers

Use any of the functions in the re (regular expressions) module. See the docs for more info.
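For reference, a minimal sketch of how the three re functions most relevant here differ, using a hypothetical sample string and a simple CPV-like pattern (not a complete CPV grammar): search returns the first match object (or None), findall returns the matched strings, and finditer yields match objects that also carry positions.

```python
import re

# Hypothetical sample text containing two CPV-style codes
text = "lot 1: 30124120-1, lot 2: 21966823.1"

# Simple pattern: 8 digits, optional '-', '.' or ' ', then one digit
pattern = re.compile(r"\d{8}[-. ]?\d")

first = pattern.search(text)      # first match only, or None if nothing matches
print(first.group())              # 30124120-1

print(pattern.findall(text))      # ['30124120-1', '21966823.1']

# finditer yields match objects, useful when you also need offsets
for m in pattern.finditer(text):
    print(m.group(), m.start(), m.end())
```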

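You could write a regular expression that accepts the several different formats of these codes, and then use re.findall or something similar to extract them. I'm not 100% sure what a CPV code looks like in general, so I don't have a ready-made regular expression for it (though you could check whether Google turns one up?). Something like this works for your examples: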

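import re

# sample data: the five formats from the question, one per line
ex = "30124120-1\n301241201\n30124120 - 1\n30124120 1\n30124120.1"

# hyphen placed last in the character class so it is literal, not a range
cpv = re.compile(r'(\d{8})(?:[ .\t/\\-]*)(\d{1})\b')

for m in cpv.finditer(ex):
    cpval, chk = m.groups()
    print("{0}-{1}".format(cpval, chk))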

Applied to your sample data, this returns

30124120-1
30124120-1
30124120-1
30124120-1
30124120-1

The regular expression breaks down as:

(\d{8})         # eight digits

(?:             # followed by a sequence which does not get returned
  [ .\t/\\-]*   #   consisting of 0 or more
)               #   spaces, periods, tabs, forward- or backslashes, hyphens

(\d{1})\b       # followed by one digit, ending at a word boundary
                #   (i.e. a non-word character such as whitespace, or the end of the string)
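The breakdown above is already very close to a valid re.VERBOSE pattern, so one option is to keep the comments inside the regex itself. A minimal sketch, assuming the corrected character class with the hyphen placed last; in verbose mode whitespace inside a character class is kept, so the literal space still counts as a separator.

```python
import re

# Same pattern as the answer, written with re.VERBOSE so the
# explanation lives next to each piece of the regex.
CPV = re.compile(r"""
    (\d{8})          # eight digits
    (?:              # followed by a non-capturing group
      [ .\t/\\-]*    #   of 0 or more spaces, periods, tabs,
    )                #   forward- or backslashes, or hyphens
    (\d{1})\b        # then one digit, ending at a word boundary
""", re.VERBOSE)

samples = ["30124120-1", "301241201", "30124120 - 1", "30124120 1", "30124120.1"]
for s in samples:
    m = CPV.search(s)
    print("{0}-{1}".format(*m.groups()))   # 30124120-1 for every sample
```

Keeping the hyphen last matters: written as "[ -.]", the class would be a range from space to period, which also admits characters such as "," and "+". With the hyphen last, "30124120,1" is not matched.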

Hope that helps!