Regex for parsing the prefix of a string
  • Time: 2012-05-25 05:01:28
  • Tags: regex

I'm looking for a regex to parse file names so that I can count how many times each file-name prefix occurs. Here are some sample strings:

gloves.tga
10jeans.jpg
shirt1.png
shirt2.png
coat_00.png
coat_12.gif
top1_01.png
top2_04.png

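The basic pattern is just a run of letters or digits followed by an extension. The prefix is all of the text before the extension (not including the extension's period).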

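A single piece of clothing can be spread across multiple files, which are named with the clothing name, followed by an underscore, followed by an index number, followed by the extension. The prefix is everything up to, but not including, the underscore. Everything else can be ignored.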

This covers all the cases I'm dealing with, but I'm having trouble handling the fact that one case has an underscore and the other doesn't.

Can anyone help me come up with a solution?

It turns out there is an additional condition: shirt1 and shirt2 should be treated as the same prefix.

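So if the name ends in digits that are immediately followed by the extension, the digits should be ignored, whereas if the digits are followed by an underscore, they are kept in the prefix.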

Best answer

(Perl/PCRE syntax)

/^([^._]+)/ 

This captures the longest prefix of the string that contains neither a period nor an underscore.
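As a quick check of that first pattern, here is a minimal Ruby sketch run against a few of the sample names from the question. The constant name FIRST_ATTEMPT is made up for this illustration. It shows that shirt1 still comes out with its trailing digit, which is what the later edits address.

```ruby
# First attempt: capture everything up to the first period or underscore.
FIRST_ATTEMPT = /^([^._]+)/

samples = %w[gloves.tga shirt1.png coat_00.png top1_01.png]

# String#[] with a regex and a group index returns that capture group (or nil).
prefixes = samples.map { |name| name[FIRST_ATTEMPT, 1] }

puts prefixes.inspect
# prints ["gloves", "shirt1", "coat", "top1"]
```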

EDIT: OK, if shirt should be the prefix of shirt1, then you can try something like this:

/^([^._]+)(?<!\d)/

This does not allow the prefix to end in a digit. It won't work in Ruby 1.8, though, because 1.8 has no lookbehind assertions.
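If lookbehind is not available, one possible workaround (an assumption of this note, not part of the original answer) is to require the last captured character to be a non-digit directly in the character class. The sketch below, with made-up constant names, compares the two patterns on the sample names using a modern Ruby that supports both. Like the lookbehind version, it drops the digit from top1_01.

```ruby
# Lookbehind version from the answer (needs an engine with lookbehind support).
WITH_LOOKBEHIND = /^([^._]+)(?<!\d)/

# Hypothetical lookbehind-free rewrite: a run of non-period, non-underscore
# characters whose last character is also not a digit.
WITHOUT_LOOKBEHIND = /^([^._]*[^._\d])/

samples = %w[gloves.tga 10jeans.jpg shirt1.png shirt2.png coat_00.png top1_01.png]

samples.each do |name|
  a = name[WITH_LOOKBEHIND, 1]
  b = name[WITHOUT_LOOKBEHIND, 1]
  puts "#{name}:\t#{a}\t#{b}"
end
```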

EDIT 2: The above means that the prefix of top1_01 is top, but we want that one to include the digits before the underscore. So our last attempt is to add an alternative:

/^([^._]+)(?:(?<!\d)|(?=_))/

The prefix has to either not end in a digit or be followed by an underscore. Demo:

%w<gloves.tga  10jeans.jpg shirt1.png  shirt2.png
   coat_00.png coat_12.gif top1_01.png top2_04.png>.each do |filename|
  # The prefix must not end in a digit unless an underscore follows it.
  if m = filename.match(/^([^._]+)(?:(?<!\d)|(?=_))/)
    puts [ filename, m[1] ].join(":\t")
  else
    warn "Uh-oh, couldn't find a prefix in filename '#{filename}'."
  end
end

Output:

 gloves.tga:    gloves
 10jeans.jpg:   10jeans
 shirt1.png:    shirt
 shirt2.png:    shirt
 coat_00.png:   coat
 coat_12.gif:   coat
 top1_01.png:   top1
 top2_04.png:   top2
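
Since the goal in the question is to count how often each prefix occurs, the final pattern can be fed into a hash of counters. This is a minimal sketch, assuming the file names are already in an array. The names PREFIX and count_prefixes are made up for this illustration, and names with no usable prefix are simply skipped.

```ruby
# Final pattern from the answer: the prefix must either not end in a digit
# or be followed by an underscore.
PREFIX = /^([^._]+)(?:(?<!\d)|(?=_))/

# Hypothetical helper: returns a hash mapping each prefix to its count.
def count_prefixes(filenames)
  counts = Hash.new(0)            # missing keys start at 0
  filenames.each do |filename|
    m = PREFIX.match(filename)
    next unless m                 # skip names without a recognizable prefix
    counts[m[1]] += 1
  end
  counts
end

files = %w[gloves.tga 10jeans.jpg shirt1.png shirt2.png
           coat_00.png coat_12.gif top1_01.png top2_04.png]

counts = count_prefixes(files)
counts.each { |prefix, n| puts "#{prefix}:\t#{n}" }
```

With the sample names, shirt and coat are each counted twice and every other prefix once.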